WO2023070732A1 - Text recommendation method and apparatus based on deep learning, and related medium - Google Patents


Info

Publication number
WO2023070732A1
WO2023070732A1 · PCT/CN2021/129027 · CN2021129027W
Authority
WO
WIPO (PCT)
Prior art keywords
text
milvus
database
information
vector
Prior art date
Application number
PCT/CN2021/129027
Other languages
French (fr)
Chinese (zh)
Inventor
钱启
王天星
杨东泉
程佳宇
Original Assignee
深圳前海环融联易信息科技服务有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海环融联易信息科技服务有限公司
Publication of WO2023070732A1 publication Critical patent/WO2023070732A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present application relates to the technical field of computer software, in particular to a text recommendation method, device and related media based on deep learning.
  • Natural language processing is an important direction in the field of artificial intelligence. It studies various theories and methods that can realize effective communication between humans and computers using natural language.
  • natural language processing technology includes text processing, machine translation, semantic understanding, knowledge graph, intelligent question answering and other technologies.
  • text matching is a very important application direction of text processing, which plays a very important role in real life.
  • the development of this technology offers users a feasible way to retrieve and match content effectively within a vast sea of information. In fact, text matching plays an important role in many practical scenarios.
  • the system needs to search the corpus for content as semantically similar as possible to the text to be matched, and return the matching result to the user.
  • the system needs to find the most similar question in the question answer database according to the question raised by the user, and return the answer corresponding to the similar question. In these scenarios, the accuracy of text matching directly affects the user experience.
  • the so-called text matching generally involves calculating the semantic similarity between two texts through an algorithm and judging their degree of match by that similarity: the higher the similarity value, the better the match; the lower, the worse.
  • the current text matching mainly adopts relatively complex methods and does not have dynamic scalability.
  • that is, the text database does not expand automatically, but needs to be expanded manually.
  • Embodiments of the present application provide a text recommendation method, device, computer equipment, and storage medium based on deep learning, aiming at improving the efficiency and accuracy of text recommendation.
  • the embodiment of the present application provides a text recommendation method based on deep learning, including:
  • the text feature vector is converted into Milvus vector index information, and stored in the Milvus database;
  • the sentence vector containing semantic information in the text to be matched is obtained through the twin neural network structure;
  • the embodiment of the present application provides a text recommendation device based on deep learning, including:
  • the first vector generation unit is used to collect different types of text information to construct a text database, and generates a text feature vector for each text information in the text database through a twin neural network structure;
  • the first vector conversion unit is used to convert the text feature vector into Milvus vector index information, and store it in the Milvus database;
  • the second vector generation unit is used to obtain the sentence vector containing semantic information in the text to be matched through the twin neural network structure when the text to be matched is matched;
  • the text matching unit is used to select the top N pieces of Milvus vector index information with the highest semantic similarity in the Milvus database, and based on the correspondence between Milvus vector index information and text feature vectors, to select the corresponding first N pieces of text information in the text database as the matching results of the text to be matched.
  • an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and operable on the processor; when the processor executes the computer program, the deep learning-based text recommendation method described in the first aspect is implemented.
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the deep learning-based text recommendation method described in the first aspect is implemented.
  • the embodiment of the present application provides a deep learning-based text recommendation method, device, computer equipment, and storage medium. The method includes: collecting different types of text information to build a text database, and generating a text feature vector for each piece of text information in the text database through a twin neural network structure; converting the text feature vectors into Milvus vector index information and storing them in the Milvus database; when matching text to be matched, obtaining the sentence vector containing semantic information in the text to be matched through the twin neural network structure; and selecting the top N pieces of Milvus vector index information with the highest semantic similarity in the Milvus database and, based on the correspondence between Milvus vector index information and text feature vectors, selecting the corresponding top N pieces of text information in the text database as the matching results of the text to be matched.
  • by constructing a text database and introducing the Milvus database, the embodiment of the present application overcomes the time-consuming and labor-intensive defect of matching the text to be matched against text information one by one; the recommendation matching process of this embodiment is simple to implement, highly accurate, and fast. When recommending text, it achieves fast retrieval and real-time feedback, and the text data in the text database is dynamically scalable.
  • FIG. 1 is a schematic flow diagram of a text recommendation method based on deep learning provided in an embodiment of the present application
  • FIG. 2 is a schematic subflow diagram of a text recommendation method based on deep learning provided in an embodiment of the present application
  • FIG. 3 is a schematic block diagram of a text recommendation device based on deep learning provided by an embodiment of the present application
  • FIG. 4 is a sub-schematic block diagram of an apparatus for recommending text based on deep learning provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a text recommendation method based on deep learning provided by an embodiment of the present application, which specifically includes steps S101 to S104.
  • S101 Collect different types of text information to construct a text database, and generate a text feature vector for each text information in the text database through a twin neural network structure;
  • a text database is constructed by using different types of text information, and at the same time, a text feature vector is generated for the text information in the text database through a twin neural network structure. Then convert the generated text feature vectors into Milvus vector index information and store them in the Milvus database.
  • the corresponding sentence vector is also generated for the text to be matched through the twin neural network structure; the similarity between the sentence vector and each piece of Milvus vector index information is then calculated through the Milvus database, the top N pieces of Milvus vector index information with the highest similarity are selected, and the corresponding text information in the text database is returned as the matching result or recommendation result.
  • the text database is a CSV text database (that is, a text database in CSV format).
  • the specific steps of constructing the text database may be as follows: divide the texts according to the categories to be recommended, collect several texts under each category, and store each category as a CSV file.
  • the column names of the CSV file can be question and flag, where question represents the text content and flag represents the category name; within a single CSV file, the flag value is uniform.
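As an illustrative sketch only (the file contents and sample rows below are hypothetical, not taken from the application), one such per-category CSV file could be built with Python's standard csv module:

```python
import csv
import io

# Hypothetical rows for one category: "question" holds the text content,
# "flag" holds the category name, which is uniform within a single CSV file.
rows = [
    {"question": "How do I reset my password?", "flag": "account"},
    {"question": "I forgot my login password.", "flag": "account"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["question", "flag"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

In practice each category would be written to its own file on disk; the in-memory buffer here only keeps the sketch self-contained.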
  • in order to facilitate data modification and data cleaning, this embodiment stores the text database in the MySQL database in the form of structured data.
  • the purpose of this is that when performing data cleaning, you can directly write Python scripts to operate MySQL data tables to update text data.
  • compared with the common practice of using CSV-format files as text databases, the MySQL database has the advantages of intuitive display, flexible operation, and convenient dynamic expansion of the text database.
  • this embodiment uses the Milvus database to store the characteristic information of the text database, so as to realize fast retrieval.
  • the so-called Milvus database is an open-source vector database that supports addition, deletion, modification, and near-real-time query and retrieval of TB-level vectors. It has the characteristics of high flexibility, stability, reliability, and high-speed query.
  • conventionally, text features of two pieces of text information are extracted, and whether the two match is then judged on the basis of the extracted features.
  • the word vectors of the text information are often simply summed, or weighted by the importance of the words in the text, to construct the text features of the text information.
  • the resulting text vector can be skewed by individual words in the text, so the constructed features cannot accurately reflect the semantics of the text information, resulting in low matching accuracy.
  • the most common ways of representing sentence vectors are to average the vectors of the BERT output layer, or to use the first token of the BERT output layer as the representation, both of which tend to produce poor sentence encodings.
  • by constructing a text database and introducing the Milvus database, the deep learning-based text recommendation method overcomes the time-consuming and labor-intensive defect of matching the text to be matched against text information one by one; the recommendation matching process of this embodiment is simple to implement, highly accurate, and fast.
  • a request only takes about 30 milliseconds to return the result.
  • the step S101 includes:
  • the text information in the text database is first combined in pairs; for each combination of two pieces of text information, the two are respectively input into a BERT network model and an average pooling layer of identical structure, and two corresponding encoding results are obtained.
  • the encoding result of this model is the text feature vector carrying semantic information. It is worth noting that this Siamese neural network structure generates fixed-size vectors for the input sentences, and the semantic information of these vectors can be used to calculate similarity.
  • this embodiment makes improvements based on the BERT network model.
  • the full name of the BERT network model is Bidirectional Encoder Representations from Transformers, which is a pre-trained network.
  • the goal of the BERT network model is to use large-scale unlabeled corpus training to obtain a semantic representation of text containing rich semantic information, then fine-tune that representation for a specific NLP task, and finally apply it to that task.
  • These tasks can include intelligent question answering, sentence classification, sentence pair representation, etc.
  • a major disadvantage of the BERT network model is that it does not calculate independent sentence codes, which makes it difficult to obtain good sentence codes through the BERT network model.
  • the improvement of this embodiment mainly lies in adding an average pooling operation after the output layer of the BERT network model.
  • the role of the pooling layer is feature translation invariance.
  • the advantage of this setting is that after adding the average pooling layer, the final output vector size is fixed for different sentences.
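To illustrate the point above (a minimal numpy sketch of the pooling operation only, not the application's actual model code), averaging BERT's token-level output yields a sentence vector of fixed size regardless of sentence length:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Average token-level embeddings of shape (seq_len, hidden)
    into one fixed-size sentence vector of shape (hidden,)."""
    return token_embeddings.mean(axis=0)

# Two "sentences" of different lengths, toy hidden size 4.
rng = np.random.default_rng(0)
short = mean_pool(rng.normal(size=(3, 4)))   # 3 tokens
long_ = mean_pool(rng.normal(size=(11, 4)))  # 11 tokens

# Both pooled vectors have the same fixed size.
print(short.shape, long_.shape)
```

In the real model the `(seq_len, hidden)` matrix would be the BERT output layer; here random values stand in for it.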
  • the step S102 includes:
  • when the text feature vector is converted into Milvus vector index information, the text feature vector is first normalized. The specific steps of the normalization process are: input the 2 pieces of text information, pass each through the BERT network model and average pooling layer of identical structure to obtain two encoding results, and then normalize each encoding result to obtain the normalized text feature vectors.
  • the normalized feature vectors are converted into Milvus vector index information and stored in the Milvus database to obtain the Milvus vector information.
  • the text database and the Milvus database are corresponding (that is, the ID numbers of the two are exactly the same), which is convenient for returning the text information of the original text database after query, instead of only returning the difficult-to-recognize index information of Milvus.
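A small numpy sketch of the normalization step (assuming ordinary L2 normalization, a common choice that the application does not spell out: it scales each vector to unit length, so the inner product of two normalized vectors equals their cosine similarity):

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length so that dot products
    act directly as cosine similarities."""
    return v / np.linalg.norm(v)

u = l2_normalize(np.array([3.0, 4.0]))
w = l2_normalize(np.array([4.0, 3.0]))

# For unit vectors, the inner product IS the cosine similarity.
cos_sim = float(u @ w)
print(round(cos_sim, 4))
```

This is why normalizing before insertion is convenient: the vector database can then rank by inner product while effectively ranking by cosine similarity.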
  • the step S103 includes:
  • the vector size of the text semantic representation is fixed through an average pooling layer to obtain the sentence vector.
  • although the twin neural network structure is used both when generating the text feature vectors and when generating the sentence vector, the two cases differ: when generating text feature vectors, the twin structure has 2 inputs, so 2 pieces of text information are input at the same time.
  • since the twin neural network structure already has a feature representation ability adapted to similar data once the text feature vectors have been generated, the text to be matched only needs to be input on its own, that is, passed sequentially through the BERT network model and the average pooling layer.
  • the step S104 includes:
  • the cosine similarity method is used to calculate the similarity between the sentence vector and the Milvus vector index information, so as to retrieve the top N semantically similar text matching results, that is, the top N pieces of Milvus vector index information.
  • the corresponding text information can then be found in the text database.
  • the Milvus vector index information is sorted and selected according to the degree of confidence.
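The ranking step can be sketched in plain numpy (this illustrates only the cosine top-N selection; in the application this search is performed inside the Milvus database, not in client code):

```python
import numpy as np

def top_n_cosine(query: np.ndarray, index: np.ndarray, n: int) -> np.ndarray:
    """Return the indices of the n index vectors most cosine-similar
    to the query vector, highest score first."""
    q = query / np.linalg.norm(query)
    idx = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = idx @ q                  # cosine similarity per row
    return np.argsort(-scores)[:n]   # sort descending, keep top n

index_vectors = np.array([[1.0, 0.0],
                          [0.0, 1.0],
                          [0.9, 0.1]])
query_vec = np.array([1.0, 0.1])
top = top_n_cosine(query_vec, index_vectors, n=2)
print(top.tolist())
```

The returned positions play the role of the shared ID numbers: because the text database and the Milvus database use the same IDs, each top-N hit maps straight back to a readable piece of text.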
  • before starting the Milvus Docker container, it is necessary to modify the MySQL address in the configuration file and expose port 19530. Once started, the container automatically creates 4 Milvus metadata tables in the MySQL database. If the text matching model is updated, the Milvus index vectors need to be rebuilt.
  • the Milvus vector database and the Siamese neural network structure jointly build a semantic search engine for text recommendation.
  • the text recommendation based on deep learning further includes: steps S201-S204.
  • the sample label score value output by the Siamese neural network structure is set to a number from 0 to 5.
  • the advantage of this scheme is that it describes the similarity between two texts more finely than a 0-1 label, which only distinguishes the two cases of similar and dissimilar; how similar two texts are, or whether they are exactly the same, cannot be seen from a label of 1 alone.
  • the number 0 means that the semantics of text A and text B are completely different.
  • the number 5 means that the semantics of text A and text B are exactly the same.
  • Other numbers (such as: 1, 2, 3, 4) represent the degree of semantic similarity between the two sentences in the middle.
  • these label score values need to be divided by 5 to obtain a normalized score value.
  • the optimizer uses the Adam optimizer, and the learning rate is 2e-5.
  • the loss function used is the cosine similarity loss function.
  • other loss functions could also be used, but compared with the alternatives the cosine similarity loss function has a clear speed advantage: directly using cosine similarity to measure the similarity between two sentence vectors greatly improves inference speed.
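As a numerical sketch of the loss (a common formulation assumed here, since the application does not give the formula: mean-squared error between the cosine similarity of the two sentence vectors and the 0-5 label normalized to [0, 1] by dividing by 5):

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, label_0_to_5: float) -> float:
    """Squared error between cos(u, v) and the normalized gold score."""
    cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    target = label_0_to_5 / 5.0  # normalize the 0-5 label to [0, 1]
    return (cos - target) ** 2

u = np.array([1.0, 0.0])
v = np.array([1.0, 0.0])
# Identical vectors with a "semantics exactly the same" label of 5
# should give zero loss.
print(cosine_similarity_loss(u, v, 5.0))
```

During training this scalar would be averaged over a batch of sentence pairs and back-propagated through both branches of the Siamese network.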
  • the performance effect on the test set can be observed quantitatively and qualitatively.
  • the text recommendation based on deep learning also includes:
  • the corresponding update text feature vector is generated through the twin neural network structure, and the update text feature vector is converted into Milvus update vector index information and stored in the Milvus database.
  • the text information in the MySQL database may come from multiple tables. For example, if the information in the text database contains 3 categories, the information for those 3 categories can be obtained by data cleaning on the specific content of a field in 3 database tables. Multiple tables represent different types of data sources; after their data is cleaned, the desired data is obtained and stored in the text database. These tables are called data source tables. Now consider the situation where, after the text database has been built, the data source tables keep receiving new data.
  • this embodiment therefore adds a timing synchronization stage, which aims to perform data cleaning on the newly added data (that is, the text update information) according to querying the data source table at a specific time, and synchronize the latest data to the text database.
  • the reason for data cleaning is that the data of the text database itself is cleaned, processed and extracted from the specific content information of a certain field in the database table.
  • the primary key ID number is returned.
  • the apscheduler timing framework in the Python language can be used to complete the execution of the timing task.
  • the data in the data source table is constantly updated. If there is no timing synchronization mechanism, the newly added data cannot be automatically stored in the text database, nor can the Milvus index be created in time and inserted into the Milvus vector database. Then, when querying, the text information cannot keep pace with the times.
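A minimal sketch of the incremental synchronization logic described above (the table layout, field names, and cleaning step are hypothetical; the application schedules such a job with the Python apscheduler framework, which is only referenced in a comment here):

```python
# In the application this function would be registered with apscheduler,
# e.g. BackgroundScheduler().add_job(sync_new_rows, "cron", hour=2);
# here the incremental step is modeled over in-memory rows instead of
# MySQL tables so the sketch stays self-contained.

def sync_new_rows(source_table, text_db, last_synced_id):
    """Copy rows newer than last_synced_id from the data source table
    into the text database after a trivial 'cleaning' step, and return
    the new high-water-mark primary key ID."""
    new_rows = [r for r in source_table if r["id"] > last_synced_id]
    for row in new_rows:
        cleaned = {"id": row["id"],
                   "question": row["content"].strip(),  # stand-in cleaning
                   "flag": row["flag"]}
        text_db.append(cleaned)
    return max((r["id"] for r in new_rows), default=last_synced_id)

source = [
    {"id": 1, "content": " old text ", "flag": "a"},
    {"id": 2, "content": " new text ", "flag": "a"},
]
db = [{"id": 1, "question": "old text", "flag": "a"}]
last_id = sync_new_rows(source, db, last_synced_id=1)
print(last_id, len(db))
```

Tracking the returned primary key ID between runs is what lets each scheduled execution pick up only the newly added rows; in the full pipeline the same new rows would also be encoded and inserted into the Milvus index.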
  • FIG. 3 is a schematic block diagram of a text recommendation device 300 based on deep learning provided by an embodiment of the present application.
  • the device 300 includes:
  • the first vector generation unit 301 is used to collect different types of text information to construct a text database, and generate a text feature vector for each text information in the text database through a twin neural network structure;
  • the first vector conversion unit 302 is used to convert the text feature vector into Milvus vector index information, and store it in the Milvus database;
  • the second vector generation unit 303 is used to obtain the sentence vector containing semantic information in the text to be matched through the twin neural network structure when the text to be matched is matched;
  • the text matching unit 304 is used to select the top N pieces of Milvus vector index information with the highest semantic similarity in the Milvus database, and based on the correspondence between Milvus vector index information and text feature vectors, to select the corresponding first N pieces of text information in the text database as the matching results of the text to be matched.
  • the first vector generation unit 301 includes:
  • the encoding output unit is used to combine the text information in the text database in pairs, input the two pieces of text information in each combination into the BERT network model and average pooling layer of identical structure, obtain the encoding result corresponding to each of the two pieces of text information, and use these encoding results as the text feature vectors of the two pieces of text information.
  • the first vector conversion unit 302 includes:
  • a normalization unit, configured to perform normalization processing on the text feature vector to obtain a normalized text feature vector;
  • the second vector conversion unit is used to convert the normalized text feature vector into Milvus vector index information.
  • the second vector generating unit 303 includes:
  • the text semantic representation acquisition unit is used to separately input the text to be matched into the BERT network model to obtain the text semantic representation corresponding to the text to be matched;
  • the vector fixing unit is used to fix the vector size of the text semantic representation through the average pooling layer to obtain the sentence vector.
  • the text matching unit 304 includes:
  • a similarity calculation unit for utilizing the cosine similarity method to calculate the similarity score between the sentence vector and each of the Milvus vector index information
  • the index information selection unit is used to select the first N Milvus vector index information with the highest similarity score.
  • the deep learning-based text recommendation device 300 further includes:
  • the training learning unit 402 is used to train the twin neural network structure with the text data in the training set, setting the hyperparameter batch size of the twin neural network structure to 16 and the learning rate to 2e-5;
  • An optimization evaluation unit 403 configured to optimize parameters of the twin neural network structure by using an Adam optimizer, and perform performance evaluation on the twin neural network structure by using a cosine similarity loss function;
  • a parameter update unit 404 configured to update the parameters of the optimized Siamese neural network structure using the text data in the test set.
  • the text recommendation device 300 based on deep learning also includes:
  • An update information acquiring unit configured to acquire text update information, and store the text update information in the text database after data cleaning
  • the update storage unit is used to generate a corresponding updated text feature vector through the twin neural network structure according to the text update information in the text database, convert the updated text feature vector into Milvus update vector index information, and store it in the Milvus database.
  • the embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps provided in the above-mentioned embodiments are implemented.
  • the storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and other media capable of storing program code.
  • the embodiment of the present application also provides a computer device, which may include a memory and a processor.
  • a computer program is stored in the memory.
  • the processor invokes the computer program in the memory, the steps provided in the above embodiments can be implemented.
  • the computer equipment may also include components such as various network interfaces and power supplies.


Abstract

Disclosed in the present application are a text recommendation method and apparatus based on deep learning, and a related medium. The method comprises: collecting different types of text information to construct a text database, and generating a text feature vector for each piece of text information in the text database by means of a siamese neural network structure; converting the text feature vector into Milvus vector index information, and storing same in a Milvus database; when matching is performed on text to be matched, acquiring a sentence vector, which includes semantic information, in said text by means of the siamese neural network structure; and selecting, from the Milvus database, the first N pieces of Milvus vector index information having the highest semantic similarity, and selecting on the basis of the correspondence between Milvus vector index information and text feature vectors, the corresponding first N pieces of text information from the text database to serve as a matching result of said text. According to the embodiments of the present application, a text database is constructed and a Milvus database is introduced, such that when text is recommended, rapid retrieval and real-time feedback can be achieved, and the accuracy is high.

Description

一种基于深度学习的文本推荐方法、装置及相关介质A text recommendation method, device and related media based on deep learning
本申请是以申请号为202111255426.8、申请日为2021年10月27日的中国专利申请为基础,并主张其优先权,该申请的全部内容在此作为整体引入本申请中。This application is based on a Chinese patent application with application number 202111255426.8 and a filing date of October 27, 2021, and claims its priority. The entire content of this application is hereby incorporated into this application as a whole.
技术领域technical field
本申请涉及计算机软件技术领域,特别涉及一种基于深度学习的文本推荐方法、装置及相关介质。The present application relates to the technical field of computer software, in particular to a text recommendation method, device and related media based on deep learning.
背景技术Background technique
随着科技的快速发展,机器学习领域在深度学习方向也取得了具有前景的迅猛发展。自然语言处理是人工智能领域中的一个重要方向,它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。通常来说,自然语言处理技术包括文本处理、机器翻译、语义理解、知识图谱、智能问答等技术。其中,文本匹配是文本处理的一个非常重要的应用方向,在现实生活中起到了十分重要的作用。与此同时,这一技术的发展,为用户在纷繁冗杂的信息海洋中进行比较好的检索、匹配提供了一个可行的方案。事实上,文本匹配在很多实际场景中都扮演着重要角色。比如,在搜索场景中,用户输入一条待匹配文本,系统需要去语料库中寻找与该待匹配文本尽可能语义相似的内容,并将匹配结果返回给用户。再比如,在智能问答系统中,用户提出一个问题,系统需根据用户提出的问题在问答库中找到最相似的问题,并返回该相似问题对应的答案。在这些场景中,文本匹配的准确性直接影响用户体验效果。With the rapid development of science and technology, the field of machine learning has also achieved promising rapid development in the direction of deep learning. Natural language processing is an important direction in the field of artificial intelligence. It studies various theories and methods that can realize effective communication between humans and computers using natural language. Generally speaking, natural language processing technology includes text processing, machine translation, semantic understanding, knowledge graph, intelligent question answering and other technologies. Among them, text matching is a very important application direction of text processing, which plays a very important role in real life. At the same time, the development of this technology provides a feasible solution for users to search and match better in the sea of complicated information. In fact, text matching plays an important role in many practical scenarios. For example, in a search scenario, when a user inputs a piece of text to be matched, the system needs to search the corpus for content as semantically similar as possible to the text to be matched, and return the matching result to the user. For another example, in the intelligent question answering system, when a user asks a question, the system needs to find the most similar question in the question answer database according to the question raised by the user, and return the answer corresponding to the similar question. In these scenarios, the accuracy of text matching directly affects the user experience.
所谓文本匹配,其过程一般是针对两个文本,通过算法计算二者语义相似度,通过相似度大小来判定二者的匹配度。相似度数值越高,越匹配。反之,越不匹配。当前文本匹配主要是采用较为复杂的方法,且不具备动态扩展性。这里,动态扩展性指文本资料库不自动进行扩充,需要人为手动扩充。The so-called text matching generally involves calculating the semantic similarity between two texts through an algorithm, and judging the matching degree between the two through the similarity. The higher the similarity value, the better the match. On the contrary, the more mismatched. The current text matching mainly adopts more complex methods, and does not have dynamic scalability. Here, dynamic scalability means that the text database does not automatically expand, but needs to be expanded manually.
申请内容application content
本申请实施例提供了一种基于深度学习的文本推荐方法、装置、计算机设备及存储介质,旨在提高文本推荐效率和精度。Embodiments of the present application provide a text recommendation method, device, computer equipment, and storage medium based on deep learning, aiming at improving the efficiency and accuracy of text recommendation.
In a first aspect, an embodiment of the present application provides a deep learning-based text recommendation method, including:
collecting text information of different categories to construct a text corpus, and generating a text feature vector for each piece of text information in the text corpus through a siamese neural network structure;
converting the text feature vectors into Milvus vector index information and storing it in a Milvus database;
when matching a text to be matched, obtaining, through the siamese neural network structure, a sentence vector containing the semantic information of the text to be matched; and
selecting, from the Milvus database, the top N pieces of Milvus vector index information with the highest semantic similarity, and, based on the correspondence between Milvus vector index information and text feature vectors, selecting the corresponding top N pieces of text information from the text corpus as the matching result for the text to be matched.
In a second aspect, an embodiment of the present application provides a deep learning-based text recommendation apparatus, including:
a first vector generation unit, configured to collect text information of different categories to construct a text corpus, and to generate a text feature vector for each piece of text information in the text corpus through a siamese neural network structure;
a first vector conversion unit, configured to convert the text feature vectors into Milvus vector index information and store it in a Milvus database;
a second vector generation unit, configured to obtain, through the siamese neural network structure when matching a text to be matched, a sentence vector containing the semantic information of the text to be matched; and
a text matching unit, configured to select, from the Milvus database, the top N pieces of Milvus vector index information with the highest semantic similarity, and, based on the correspondence between Milvus vector index information and text feature vectors, to select the corresponding top N pieces of text information from the text corpus as the matching result for the text to be matched.
In a third aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the deep learning-based text recommendation method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the deep learning-based text recommendation method according to the first aspect.
Embodiments of the present application provide a deep learning-based text recommendation method, apparatus, computer device, and storage medium. The method includes: collecting text information of different categories to construct a text corpus, and generating a text feature vector for each piece of text information in the text corpus through a siamese neural network structure; converting the text feature vectors into Milvus vector index information and storing it in a Milvus database; when matching a text to be matched, obtaining, through the siamese neural network structure, a sentence vector containing the semantic information of the text to be matched; and selecting, from the Milvus database, the top N pieces of Milvus vector index information with the highest semantic similarity, and, based on the correspondence between Milvus vector index information and text feature vectors, selecting the corresponding top N pieces of text information from the text corpus as the matching result for the text to be matched. By constructing a text corpus and introducing the Milvus database, the embodiments of the present application overcome the time-consuming and labor-intensive drawback of matching the text to be matched against the corpus entry by entry. The recommendation and matching process of the embodiments is simple to implement, highly accurate, and fast; when recommending texts it achieves rapid retrieval and real-time feedback, and the text data in the corpus can be extended dynamically.
BRIEF DESCRIPTION OF THE DRAWINGS
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a deep learning-based text recommendation method provided by an embodiment of the present application;
FIG. 2 is a schematic sub-flowchart of a deep learning-based text recommendation method provided by an embodiment of the present application;
FIG. 3 is a schematic block diagram of a deep learning-based text recommendation apparatus provided by an embodiment of the present application;
FIG. 4 is a schematic sub-block diagram of a deep learning-based text recommendation apparatus provided by an embodiment of the present application.
DETAILED DESCRIPTION OF EMBODIMENTS
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. Evidently, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" as used in this specification and the appended claims refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a deep learning-based text recommendation method provided by an embodiment of the present application, which specifically includes steps S101 to S104.
S101: collecting text information of different categories to construct a text corpus, and generating a text feature vector for each piece of text information in the text corpus through a siamese neural network structure;
S102: converting the text feature vectors into Milvus vector index information and storing it in a Milvus database;
S103: when matching a text to be matched, obtaining, through the siamese neural network structure, a sentence vector containing the semantic information of the text to be matched;
S104: selecting, from the Milvus database, the top N pieces of Milvus vector index information with the highest semantic similarity, and, based on the correspondence between Milvus vector index information and text feature vectors, selecting the corresponding top N pieces of text information from the text corpus as the matching result for the text to be matched.
In this embodiment, a text corpus is first constructed from text information of different categories, and a text feature vector is generated for each piece of text information in the corpus through a siamese neural network structure. The generated text feature vectors are then converted into Milvus vector index information and stored in the Milvus database. When a matching recommendation is required for a text to be matched, a corresponding sentence vector is likewise generated for it through the siamese neural network structure; the similarity between this sentence vector and each piece of Milvus vector index information is then computed in the Milvus database, the top N pieces with the highest similarity are selected, and the corresponding text information is retrieved from the text corpus as the matching or recommendation result.
In a specific application scenario, the text corpus is a CSV corpus (i.e., a corpus of files in CSV format). The corpus may be constructed as follows: divide the texts by the categories to be recommended, collect a number of texts under each category, and store each category as one CSV file. The column names of each CSV file may be question and flag, where question holds the content of a text and flag holds the category name; within one CSV file the flag value is uniform. There are as many CSV files as there are text categories.
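The per-category CSV layout described above can be sketched as follows; the file contents, category name, and rows are purely illustrative and not taken from the application:

```python
import csv
import io

# One category = one CSV file with columns "question" (text content) and
# "flag" (category name, uniform within the file). io.StringIO stands in
# for a file on disk; the rows are hypothetical examples.
finance_csv = io.StringIO()
writer = csv.DictWriter(finance_csv, fieldnames=["question", "flag"])
writer.writeheader()
writer.writerow({"question": "How do I open a margin account?", "flag": "finance"})
writer.writerow({"question": "What is the current deposit rate?", "flag": "finance"})

# Reading the file back recovers the texts of this category.
finance_csv.seek(0)
rows = list(csv.DictReader(finance_csv))
categories = {row["flag"] for row in rows}
assert categories == {"finance"}  # flag is uniform within one file
```

With one such file per category, the corpus simply consists of as many CSV files as there are categories.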
In a specific embodiment, to facilitate data modification and data cleaning, the text corpus is stored in a MySQL database in the form of structured data. The benefit is that, during data cleaning, a Python script can directly operate on the MySQL tables to update the text data. Compared with using plain CSV files as the corpus, a MySQL database offers intuitive inspection, flexible operation, and convenient dynamic extension of the corpus data.
Although the above steps can already return candidate texts, guaranteeing timely recommendation would require saving the features of the entire corpus as offline feature files in advance, and such feature files occupy considerable storage space. Moreover, the usual approach is, for each incoming text, to compare it with every text in the corpus one by one using the model and return the most semantically similar entries; for a corpus with a large amount of data this is far too slow. Storing the corpus features offline avoids regenerating them on every request, but the resulting offline feature files are large, and whenever the corpus changes they become invalid and can no longer be used, which makes maintenance inconvenient. Therefore, in an embodiment, this problem is solved by storing the feature information of the text corpus in a Milvus database to enable fast retrieval. Milvus is an open-source vector database that supports insertion, deletion, and update of TB-scale vectors together with near-real-time query and retrieval, and is highly flexible, stable, reliable, and fast.
In existing text matching techniques, text features are usually extracted from the two pieces of text, and whether the two texts match is then judged from the extracted features. When extracting text features, the word vectors of a text are often simply summed, or weighted by the words' own weights, to construct the text feature. The resulting text vector, however, may be dominated by individual words, so the constructed feature cannot accurately reflect the semantics of the text, which lowers the matching accuracy. In addition, the most common ways to represent a sentence vector are to average the vectors of the BERT output layer or to use the first token of the BERT output layer; both undoubtedly yield rather poor sentence encodings. Worse still, over a collection of 10,000 sentences, finding the most similar sentence pair with such methods takes 65 hours. These techniques are therefore complex, costly, inefficient, and time-consuming.
In view of the above problems, the deep learning-based text recommendation method provided in this embodiment, by constructing a text corpus and introducing the Milvus database, overcomes the time-consuming and labor-intensive drawback of matching the text to be matched against the corpus entry by entry. The recommendation and matching process of this embodiment is simple to implement, highly accurate, and fast; it achieves rapid retrieval and real-time feedback when recommending texts, and the corpus data can be extended dynamically. In a specific test scenario, a request returns its result in about 30 milliseconds.
In an embodiment, step S101 includes:
combining the text information in the text corpus in pairs, feeding the two texts of each pair separately through a BERT network model and an average pooling layer of identical structure, outputting the encoding results corresponding to the two texts, and taking these encoding results as the text feature vectors of the two texts.
In this embodiment, when generating text feature vectors through the siamese neural network structure, the texts in the corpus are first combined in pairs; the two texts of each pair are then fed separately through a BERT network model and an average pooling layer of identical structure, yielding two encoding results, which are the obtained text feature vectors carrying semantic information. Notably, this siamese neural network structure produces a fixed-size vector for each input sentence, and the semantic information carried by these vectors can be used to compute similarity.
In addition, to obtain fixed-size text feature vectors, this embodiment improves on the BERT network model. BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained network. As the name suggests, its goal is to learn, from large-scale unlabeled corpora, semantic representations of text rich in semantic information, which are then fine-tuned for a specific NLP task and finally applied to that task. Such tasks include intelligent question answering, sentence classification, sentence-pair representation, and so on. A major shortcoming of the BERT network model, however, is that it does not compute an independent sentence encoding, which makes it difficult to obtain a good sentence encoding from BERT alone.
Given this limitation of BERT, the improvement in this embodiment mainly consists of adding an average pooling operation after the output layer of the BERT network model. The pooling layer provides translation invariance of features, and the benefit of this arrangement is that, with the average pooling layer added, the final output vector has a fixed size for sentences of different lengths.
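The mean-pooling step described above can be sketched in isolation. The snippet below uses random arrays as a stand-in for real BERT output-layer embeddings (hidden size 768 is that of BERT-base, but no actual model is loaded), and masks out padding positions before averaging:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors of the encoder output layer, ignoring padding.

    token_embeddings: (seq_len, hidden_dim) output of the BERT encoder
    attention_mask:   (seq_len,) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(float)      # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)    # (hidden_dim,)
    count = mask.sum()                                # number of real tokens
    return summed / count                             # fixed-size sentence vector

# Stand-in for real BERT output: two "sentences" of different lengths.
rng = np.random.default_rng(0)
short = mean_pool(rng.normal(size=(5, 768)), np.array([1, 1, 1, 1, 0]))
long_ = mean_pool(rng.normal(size=(12, 768)), np.ones(12, dtype=int))
assert short.shape == long_.shape == (768,)  # same fixed size regardless of length
```

This is the property the embodiment relies on: sentences of any length map to vectors of one fixed size, so they can be compared directly.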
In an embodiment, step S102 includes:
normalizing the text feature vectors to obtain normalized text feature vectors; and
converting the normalized text feature vectors into Milvus vector index information.
In this embodiment, when converting text feature vectors into Milvus vector index information, the text feature vectors are first normalized. Specifically, two pieces of text information are input and passed separately through a BERT network model and an average pooling layer of identical structure, yielding two encoding results, and each of these is then normalized to obtain the normalized text feature vectors. The normalized text feature vectors are then converted into Milvus vector index information and stored in the Milvus database. In this way, the text corpus and the Milvus database correspond to each other (i.e., their ID numbers are exactly the same), so that a query can return the text information of the original corpus rather than only the hard-to-interpret Milvus index information.
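A minimal sketch of the normalization step and of the shared-ID correspondence between corpus entries and stored vectors; the IDs and vectors are illustrative, and no actual Milvus calls are made:

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a feature vector to unit length, so that for two normalized
    vectors the inner product equals their cosine similarity."""
    return v / np.linalg.norm(v)

# The corpus (e.g. MySQL rows) and the vector store share the same ID,
# so a vector hit can be mapped back to readable text (IDs hypothetical).
corpus = {101: "text A", 102: "text B"}
vectors = {i: l2_normalize(np.random.default_rng(i).normal(size=4)) for i in corpus}

assert set(vectors) == set(corpus)  # one stored vector per corpus entry
assert all(abs(np.linalg.norm(v) - 1.0) < 1e-9 for v in vectors.values())
```

Keeping the IDs identical on both sides is what lets a search result be translated back into original corpus text rather than an opaque index entry.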
In an embodiment, step S103 includes:
inputting the text to be matched on its own into the BERT network model to obtain the text semantic representation corresponding to the text to be matched; and
fixing the vector size of the text semantic representation through the average pooling layer to obtain the sentence vector.
In this embodiment, the siamese neural network structure is used both for generating text feature vectors and for generating sentence vectors. When generating text feature vectors, the siamese structure has two inputs, so two pieces of text are input at the same time. When generating a sentence vector, however, the siamese structure has already acquired, through the earlier feature-vector generation, the ability to represent similar data, so the text to be matched only needs to be input on its own, i.e., passed in turn through the BERT network model and the average pooling layer.
In an embodiment, step S104 includes:
computing a similarity score between the sentence vector and each piece of Milvus vector index information using the cosine similarity method; and
selecting the top N pieces of Milvus vector index information with the highest similarity scores.
In this embodiment, when performing text recommendation, the similarity between the sentence vector of the text to be matched and the Milvus vector index information is computed by the cosine similarity method, so as to retrieve the top N semantically similar text matching results, i.e., the top N pieces of Milvus vector index information. The corresponding text information can then be found in the text corpus. In a specific embodiment, the pieces of Milvus vector index information are ranked and selected by confidence.
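As an illustration of the retrieval step, the brute-force stand-in below ranks stored vectors by cosine similarity and keeps the top N; this shows what the Milvus search computes (at scale, and with an index), and is not the Milvus API itself. The index contents are toy values:

```python
import numpy as np

def top_n_cosine(query: np.ndarray, index: dict, n: int = 3):
    """Rank stored vectors by cosine similarity to the query and
    return the n best (id, score) pairs, highest score first."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = sorted(((i, cos(query, v)) for i, v in index.items()),
                    key=lambda t: t[1], reverse=True)
    return scored[:n]

# Toy 2-D "index": ids 1 and 3 point roughly along the x-axis, id 2 along y.
index = {1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0]), 3: np.array([0.9, 0.1])}
best = top_n_cosine(np.array([1.0, 0.05]), index, n=2)
assert [i for i, _ in best] == [1, 3]  # the two x-axis-like vectors win
```

The returned IDs are then looked up in the text corpus to produce the final recommendation list.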
In another specific embodiment, before starting the Milvus Docker container, the MySQL address in the configuration file must be modified and port 19530 exposed. On startup, the container automatically creates four Milvus metadata tables in the MySQL database. If the text matching model is updated, the Milvus index vectors must be rebuilt. The Milvus vector database and the siamese neural network structure together form a semantic search engine for text recommendation.
In an embodiment, as shown in FIG. 2, the deep learning-based text recommendation further includes steps S201 to S204.
S201: selecting a text dataset and dividing it into a training set and a test set at a ratio of training set : test set = 7 : 3;
S202: training the siamese neural network structure on the text data of the training set, with the hyperparameter batch size of the siamese neural network structure set to 16 and the learning rate set to 2e-5;
S203: optimizing the parameters of the siamese neural network structure with the Adam optimizer, and evaluating the performance of the siamese neural network structure with a cosine similarity loss function;
S204: updating the parameters of the optimized siamese neural network structure with the text data of the test set.
In this embodiment, to ensure a degree of generalization, the dataset follows the principle of a 7 : 3 ratio between training set and test set. Further, the sample label scores for the siamese neural network structure are set to the numbers 0 to 5. The benefit of such label scores is that they characterize the degree of similarity between two texts more finely than 0-1 labels, which distinguish only similar from dissimilar: how similar two texts are, or whether they are exactly the same, cannot be read off a label of 1 alone. The number 0 means that text A and text B are semantically completely different; the number 5 means that their semantics are exactly the same; the other numbers (1, 2, 3, 4) represent intermediate degrees of semantic similarity between the two sentences. To train the network properly, these label scores are divided by 5 during training to obtain normalized score values.
Training also involves a number of hyperparameters. For example, the batch size is set to 16, the Adam optimizer is used, and the learning rate is 2e-5. The loss function used is the cosine similarity loss function. Other loss functions could also be used here, but compared with them the cosine similarity loss function has a clear speed advantage: directly using cosine similarity to measure the similarity between two sentence vectors greatly improves inference speed.
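The label normalization and cosine-similarity loss described above can be sketched as follows. This is one minimal interpretation (squared error between the cosine similarity of the two sentence vectors and the 0-5 annotation divided by 5); the application names the loss but does not spell out its exact formula, so the squared-error form is an assumption:

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, label_score: int) -> float:
    """Per-pair loss: the 0-5 human annotation is divided by 5 to give a
    target in [0, 1], and the squared error between that target and the
    cosine similarity of the two sentence vectors is the loss (assumed form)."""
    target = label_score / 5.0
    cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (cos - target) ** 2

u = np.array([1.0, 0.0])
v = np.array([1.0, 0.0])
assert cosine_similarity_loss(u, v, 5) == 0.0  # identical vectors, label 5: zero loss
```

At inference time no loss is needed: the cosine similarity itself serves as the match score, which is what makes retrieval fast.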
When predicting on the test set, the performance on the test set can be observed quantitatively and qualitatively. From the prediction results, it can be judged whether the model has converged.
In an embodiment, the deep learning-based text recommendation further includes:
acquiring text update information, and storing it in the text corpus after data cleaning; and
generating, through the siamese neural network structure, updated text feature vectors corresponding to the text update information in the text corpus, converting the updated text feature vectors into Milvus update vector index information, and storing it in the Milvus database.
The text information in the MySQL database may come from multiple tables. For example, the corpus may contain three categories whose information is obtained, through data cleaning, from the specific contents of certain fields of three database tables; multiple tables represent different category data sources, and after their data has been cleaned, the desired data can be stored in the text corpus. These tables are called data source tables. Consider the situation where, after the corpus has been built, the data source tables keep receiving new data. To enhance dynamic scalability, this embodiment therefore adds a timed synchronization stage: at a specified time, the data source tables are queried, the newly added data (i.e., the text update information) is cleaned, and the latest data is synchronized into the text corpus. Data cleaning is needed here because the corpus data itself is cleaned, processed, and distilled from the specific contents of database table fields. At the same time, the primary key ID numbers of the information synchronized into the corpus are returned.
The newly added text data is passed through the trained text matching model to obtain its text feature encoding vector; the vector is normalized and, combined with this ID number, a Milvus index is created: the text feature vector is encoded, indexed, and inserted into the Milvus vector database for efficient subsequent queries. When comparing text similarity, the index vectors in the Milvus vector database are searched and cosine similarity is computed to obtain the matching results.
In a specific embodiment, the apscheduler timing framework of the Python language can be used to execute the timed task. The data in the data source tables is constantly updated; without a timed synchronization mechanism, newly added data could neither be stored automatically in the text corpus nor have Milvus indexes created and inserted into the Milvus vector database in time, and queries would then fail to keep up with the latest text information.
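The timed synchronization stage can be sketched with an in-memory stand-in for the data source table. In a real deployment the sync function would be registered with a timer framework such as apscheduler rather than called directly, and the table rows, ID scheme, and cleaning rule here are all hypothetical:

```python
# Poll the source "table" for rows added since the last synced primary key,
# clean them, and append them to the corpus under the same ID.
source_table = [(1, "  old text "), (2, "new text one\n"), (3, "new text two")]
corpus = {1: "old text"}  # already synced, keyed by primary-key ID

def clean(raw: str) -> str:
    """Toy data-cleaning rule: collapse whitespace."""
    return " ".join(raw.split())

def sync_new_rows(source, corpus):
    last_id = max(corpus)  # high-water mark of already-synced IDs
    for row_id, raw in source:
        if row_id > last_id:
            corpus[row_id] = clean(raw)
            # ...here the new text would also be encoded by the trained model,
            # normalized, and inserted into the Milvus index under the same ID...
    return corpus

sync_new_rows(source_table, corpus)
assert sorted(corpus) == [1, 2, 3]
```

Running this function on a schedule keeps the corpus and the Milvus index aligned with the growing source tables without manual intervention.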
FIG. 3 is a schematic block diagram of a deep learning-based text recommendation apparatus 300 provided by an embodiment of the present application. The apparatus 300 includes:
a first vector generation unit 301, configured to collect text information of different categories to construct a text corpus, and to generate a text feature vector for each piece of text information in the text corpus through a siamese neural network structure;
a first vector conversion unit 302, configured to convert the text feature vectors into Milvus vector index information and store it in a Milvus database;
a second vector generation unit 303, configured to obtain, through the siamese neural network structure when matching a text to be matched, a sentence vector containing the semantic information of the text to be matched; and
a text matching unit 304, configured to select, from the Milvus database, the top N pieces of Milvus vector index information with the highest semantic similarity, and, based on the correspondence between Milvus vector index information and text feature vectors, to select the corresponding top N pieces of text information from the text corpus as the matching result for the text to be matched.
In one embodiment, the first vector generation unit 301 includes:
an encoding output unit, configured to combine the text information in the text database in pairs, input the two texts of each pair in turn into a BERT network model and an average pooling layer of identical structure, output the encoding results corresponding to the two texts, and then use the encoding results as the text feature vectors of the two texts.
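The BERT encoder itself is not reproduced here; assuming the token embeddings and attention mask produced by either branch of the Siamese network are available as arrays, the average-pooling step that turns them into a fixed-size sentence vector can be sketched as follows (a hypothetical helper, not the embodiment's actual code):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average the token embeddings of each sentence, ignoring padded
    positions, to obtain one fixed-size sentence vector per input.

    token_embeddings: (batch, seq_len, hidden) array
    attention_mask:   (batch, seq_len) array of 0/1
    """
    mask = attention_mask[:, :, None].astype(float)   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)    # sum real tokens
    counts = mask.sum(axis=1)                         # number of real tokens
    return summed / counts
```

Because both branches of the Siamese structure share the same encoder and the same pooling, two texts are mapped into the same vector space and can be compared directly.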
In one embodiment, the first vector conversion unit 302 includes:
a normalization unit, configured to normalize the text feature vectors to obtain normalized text feature vectors;
a second vector conversion unit, configured to convert the normalized text feature vectors into Milvus vector index information.
In one embodiment, the second vector generation unit 303 includes:
a text semantic representation acquisition unit, configured to input the text to be matched on its own into the BERT network model to obtain the text semantic representation corresponding to the text to be matched;
a vector fixing unit, configured to fix the vector size of the text semantic representation through the average pooling layer to obtain the sentence vector.
In one embodiment, the text matching unit 304 includes:
a similarity calculation unit, configured to calculate, using the cosine similarity method, a similarity score between the sentence vector and each piece of Milvus vector index information;
an index information selection unit, configured to select the top N pieces of Milvus vector index information with the highest similarity scores.
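These two sub-steps can be sketched as a brute-force search, under the assumption that the index vectors are available as a plain array (in the embodiment, Milvus performs this search internally over the stored index):

```python
import numpy as np

def top_n_by_cosine(sentence_vec, index_vecs, n):
    """Score the query sentence vector against every index vector with
    cosine similarity and return the positions of the N best matches,
    highest score first."""
    q = np.asarray(sentence_vec, dtype=float)
    m = np.asarray(index_vecs, dtype=float)
    scores = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    return np.argsort(-scores)[:n].tolist()
```

The returned positions correspond to Milvus vector index entries, which in turn map back to rows of the text database through their primary-key IDs.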
In one embodiment, as shown in FIG. 4, the deep-learning-based text recommendation apparatus 300 further includes:
a data set division unit 401, configured to select a text data set and divide it into a training set and a test set at a ratio of training set : test set = 7 : 3;
a training learning unit 402, configured to train the Siamese neural network structure on the text data in the training set, with the hyperparameters of the Siamese neural network structure set to a batch size of 16 and a learning rate of 2e-5;
an optimization evaluation unit 403, configured to optimize the parameters of the Siamese neural network structure with the Adam optimizer, and to evaluate the performance of the Siamese neural network structure with a cosine similarity loss function;
a parameter update unit 404, configured to update the parameters of the optimized Siamese neural network structure using the text data in the test set.
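The dataset division step can be sketched as follows; `split_dataset` is a hypothetical helper, and the actual fine-tuning loop (Adam optimizer, cosine-similarity loss) is not reproduced here:

```python
import random

# Hyperparameters stated in the embodiment.
BATCH_SIZE = 16
LEARNING_RATE = 2e-5

def split_dataset(samples, train_ratio=0.7, seed=42):
    """Shuffle the text data set and split it into training and test
    subsets at the 7:3 ratio used by the data set division unit."""
    shuffled = samples[:]                 # do not mutate the caller's list
    random.Random(seed).shuffle(shuffled) # fixed seed for reproducibility
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Fixing the shuffle seed keeps the split reproducible across runs, which matters when the test set is later used to evaluate and update the optimized model.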
In one embodiment, the deep-learning-based text recommendation apparatus 300 further includes:
an update information acquisition unit, configured to acquire text update information and store it in the text database after data cleaning;
an update storage unit, configured to generate, according to the text update information in the text database, corresponding updated text feature vectors through the Siamese neural network structure, convert the updated text feature vectors into Milvus update vector index information, and store it in the Milvus database.
Since the embodiments of the apparatus correspond to the embodiments of the method, refer to the description of the method embodiments for the apparatus embodiments; details are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed, the computer program can implement the steps provided in the above embodiments. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a computer device, which may include a memory and a processor. A computer program is stored in the memory, and when the processor invokes the computer program in the memory, the steps provided in the above embodiments can be implemented. The computer device may, of course, also include components such as network interfaces and a power supply.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Since the system disclosed in the embodiments corresponds to the method disclosed therein, its description is relatively brief; for the relevant details, refer to the description of the method. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the protection scope of the claims of the present application.
It should also be noted that in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises it.

Claims (10)

  1. A deep-learning-based text recommendation method, characterized in that it comprises:
    collecting text information of different categories to build a text database, and generating a text feature vector for each piece of text information in the text database through a Siamese neural network structure;
    converting the text feature vectors into Milvus vector index information and storing it in a Milvus database;
    when matching a text to be matched, obtaining a sentence vector containing the semantic information of the text to be matched through the Siamese neural network structure; and
    selecting, from the Milvus database, the top N pieces of Milvus vector index information with the highest semantic similarity and, based on the correspondence between Milvus vector index information and text feature vectors, selecting the corresponding top N pieces of text information from the text database as the matching result for the text to be matched.
  2. The deep-learning-based text recommendation method according to claim 1, characterized in that generating a text feature vector for each piece of text information in the text database through a Siamese neural network structure comprises:
    combining the text information in the text database in pairs, inputting the two texts of each pair in turn into a BERT network model and an average pooling layer of identical structure, outputting the encoding results corresponding to the two texts, and then using the encoding results as the text feature vectors corresponding to the two texts.
  3. The deep-learning-based text recommendation method according to claim 1, characterized in that converting the text feature vectors into Milvus vector index information and storing it in a Milvus database comprises:
    normalizing the text feature vectors to obtain normalized text feature vectors; and
    converting the normalized text feature vectors into Milvus vector index information.
  4. The deep-learning-based text recommendation method according to claim 1, characterized in that obtaining, when matching a text to be matched, a sentence vector containing the semantic information of the text to be matched through the Siamese neural network structure comprises:
    inputting the text to be matched on its own into the BERT network model to obtain the text semantic representation corresponding to the text to be matched; and
    fixing the vector size of the text semantic representation through the average pooling layer to obtain the sentence vector.
  5. The deep-learning-based text recommendation method according to claim 1, characterized in that selecting, from the Milvus database, the top N pieces of Milvus vector index information with the highest semantic similarity comprises:
    calculating, using the cosine similarity method, a similarity score between the sentence vector and each piece of Milvus vector index information; and
    selecting the top N pieces of Milvus vector index information with the highest similarity scores.
  6. The deep-learning-based text recommendation method according to claim 1, characterized in that it further comprises:
    selecting a text data set, and dividing the text data set into a training set and a test set at a ratio of training set : test set = 7 : 3;
    training the Siamese neural network structure on the text data in the training set, with the hyperparameters of the Siamese neural network structure set to a batch size of 16 and a learning rate of 2e-5;
    optimizing the parameters of the Siamese neural network structure with the Adam optimizer, and evaluating the performance of the Siamese neural network structure with a cosine similarity loss function; and
    updating the parameters of the optimized Siamese neural network structure using the text data in the test set.
  7. The deep-learning-based text recommendation method according to claim 1, characterized in that it further comprises:
    acquiring text update information, and storing the text update information in the text database after data cleaning; and
    generating, according to the text update information in the text database, corresponding updated text feature vectors through the Siamese neural network structure, converting the updated text feature vectors into Milvus update vector index information, and storing it in the Milvus database.
  8. A deep-learning-based text recommendation apparatus, characterized in that it comprises:
    a first vector generation unit, configured to collect text information of different categories to build a text database, and to generate a text feature vector for each piece of text information in the text database through a Siamese neural network structure;
    a first vector conversion unit, configured to convert the text feature vectors into Milvus vector index information and store it in a Milvus database;
    a second vector generation unit, configured to obtain, when matching a text to be matched, a sentence vector containing the semantic information of the text to be matched through the Siamese neural network structure; and
    a text matching unit, configured to select, from the Milvus database, the top N pieces of Milvus vector index information with the highest semantic similarity and, based on the correspondence between Milvus vector index information and text feature vectors, to select the corresponding top N pieces of text information from the text database as the matching result for the text to be matched.
  9. A computer device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the deep-learning-based text recommendation method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the deep-learning-based text recommendation method according to any one of claims 1 to 7.
PCT/CN2021/129027 2021-10-27 2021-11-05 Text recommendation method and apparatus based on deep learning, and related medium WO2023070732A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111255426.8A CN113704386A (en) 2021-10-27 2021-10-27 Text recommendation method and device based on deep learning and related media
CN202111255426.8 2021-10-27

Publications (1)

Publication Number Publication Date
WO2023070732A1 true WO2023070732A1 (en) 2023-05-04

Family

ID=78647112

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129027 WO2023070732A1 (en) 2021-10-27 2021-11-05 Text recommendation method and apparatus based on deep learning, and related medium

Country Status (2)

Country Link
CN (1) CN113704386A (en)
WO (1) WO2023070732A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116384494A (en) * 2023-06-05 2023-07-04 安徽思高智能科技有限公司 RPA flow recommendation method and system based on multi-modal twin neural network
CN117762917A (en) * 2024-01-16 2024-03-26 北京三维天地科技股份有限公司 Medical instrument data cleaning method and system based on deep learning

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN114386421A (en) * 2022-01-13 2022-04-22 平安科技(深圳)有限公司 Similar news detection method and device, computer equipment and storage medium
CN114817511B (en) * 2022-06-27 2022-09-23 深圳前海环融联易信息科技服务有限公司 Question-answer interaction method and device based on kernel principal component analysis and computer equipment
CN115238065B (en) * 2022-09-22 2022-12-20 太极计算机股份有限公司 Intelligent document recommendation method based on federal learning
CN116911641B (en) * 2023-09-11 2024-02-02 深圳市华傲数据技术有限公司 Sponsored recommendation method, sponsored recommendation device, computer equipment and storage medium
CN117574877B (en) * 2023-11-21 2024-05-24 北京假日阳光环球旅行社有限公司 Session text matching method and device, storage medium and equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109740126A (en) * 2019-01-04 2019-05-10 平安科技(深圳)有限公司 Text matching technique, device and storage medium, computer equipment
CN110413988A (en) * 2019-06-17 2019-11-05 平安科技(深圳)有限公司 Method, apparatus, server and the storage medium of text information matching measurement
CN111026937A (en) * 2019-11-13 2020-04-17 百度在线网络技术(北京)有限公司 Method, device and equipment for extracting POI name and computer storage medium
CN112395385A (en) * 2020-11-17 2021-02-23 中国平安人寿保险股份有限公司 Text generation method and device based on artificial intelligence, computer equipment and medium



Also Published As

Publication number Publication date
CN113704386A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
WO2023070732A1 (en) Text recommendation method and apparatus based on deep learning, and related medium
US10346540B2 (en) Self-learning statistical natural language processing for automatic production of virtual personal assistants
JP7162648B2 (en) Systems and methods for intent discovery from multimedia conversations
CN109101479A (en) A kind of clustering method and device for Chinese sentence
CN109829104A (en) Pseudo-linear filter model information search method and system based on semantic similarity
CN111462749B (en) End-to-end dialogue system and method based on dialogue state guidance and knowledge base retrieval
CN112256847B (en) Knowledge base question-answering method integrating fact texts
CN112948562A (en) Question and answer processing method and device, computer equipment and readable storage medium
Bai et al. Applied research of knowledge in the field of artificial intelligence in the intelligent retrieval of teaching resources
CN112632250A (en) Question and answer method and system under multi-document scene
CN113064999A (en) Knowledge graph construction algorithm, system, equipment and medium based on IT equipment operation and maintenance
Aghaei et al. Question answering over knowledge graphs: A case study in tourism
CN112417170B (en) Relationship linking method for incomplete knowledge graph
CN113159187A (en) Classification model training method and device, and target text determining method and device
CN116432653A (en) Method, device, storage medium and equipment for constructing multilingual database
CN116361416A (en) Speech retrieval method, system and medium based on semantic analysis and high-dimensional modeling
CN114970733A (en) Corpus generation method, apparatus, system, storage medium and electronic device
CN114942981A (en) Question-answer query method and device, electronic equipment and computer readable storage medium
Wang et al. Refbert: Compressing bert by referencing to pre-computed representations
CN113536772A (en) Text processing method, device, equipment and storage medium
Mazumder On-the-job continual and interactive learning of factual knowledge and language grounding
Zajíc et al. First insight into the processing of the language consulting center data
CN117313748B (en) Multi-feature fusion semantic understanding method and device for government affair question and answer
Li et al. An Innovative Similar Complaint Recommendation Model Integrating Semantic and Graph Embeddings
Yan Research on keyword extraction based on abstract extraction

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21962079

Country of ref document: EP

Kind code of ref document: A1