CN110297880B - Corpus product recommendation method, apparatus, device and storage medium

Publication number
CN110297880B
Authority
CN
China
Prior art keywords
corpus
product
information
user
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910433178.8A
Other languages
Chinese (zh)
Other versions
CN110297880A (en
Inventor
韩亚洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201910433178.8A
Publication of CN110297880A
Application granted
Publication of CN110297880B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics

Abstract

The invention belongs to the technical field of big data analysis and discloses a corpus product recommendation method, device, equipment and storage medium. The method comprises the following steps: receiving a corpus product query request triggered by a user, and acquiring a corpus product query requirement provided by the user according to the corpus product query request; carrying out keyword extraction processing on the query requirement of the corpus products to obtain N keywords, wherein N is an integer greater than or equal to 1; determining characteristic information corresponding to the corpus products required by the user according to the N key words; searching corpus information which accords with any feature in the feature information from a corpus according to the feature information; and processing each corpus information according to the characteristic information and the characteristics corresponding to each corpus information to obtain a corpus product with all the characteristics in the characteristic information, and pushing the corpus product to a user. By the method, the corpus products recommended for the users are in accordance with the actual requirements of the users, and the recommendation accuracy of the corpus products is greatly improved.

Description

Corpus product recommendation method, apparatus, device and storage medium
Technical Field
The invention relates to the technical field of big data analysis, in particular to a corpus product recommendation method, a corpus product recommendation device, corpus product recommendation equipment and a storage medium.
Background
A traditional corpus is a large-scale electronic text library that has been scientifically sampled and processed. With the development of the times, current corpora are no longer limited to storing text-type corpus information, but can also store various types of corpus information such as pictures, audio and video.
The corpus information stored in existing corpora is varied and huge in quantity. However, conventional corpus query methods cannot comprehensively identify the query requirements of the user, so the screened corpus information does not meet the actual requirements of the user and the recommendation accuracy of corpus products is low.
Therefore, it is highly desirable to provide a method for recommending corpus products to users according to the actual requirements of the users, so as to improve the accuracy of recommending corpus products.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method, a device, equipment and a storage medium for recommending a corpus product, aiming at recommending the corpus product for a user according to the actual requirement of the user so as to improve the recommendation accuracy of the corpus product.
In order to achieve the above object, the present invention provides a method for recommending corpus products, comprising the following steps:
receiving a corpus product query request triggered by a user, and acquiring a corpus product query requirement provided by the user according to the corpus product query request;
extracting keywords from the query requirement of the corpus product to obtain N keywords, wherein N is an integer greater than or equal to 1;
determining characteristic information corresponding to the corpus products required by the user according to the N key words;
searching corpus information which accords with any feature in the feature information from a corpus according to the feature information;
and processing each corpus information according to the feature information and the feature corresponding to each corpus information to obtain a corpus product with all the features in the feature information, and pushing the corpus product to the user.
Preferably, the step of extracting keywords from the query requirement of the corpus product to obtain N keywords includes:
performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, wherein M is an integer greater than or equal to N;
calculating the weight value of each word in the M words according to a preset part-of-speech weight distribution standard;
traversing the M words, comparing the weight value of the traversed current word with a preset weight threshold value, and selecting the words whose weight values are larger than the weight threshold value to obtain the N keywords.
Preferably, before the step of performing word segmentation and part-of-speech tagging on the query requirement of the corpus product to obtain M words, the method further comprises:
determining a format of the corpus product query requirement;
if the corpus product query requirement is in a voice format, converting the corpus product query requirement in the voice format into a corpus product query requirement in a text format by utilizing a voice recognition technology;
if the corpus product query requirement is in a picture format, converting the corpus product query requirement in the picture format into a corpus product query requirement in a text format by using an optical character recognition technology;
the step of performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words comprises the following steps:
according to punctuation marks in the corpus product query requirement in the text format, carrying out sentence segmentation on the corpus product query requirement in the text format to obtain a sentence to be segmented;
performing maximum reverse matching segmentation on the sentence to be segmented, and determining the M words according to a user-defined dictionary;
and performing part-of-speech tagging on the M words according to preset part-of-speech standard information.
Preferably, before the step of searching corpus information corresponding to any feature in the feature information from a corpus according to the feature information, the method further includes:
detecting whether the characteristic information contains characteristics for identifying the category of the corpus product, identifying the language format of the corpus product and identifying the multimedia style of the corpus product;
if the feature information contains the features of identifying the category of the corpus product, identifying the language format of the corpus product and identifying the multimedia style of the corpus product, executing the following steps: searching corpus information which accords with any feature in the feature information from a corpus according to the feature information;
otherwise, executing the following steps:
acquiring a historical query record of the user in a preset period;
analyzing the historical query records by utilizing a big data analysis technology to determine the query requirement of the user at the current moment;
taking the query requirement at the current moment as a first element, and taking the N keywords as a second element;
and determining the category of the corpus product, the language format of the corpus product and the multimedia style characteristic of the corpus product according to the first element and the second element.
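As a rough illustration of the fallback branch described above, the sketch below fills in whichever of the three features is missing by taking the most frequent value found in the user's historical query records. The feature names (`category`, `language`, `multimedia_format`), the record layout, and the use of a simple frequency count in place of the unspecified big data analysis are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical names for the three features the method checks for.
REQUIRED_FEATURES = ("category", "language", "multimedia_format")

def complete_features(features: dict, history: list[dict]) -> dict:
    """Fill missing required features from past query records.

    `features` holds what was derived from the N keywords; `history` is a
    list of feature dicts from the user's earlier queries (assumed layout).
    """
    completed = dict(features)
    for name in REQUIRED_FEATURES:
        if name in completed:
            continue  # already determined from the current keywords
        past_values = [record[name] for record in history if name in record]
        if past_values:
            # Stand-in for the analysis step: pick the most frequent value.
            completed[name] = Counter(past_values).most_common(1)[0][0]
    return completed

history = [
    {"language": "English", "multimedia_format": "audio"},
    {"language": "English", "multimedia_format": "video"},
    {"language": "Chinese", "multimedia_format": "audio"},
]
completed = complete_features({"category": "fairy tale"}, history)
```

Features derived from the current keywords always take precedence; the history is consulted only for features that are still missing.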
Preferably, the step of processing each corpus information according to the feature information and the feature corresponding to each corpus information to obtain a corpus product having all features in the feature information includes:
according to the characteristics corresponding to the corpus information, screening out the corpus information with the most characteristics, and taking the corpus information as an initial corpus product;
determining features to be integrated according to the feature information and features corresponding to the initial corpus products;
extracting the corpus information corresponding to the features to be integrated from the corpus information except the initial corpus product;
and combining the extracted corpus information with the initial corpus product to obtain a corpus product with all the characteristics in the characteristic information.
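The four steps above can be sketched as a small selection routine. This is only an illustration under assumed data structures: each piece of corpus information is represented as a dict with a hypothetical `id` and a set of `features`, and "combining" is reduced to collecting the ids of the pieces that make up the product, since the text does not say how the underlying media are merged.

```python
def build_corpus_product(required: set[str], candidates: list[dict]) -> dict:
    """Assemble a product covering all required features, if possible."""
    # Step 1: the candidate matching the most required features is the initial product.
    initial = max(candidates, key=lambda c: len(c["features"] & required))
    parts = [initial["id"]]
    # Step 2: features still to be integrated.
    to_integrate = required - initial["features"]
    # Steps 3 and 4: pull in other candidates that supply the missing features.
    for candidate in candidates:
        if candidate is initial or not to_integrate:
            continue
        gained = candidate["features"] & to_integrate
        if gained:
            parts.append(candidate["id"])
            to_integrate -= gained
    return {"parts": parts, "missing": to_integrate}

candidates = [
    {"id": "a", "features": {"audio", "fairy tale"}},
    {"id": "b", "features": {"English"}},
    {"id": "c", "features": {"audio"}},
]
product = build_corpus_product({"audio", "English", "fairy tale"}, candidates)
```

A non-empty `missing` set in the result would indicate that the retrieved corpus information cannot cover every requested feature.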
Preferably, before the step of pushing the corpus product to the user, the method further comprises:
judging whether the corpus product needs to be charged or not;
if the corpus product does not need to be charged, executing the step of pushing the corpus product to the user;
and if the corpus products need to be charged, issuing a charging notice to the user, deducting the cost required by the corpus products from a payment account preset by the user after receiving an instruction of agreeing to deduction made by the user, and pushing the corpus products to the user.
Preferably, after the step of pushing the corpus product to the user, the method further comprises:
and receiving feedback information submitted by the user, and maintaining the corpus information in the corpus according to the feedback information.
In addition, in order to achieve the above object, the present invention further provides a corpus product recommendation apparatus, including:
the acquisition module is used for receiving a corpus product query request triggered by a user and acquiring a corpus product query demand provided by the user according to the corpus product query request;
the extraction module is used for extracting keywords from the query requirement of the corpus product to obtain N keywords, wherein N is an integer greater than or equal to 1;
the determining module is used for determining the characteristic information corresponding to the corpus products required by the user according to the N key words;
the searching module is used for searching the corpus information which accords with any feature in the feature information from the corpus according to the feature information;
and the generating module is used for processing each corpus information according to the characteristic information and the characteristics corresponding to each corpus information to obtain a corpus product with all the characteristics in the characteristic information, and pushing the corpus product to the user.
In addition, in order to achieve the above object, the present invention further provides a recommendation apparatus for corpus products, including: a memory, a processor, and a recommendation program of the corpus product that is stored in the memory and executable on the processor, wherein the recommendation program of the corpus product is configured to implement the steps of the recommendation method of the corpus product as described above.
In addition, in order to achieve the above object, the present invention further provides a storage medium, where a recommendation program of a corpus product is stored, and the recommendation program of the corpus product, when executed by a processor, implements the steps of the recommendation method of the corpus product as described above.
According to the corpus product recommendation scheme of the present invention, the corpus product query requirement provided by the user is extracted from the corpus product query request triggered by the user, and the feature information corresponding to the corpus product required by the user is determined according to the N keywords extracted from that query requirement. Corpus information that conforms to any feature in the determined feature information is then searched out from the corpus. Finally, the retrieved corpus information is processed according to the determined feature information and the features corresponding to each piece of retrieved corpus information, so that a corpus product having all the features in the feature information is obtained. The corpus product finally recommended therefore meets the actual requirements of the user, and the recommendation accuracy of corpus products is greatly improved.
Drawings
Fig. 1 is a schematic structural diagram of a recommendation device for a corpus product of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for recommending corpus products according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a specific implementation of step S20 in the corpus product recommendation method according to the present invention;
FIG. 4 is a flowchart illustrating a method for recommending corpus products according to a second embodiment of the present invention;
FIG. 5 is a block diagram illustrating a first exemplary embodiment of a corpus product recommendation device according to the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a recommendation device for a corpus product in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the apparatus for recommending corpus products may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the aforementioned processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the recommendation device for a corpus product, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a storage medium, may include an operating system, a network communication module, a user interface module, and a recommendation program for a corpus product.
In the apparatus for recommending corpus products shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 may be disposed in the apparatus for recommending corpus products, which calls the recommendation program of the corpus product stored in the memory 1005 through the processor 1001 and executes the method for recommending a corpus product according to the embodiment of the present invention.
An embodiment of the present invention provides a method for recommending a corpus product, and referring to fig. 2, fig. 2 is a schematic flow diagram of a first embodiment of the method for recommending a corpus product according to the present invention.
In this embodiment, the method for recommending a corpus product includes the following steps:
Step S10, receiving a corpus product query request triggered by a user, and acquiring a corpus product query requirement provided by the user according to the corpus product query request.
Specifically, the execution main body of the embodiment may be any terminal device used by a user for performing a corpus product query operation, such as a smart phone, a tablet computer, a personal computer, and the like, which are not listed one by one here, and are not limited thereto.
Correspondingly, the corpus product query request may be triggered as follows: the user opens a corpus query application (App) that is provided by a corpus transaction platform and installed on the terminal device, and the request is then generated when the user clicks a function key of the corpus query App, such as a text input box, a voice input key, or a picture input key.
Correspondingly, the obtained corpus product query requirement may be the information input by the user when operating the function key.
Step S20, carrying out keyword extraction processing on the corpus product query requirement to obtain N keywords.
It should be understood that in practical applications, the corpus product query requirement provided by the user may consist of a single word, several words, a sentence, or more information. Therefore, after the keyword extraction processing is performed on the corpus product query requirement, at least one keyword is obtained, that is, the value of N should be an integer greater than or equal to 1.
In addition, in order to facilitate understanding of the operation of extracting the keywords from the query requirement of the corpus product to obtain N keywords, a specific extraction method is provided in this embodiment, and the general implementation steps are shown in fig. 3 and described in detail below with reference to fig. 3.
Substep S201, performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words.
It should be understood that, since the finally determined N keywords are selected from the obtained M words, the value of M cannot be less than the value of N in practical applications, that is, M should be an integer greater than or equal to N.
In addition, in this embodiment, the word segmentation and part-of-speech tagging processing performed on the corpus product query requirement in the substep S201 specifically includes:
Firstly, according to punctuation marks in the corpus product query requirement, such as commas and periods, the corpus product query requirement is divided into sentences to obtain the sentences to be segmented.
For example, suppose the corpus product query requirement provided by the user is "Hello, I want to listen to English Little Prince." The system traverses the content of the requirement. When the traversed current character is ",", the system performs sentence segmentation and takes the content before the "," as one sentence to be segmented (called the first sentence to be segmented). The system then continues traversing the subsequent content until it reaches ".", and takes the content between the "," and the "." as another sentence to be segmented (called the second sentence to be segmented).
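The punctuation-based sentence split can be sketched in a few lines. The delimiter set below is an assumption (the text names only commas and periods as examples), and the function name `split_sentences` is made up for this illustration.

```python
import re

# Assumed delimiter set: ASCII and full-width sentence punctuation.
SENTENCE_DELIMITERS = ",.!?;，。！？；"

def split_sentences(requirement: str) -> list[str]:
    """Split a text-format query requirement at punctuation marks."""
    pattern = "[" + re.escape(SENTENCE_DELIMITERS) + "]"
    # Drop empty fragments produced by trailing or repeated punctuation.
    return [part.strip() for part in re.split(pattern, requirement) if part.strip()]

sentences = split_sentences("Hello, I want to listen to English Little Prince.")
# sentences[0] is the first sentence to be segmented, sentences[1] the second.
```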
And then, performing maximum reverse matching segmentation on the sentence to be segmented, and determining the M words according to a user-defined dictionary.
Specifically, "maximum reverse matching segmentation" means that the sentence to be segmented is segmented from right to left.
The custom dictionary consists of existing words and phrases collected in advance from various big data platforms and dictionaries, and basically covers the existing words, in their various forms, that may appear.
For convenience of understanding, a maximum reverse matching segmentation mode is adopted here to segment the second to-be-segmented sentence obtained in the above example.
Suppose that the words recorded in the custom dictionary D are: D = {"I", "look", "read", "listen", "Chinese", "English", "ten thousand reasons", "Little Prince"}.
When the maximum reverse matching segmentation operation is performed on the second sentence to be segmented (S = "I want to listen to English Little Prince"), a maximum segmentation length is defined first, for example 6, and segmentation then proceeds from right to left:
(1) The candidate word W1 taken out of S is "I want to listen to English";
(2) The custom dictionary D is searched; the candidate word W1 is not in D, so the leftmost character of W1 is removed to obtain the candidate word W2 "want to listen to English";
(3) The custom dictionary D is searched; the candidate word W2 is not in D, so the leftmost character of W2 is removed to obtain the candidate word W3 "listen to English";
(4) The custom dictionary D is searched; the candidate word W3 is not in D, so the leftmost character of W3 is removed to obtain the candidate word W4 "English";
(5) The custom dictionary D is searched; the candidate word W4 is in D, so W4 is split off from S, and S becomes "I want to listen to Little Prince";
(6) According to the segmentation length 6, the content of S is intercepted again to obtain the candidate word W5 "I want to listen to Little Prince";
(7) The operations in steps (1) to (6) are repeated until the content of S has been completely segmented.
According to the above segmentation operation, the words segmented from the second sentence to be segmented, "I want to listen to English Little Prince", are: "I", "listen", "English", "Little Prince".
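For readers who prefer code, here is a minimal sketch of right-to-left maximum matching over characters, as it is usually applied to Chinese text. It follows the textbook form of the algorithm, so its intermediate candidates may not match the walk-through above step for step. The Chinese sentence and dictionary entries are an assumed back-translation of the example, and keeping out-of-dictionary single characters as one-character words is a design choice of this sketch, not something the text specifies.

```python
def reverse_max_match(sentence: str, dictionary: set[str], max_len: int = 6) -> list[str]:
    """Segment `sentence` from right to left using the longest dictionary match."""
    words = []
    end = len(sentence)
    while end > 0:
        start = max(0, end - max_len)
        # Drop the leftmost character until the candidate is in the dictionary
        # or only a single character remains.
        while start < end - 1 and sentence[start:end] not in dictionary:
            start += 1
        words.append(sentence[start:end])
        end = start
    words.reverse()  # restore left-to-right order
    return words

# Assumed Chinese originals of the dictionary D and the sentence S.
D = {"我", "看", "读", "听", "中文", "英文", "十万个为什么", "小王子"}
words = reverse_max_match("我想听英文小王子", D, max_len=6)
dictionary_words = [w for w in words if w in D]
```

Filtering the result down to dictionary entries, as in `dictionary_words`, leaves four words corresponding to "I", "listen", "English" and "Little Prince".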
It should be understood that the above-mentioned is only a specific word segmentation manner, and the technical solution of the present invention is not limited in any way, and in practical applications, those skilled in the art can perform the setting according to the needs, and the present invention is not limited herein.
In addition, it is worth mentioning that in practical application, the administrator of the corpus can update the custom dictionary according to the historical query record of the user.
And finally, performing part-of-speech tagging on the M words according to preset part-of-speech standard information.
It should be noted that the part-of-speech standard information in this embodiment specifically refers to Chinese part-of-speech standard information, which specifies which type of word is a noun, which is a standing-name word, which is a verb, which is an adjective, which is a time word, and so on; these are not listed one by one here.
Still taking the 4 words obtained by the above segmentation as an example, the result of performing part-of-speech tagging on them according to the part-of-speech standard information may be as follows: "I" <pronoun>, "listen" <verb>, "English" <adjective>, "Little Prince" <noun>.
It should be understood that the above description is given only by way of illustration, and the technical solution of the present invention is not limited thereto.
In addition, it is worth mentioning that in practical applications, the query requirement of the corpus product provided by the user may be different in format due to different operation keys corresponding to the triggered corpus product query request.
For example, when the key operated by the user is a text input box, the query requirement of the obtained corpus product is specifically in a text format.
For example, when the key operated by the user is a voice input key, the query requirement of the obtained corpus product is specifically in a voice format.
For example, when the button operated by the user is a picture input button, the query requirement of the obtained corpus product is specifically in a picture format.
Word segmentation and part-of-speech tagging of the corpus product query requirement are performed on a text format. Therefore, to ensure that word segmentation and part-of-speech tagging can be smoothly performed on the corpus product query requirement to obtain the M words, the format of the corpus product query requirement may be determined before substep S201 is performed, and the requirement may then be adapted according to its format.
For example, if it is determined that the corpus product query requirement is in a voice format, the corpus product query requirement in the voice format is converted into a corpus product query requirement in a text format by using a voice recognition technology, and then the substep S201 is executed; if the query requirement of the corpus product is determined to be in a picture format, the query requirement of the corpus product in the picture format is converted into the query requirement of the corpus product in a text format by using an Optical Character Recognition (OCR) technology, and then the substep S201 is executed, and if the query requirement of the corpus product is in the text format, the substep S201 is directly executed.
That is, the operation in the sub-step S201 is substantially:
according to punctuation marks in the corpus product query requirement in the text format, carrying out sentence segmentation on the corpus product query requirement in the text format to obtain a sentence to be segmented;
performing maximum reverse matching segmentation on the sentence to be segmented according to a user-defined dictionary to obtain the M words;
and performing part-of-speech tagging on the M words according to preset part-of-speech standard information.
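The format check that precedes substep S201 amounts to a small dispatch. The sketch below assumes the format is already known as one of the strings `"text"`, `"voice"` or `"picture"`, and takes the speech recognition and OCR engines as caller-supplied functions, since the text does not name any particular engine; the stubs at the bottom are placeholders, not real recognizers.

```python
from typing import Callable, Union

def to_text_requirement(
    requirement: Union[str, bytes],
    fmt: str,
    speech_to_text: Callable[[bytes], str],
    image_to_text: Callable[[bytes], str],
) -> str:
    """Normalize a query requirement to text before word segmentation."""
    if fmt == "text":
        return str(requirement)
    if fmt == "voice":
        return speech_to_text(requirement)  # speech recognition step
    if fmt == "picture":
        return image_to_text(requirement)   # optical character recognition step
    raise ValueError(f"unsupported query requirement format: {fmt}")

# Placeholder recognizers standing in for real ASR / OCR engines.
def fake_asr(audio: bytes) -> str:
    return "I want to listen to English Little Prince"

def fake_ocr(image: bytes) -> str:
    return "Little Prince"

text = to_text_requirement(b"\x00\x01", "voice", fake_asr, fake_ocr)
```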
Further, in order to ensure that the subsequently determined feature information has a higher reference value, before the keyword extraction operation is performed, a text preprocessing operation may be performed on the corpus product query requirement in the text format.
For example, stop words are removed, that is, words in the query requirement that have no actual meaning, such as "wool", "mo", "o", etc.
For example, invalid special characters, such as emoticons, various punctuation marks, and the like, are removed.
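A minimal sketch of these two preprocessing operations follows. The stop-word list is a made-up English stand-in for the meaningless words mentioned above, and the regular expression simply treats anything that is neither a word character nor whitespace as an invalid special character. Because this also strips punctuation, the sketch assumes it is applied to each sentence after the punctuation-based sentence split rather than before it.

```python
import re

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"um", "uh", "oh"}

def preprocess(text: str) -> str:
    """Remove special characters and stop words from a text-format requirement."""
    # Replace punctuation, emoticon characters and other symbols with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    tokens = [tok for tok in cleaned.split() if tok.lower() not in STOP_WORDS]
    return " ".join(tokens)

cleaned_requirement = preprocess("Oh, I want to listen :) to English!")
```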
Correspondingly, before the query requirement of the corpus product in the voice format is converted into the query requirement of the corpus product in the text format, a series of preprocessing operations such as filtering, interference sound removal and the like can be performed on the query requirement of the corpus product in the voice format, so that the converted text information is more accurate.
Similarly, before the query requirement of the corpus product in the picture format is converted into the query requirement of the corpus product in the text format, a series of preprocessing operations, such as grayscale processing, denoising and other operations, can be performed on the query requirement of the corpus product in the picture format to ensure that the converted text information is more accurate.
Substep S202, calculating a weight value of each of the M words according to a preset part-of-speech weight distribution standard.
It should be understood that, in an actual query process, pronouns, interjections, conjunctions and similar function words are usually not helpful to the query, so the weights assigned to such words should be low. Higher weights should be assigned to verbs that can indicate the multimedia format of the corpus product required by the user (for example, "listen" may indicate that the multimedia format of the corpus product is audio, "see" that it is video, and "read" that it is text), to adjectives that can indicate the language format of the corpus product (for example, "English" and "Chinese"), and to nouns that can indicate the category of the corpus product.
Substep S203, traversing the M words, comparing the weight value of the traversed current word with a preset weight threshold value, and selecting the words whose weight values are larger than the weight threshold value to obtain the N keywords.
It should be understood that the above is only a specific implementation manner for extracting the keywords from the query requirement of the corpus product, and the technical solution of the present invention is not limited at all, and in practical applications, those skilled in the art can set the keywords as needed, and the present invention is not limited herein.
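Substeps S202 and S203 can be illustrated with a lookup table and a threshold filter. The concrete weights and the threshold of 0.5 are invented for this sketch; the text only says that pronouns and similar words get low weights while format-indicating verbs, language adjectives and category nouns get high ones.

```python
# Assumed part-of-speech weight distribution standard (values are illustrative).
POS_WEIGHTS = {
    "pronoun": 0.1,
    "interjection": 0.1,
    "conjunction": 0.1,
    "verb": 0.8,
    "adjective": 0.8,
    "noun": 0.9,
}
WEIGHT_THRESHOLD = 0.5  # assumed preset weight threshold

def extract_keywords(tagged_words, weights=POS_WEIGHTS, threshold=WEIGHT_THRESHOLD):
    """Keep the words whose part-of-speech weight exceeds the threshold."""
    return [word for word, pos in tagged_words if weights.get(pos, 0.0) > threshold]

tagged = [
    ("I", "pronoun"),
    ("listen", "verb"),
    ("English", "adjective"),
    ("Little Prince", "noun"),
]
keywords = extract_keywords(tagged)
```

With these assumed weights, the M = 4 tagged words yield N = 3 keywords, which is consistent with the N keywords being a subset of the M words.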
Step S30: determining the feature information corresponding to the corpus product required by the user according to the N keywords.
Specifically, the feature information corresponding to the corpus product is a key feature that can identify the corpus product.
For example, if the keywords obtained by the above extraction operation are "listening", "English", and "Little Prince", it can be determined from the keyword "listening" that the corpus product required by the user should be audio data, from the keyword "English" that the corpus product needs to be an English version, and from the keyword "Little Prince" that the corpus product belongs to the fairy tale category.
It should be understood that, in practical applications, in order to determine the feature information corresponding to a corpus product from the keywords, mapping relationships between different keywords and the features of different corpus products may be constructed in advance, and the feature information is then determined according to these pre-constructed mapping relationships.
It should be understood that the above description is only for illustrative purposes, and the technical solution of the present invention is not limited in any way, and in practical applications, those skilled in the art can make settings according to the needs, and the present invention is not limited herein.
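The pre-constructed mapping between keywords and features can be sketched as a simple lookup table. The table contents, the feature names (`multimedia_format`, `language`, `category`), and the function `determine_features` are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical pre-constructed mapping: keyword -> (feature name, feature value).
KEYWORD_TO_FEATURE = {
    "listening": ("multimedia_format", "audio"),
    "see": ("multimedia_format", "video"),
    "read": ("multimedia_format", "text"),
    "English": ("language", "English"),
    "Chinese": ("language", "Chinese"),
    "Little Prince": ("category", "fairy tale"),
}


def determine_features(keywords, mapping=KEYWORD_TO_FEATURE):
    """Translate extracted keywords into feature information via the mapping."""
    features = {}
    for keyword in keywords:
        if keyword in mapping:  # keywords without a mapping are ignored
            name, value = mapping[keyword]
            features[name] = value
    return features


features = determine_features(["listening", "English", "Little Prince"])
print(features)
```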
Step S40: searching the corpus, according to the feature information, for corpus information that matches any feature in the feature information.
Specifically, the corpus described in this embodiment is a pre-constructed corpus capable of storing a plurality of types of corpus information such as text, picture, audio, video, and the like.
Furthermore, to ensure that a multidimensional query, that is, a query over multiple features, can be performed on the corpus on the basis of the determined feature information, the corpus constructed in this embodiment uses Elasticsearch (a search server based on a full-text search engine, abbreviated as ES) as its core, supplemented by MongoDB (a database based on distributed file storage) and MySQL (a relational database management system).
Specifically, in ES, the last three digits of the identification number (hereinafter referred to as "ID") of the corpus information collected from each big data platform are used as the index, the ID of the corpus information is used as the type, and a plurality of index fields such as corpus name, corpus description, corpus tag, language direction, price, and sales volume are created in the data table.
Then, MongoDB is used to store the specific corpus information, and a correspondence is established between the indexes of ES and each piece of corpus information in MongoDB.
Meanwhile, the raw data of the corpus information (that is, data that has not undergone any processing such as the classification and tagging described above) is stored in MySQL in one-to-one correspondence with the index information in ES.
Therefore, when corpus information is queried from the corpus according to the determined feature information, the features in the feature information are directly substituted into a query statement written in DSL (the general-purpose query language of ES, used to retrieve and analyze massive amounts of machine data).
Examples include fuzzy search with matchQuery(…), prefix search (…), and filters such as termFilter and wizardFilter.
For example, when the determined subject name "Little Prince" of the corpus product to be queried is used as the query information, the corpus information retrieved from the corpus may be corpus information related to "Little Prince" in any language version and any multimedia format.
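As a rough illustration of an "any feature" query, the sketch below builds an ES query body as a Python dictionary, using a `bool` query whose `should` clauses each match one feature. The field names (`corpus_name`, `language`, `media_format`) and the helper `build_any_feature_query` are assumptions; the embodiment does not give the actual query statements.

```python
import json


def build_any_feature_query(features):
    """Build a query body that matches documents having ANY of the given features.

    features: dict mapping a (hypothetical) field name to the value to match.
    """
    should = [{"match": {field: value}} for field, value in features.items()]
    # minimum_should_match=1 means one matching clause is enough.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


features = {
    "corpus_name": "Little Prince",
    "language": "English",
    "media_format": "audio",
}
body = build_any_feature_query(features)
print(json.dumps(body, indent=2))
```

The resulting body could be sent to a search endpoint by whatever client the deployment uses; because any single clause suffices, the hits may cover only some of the requested features, which is why step S50 is needed afterwards.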
Step S50: processing each piece of corpus information according to the feature information and the features corresponding to each piece of corpus information to obtain a corpus product having all the features in the feature information, and pushing the corpus product to the user.
Specifically, in practical applications, the operation of obtaining a corpus product having all the features in the feature information can be roughly implemented by the following sub-steps:
firstly, according to the corresponding characteristics of each corpus information, screening out the corpus information with the most characteristics, and taking the corpus information as an initial corpus product;
then, determining features to be integrated according to the feature information and the features corresponding to the initial corpus products;
then, the corpus information corresponding to the features to be integrated is extracted from the corpus information except the initial corpus product;
and finally, combining the extracted corpus information with the initial corpus product to obtain a corpus product with all the characteristics in the characteristic information.
For ease of understanding, the above steps are exemplified below:
for example, suppose the corpus information directly queried from the corpus does not include all of the above feature information, and the retrieved corpus information with the most features is an English audio recording of Little Prince, which contains only audio information, together with a Chinese text version of the novel Little Prince.
The processing may specifically be to translate the Chinese text novel Little Prince into a corresponding English version;
then, the English Little Prince audio is combined with the English text novel, and the two are aligned so that the audio and the English text can be played synchronously, with the corresponding words highlighted during audio playback for the user's convenience.
The above description is only an example, and the technical solution of the present invention is not limited at all.
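The four sub-steps of step S50 can be sketched with Python sets. This is a simplified illustration under stated assumptions: each piece of corpus information is represented as a dictionary with a `name` and a set of `features`, and "combining" is reduced to recording which pieces are merged. Format or language conversion, as in the translation example above, is not modeled.

```python
def build_corpus_product(required, candidates):
    """Assemble a product covering as many required features as possible.

    required: set of required features.
    candidates: list of dicts with keys "name" and "features" (a set).
    """
    if not candidates:
        return None
    # Sub-step 1: the candidate covering the most required features is the initial product.
    initial = max(candidates, key=lambda c: len(c["features"] & required))
    parts = [initial["name"]]
    covered = initial["features"] & required
    # Sub-step 2: the features still to be integrated.
    missing = required - covered
    # Sub-step 3: pull the missing features from the remaining candidates.
    for candidate in candidates:
        if candidate is initial:
            continue
        contribution = candidate["features"] & missing
        if contribution:
            parts.append(candidate["name"])
            covered |= contribution
            missing -= contribution
    # Sub-step 4: the combined product; "missing" is empty when everything is covered.
    return {"parts": parts, "features": covered, "missing": missing}


required = {"fairy tale", "English", "audio", "text"}
candidates = [
    {"name": "English audio", "features": {"fairy tale", "English", "audio"}},
    {"name": "Chinese text novel", "features": {"fairy tale", "Chinese", "text"}},
]
product = build_corpus_product(required, candidates)
print(product["parts"], product["missing"])
```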
In addition, it is worth mentioning that, when corpus information in multiple formats is combined to obtain the corpus product required by the user, Tika (an open tool for extracting document content, provided by Apache) can be used, with its existing parser libraries detecting and extracting metadata and structured content from documents in different formats (such as HTML, PDF, and DOC).
From the above description, it is easy to see that the corpus product recommendation method provided in this embodiment obtains the corpus product query requirement provided by the user from the corpus product query request triggered by the user, determines the feature information corresponding to the corpus product required by the user according to the N keywords extracted from the query requirement, searches the corpus for corpus information matching any feature in the determined feature information, and finally processes the retrieved corpus information according to the determined feature information and the features corresponding to the retrieved corpus information to obtain a corpus product having all the features in the feature information. The corpus product finally pushed therefore meets the actual needs of the user, which greatly improves the recommendation accuracy of corpus products.
In addition, it is worth mentioning that in practical application, the corpus product generated according to the corpus product query requirement provided by the user may need to be charged, so that when the obtained corpus product is recommended to the user, it can be firstly determined whether the corpus product needs to be charged.
Correspondingly, if the fact that the corpus product does not need to be charged is determined, the corpus product is directly pushed to the user; if the fact that the corpus products need to be charged is determined, a charging notice can be issued to the user, then feedback made by the user is monitored, if an instruction of agreeing to charge deduction made by the user is received, the cost needed by the corpus products is deducted from a payment account preset by the user, and then the corpus products are pushed to the user.
Through this mode of operation, the user can decide, according to the actual situation, whether to pay to obtain the corpus product, which greatly improves user experience while ensuring the recommendation accuracy of the corpus product.
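The charging decision described above can be sketched as a small function. The names `push_corpus_product` and `confirm_payment`, the `price` key, and the handling of an insufficient balance are assumptions added for illustration and are not specified in this embodiment.

```python
def push_corpus_product(product, confirm_payment, balance):
    """Decide whether to push a corpus product; return (pushed, new_balance).

    product: dict with an optional "price" key (0 or missing means free).
    confirm_payment: callable taking the price, returning True if the user agrees.
    balance: amount available in the user's preset payment account.
    """
    price = product.get("price", 0)
    if price <= 0:
        return True, balance        # free product: push directly
    if not confirm_payment(price):  # the user declined the charging notice
        return False, balance
    if balance < price:             # assumption: insufficient funds blocks the push
        return False, balance
    return True, balance - price    # deduct the fee, then push


print(push_corpus_product({"price": 5}, lambda price: True, 10))
```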
Further, to better improve user experience, when the instruction fed back by the user is a refusal to pay, a way of obtaining the corpus product free of charge can be recommended to the user so as to retain as many users of the corpus as possible, for example, sharing information about the corpus to a preset number of chat groups or inviting a preset number of new users. This avoids user loss and also promotes the corpus.
In addition, in order to better maintain and manage the corpus information in the corpus, and so that corpus products synthesized from that corpus information fit user requirements more closely, feedback information submitted by the user can be received after the corpus product is pushed to the user, and the corpus information in the corpus can then be maintained and managed according to the feedback information.
Referring to fig. 4, fig. 4 is a flowchart illustrating a second embodiment of a corpus product recommendation method according to the present invention.
Based on the first embodiment, before step S40, the method for recommending a corpus product in this embodiment further includes:
step S00, detecting whether the characteristic information contains the characteristics of identifying the category of the corpus product, identifying the language format of the corpus product and identifying the multimedia style of the corpus product. If the feature information contains the features of identifying the category of the corpus product, identifying the language format of the corpus product and identifying the multimedia style of the corpus product through detection, executing a step S40; otherwise, step S01 is executed.
Step S01: acquiring a historical query record of the user in a preset period, and determining the feature information according to the historical query record and the N keywords.
In practical applications, the operation in step S01 can be specifically realized through the following sub-steps:
(1) Acquiring the historical query record of the user in a preset period.
Specifically, the historical query records mainly record the type, feature information, and the like of the corpus products that the user has queried before (for example, in the last month), so that the user's preference can be determined according to the historical query records.
In addition, in the embodiment, the acquired historical query record is limited to be the content of a preset period, for example, the last week, so that the information in the acquired historical query record has a higher reference value.
(2) Analyzing the historical query records using big data analysis technology to determine the query requirement of the user at the current moment.
Specifically, analyzing the historical query records with big data analysis technology means counting which keywords are used with high frequency in the historical query records, and which categories, language formats, and multimedia formats the corpus products recently and frequently searched for by the user belong to.
(3) Taking the query requirement at the current moment as a first element, and the N keywords as a second element.
(4) Determining the category to which the corpus product belongs, the language format of the corpus product, and the multimedia style of the corpus product according to the first element and the second element.
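Steps S00 and S01 can be sketched as follows: features already derived from the keywords are kept, and any missing one is filled in with the value the user queried most frequently in the historical records. Plain frequency counting stands in here for the "big data analysis technology", and the feature names and the function `complete_features` are hypothetical.

```python
from collections import Counter

# The three features that step S00 checks for (names are illustrative).
REQUIRED_FEATURES = ("category", "language", "multimedia_format")


def complete_features(features, history):
    """Fill in missing required features from the user's query history.

    features: dict of features determined from the N keywords (second element).
    history: list of dicts, one per historical query, holding the features queried.
    """
    completed = dict(features)
    for name in REQUIRED_FEATURES:
        if name in completed:
            continue  # the keywords already provide this feature
        counts = Counter(record[name] for record in history if name in record)
        if counts:
            # The most frequent historical value approximates the current need (first element).
            completed[name] = counts.most_common(1)[0][0]
    return completed


history = [
    {"language": "English", "multimedia_format": "audio"},
    {"language": "English", "multimedia_format": "text"},
    {"language": "Chinese", "multimedia_format": "audio"},
]
completed = complete_features({"category": "novel"}, history)
print(completed)
```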
Further, in practical applications, to make the finally determined feature information of the corpus product more accurate, that is, to make the corpus product recommended according to that feature information better meet the user's needs, biometric feature information of the user, preferably facial feature information and voiceprint feature information, can be acquired when the historical query record is obtained. The gender and approximate age of the user can then be determined by analyzing the biometric feature information, and content of interest to users of that gender and age range can be selected, so that the corpus product finally recommended better meets the user's needs.
It should be noted that, in practical applications, most users do not fill in complete personal information when using the corpus, so the actual age, gender, and the like of the user cannot be obtained from the personal information.
Further, in practical applications, to determine the age and gender of the user from the acquired biometric feature information conveniently, quickly, and accurately, a big data analysis model can be constructed in advance using big data analysis technology assisted by a machine learning algorithm. After the biometric feature information is acquired, it is input directly into the analysis model to obtain the age and gender of the user.
Regarding the construction of the big data analysis model, roughly the following may be made:
firstly, acquiring biometric feature information of users of known gender and age from each big data platform;
then, inputting the biometric feature information of known gender and age as sample data into a big data analysis training model for training, until the model accurately outputs the age and gender of the user corresponding to the input sample data, at which point training can end.
Accordingly, the big data analysis training model at the moment is the needed big data analysis model.
In addition, in practical application, the selected machine learning algorithm is preferably a convolutional neural network algorithm.
Because the convolutional neural network algorithm is mature, in a specific implementation, a person skilled in the art can consult the relevant literature on the algorithm, and details are not described here.
For example, when the corpus product query requirement provided by the user consists only of the two words "novel speech", analysis of the user's biometric feature information using big data analysis technology determines that the user is a woman about 30 years old.
In addition, according to the acquired historical query records of the user, it is found that the user frequently queries fantasy-type animation novels.
Therefore, according to this information, the corpus product that the user needs to query is a fantasy-type animation novel suitable for women of about 30 years of age.
Accordingly, the determined feature information may be: 30 years old, female, fantasy, animation, novel.
It should be noted that, the foregoing is only an example, and the technical solution of the present invention is not limited at all, and in the specific implementation, a person skilled in the art may perform setting according to needs, and details are not described here.
As can be seen from the above description, in the corpus product recommendation method provided in this embodiment, before corpus information matching any feature in the feature information is searched from the corpus, it is detected whether the feature information contains the features identifying the category to which the corpus product belongs, the language format of the corpus product, and the multimedia style of the corpus product. Based on the detection result, the method either searches for corpus information directly with the feature information determined at the outset, or first reacquires parameter information to determine the above features and then performs the search. This effectively guarantees the accuracy of the feature information used for the search, so that the subsequently obtained corpus product better meets the actual requirements of the user.
In addition, an embodiment of the present invention further provides a storage medium, where a recommendation program of a corpus product is stored on the storage medium, and the recommendation program of the corpus product, when executed by a processor, implements the steps of the recommendation method of the corpus product as described above.
Referring to fig. 5, fig. 5 is a block diagram illustrating a first embodiment of a corpus product recommendation device according to the present invention.
As shown in fig. 5, the apparatus for recommending corpus products according to the embodiment of the present invention includes: an obtaining module 5001, an extracting module 5002, a determining module 5003, a searching module 5004 and a generating module 5005.
The obtaining module 5001 is configured to receive a corpus product query request triggered by a user, and obtain a corpus product query requirement provided by the user according to the corpus product query request; an extraction module 5002, configured to perform keyword extraction processing on the corpus product query requirement to obtain N keywords, where N is an integer greater than or equal to 1; the determining module 5003 is configured to determine, according to the N keywords, feature information corresponding to the corpus product required by the user; the searching module 5004 is configured to search corpus information conforming to any feature in the feature information from a corpus according to the feature information; the generating module 5005 is configured to process each corpus information according to the feature information and features corresponding to each corpus information, obtain corpus products having all features in the feature information, and push the corpus products to the user.
To facilitate understanding of the operation of the extraction module 5002 in extracting keywords from the corpus product query request, a specific implementation is given below, which is roughly as follows:
firstly, performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words;
then, calculating a weight value of each word in the M words according to a preset part-of-speech weight distribution standard;
and finally, traversing the M words, comparing the weight value of each traversed word with a preset weight threshold, and selecting the words whose weight values are greater than the weight threshold to obtain the N keywords.
It should be understood that M should be an integer less than or equal to N in practical applications.
In addition, it is worth mentioning that, in practical application, the corpus product query requirement provided by the user may be different in format due to different operation keys corresponding to the corpus product query request triggered by the user, so as to ensure that the extraction module 5002 can smoothly perform word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, before the extraction module 5002 performs the above operations, the extraction module 5002 is further configured to: and determining the format of the query requirement of the corpus product.
Correspondingly, if the corpus product query requirement is determined to be in a voice format, converting the corpus product query requirement in the voice format into a corpus product query requirement in a text format by using a voice recognition technology; and if the corpus product query requirement is determined to be in a picture format, converting the corpus product query requirement in the picture format into a corpus product query requirement in a text format by using an optical character recognition technology.
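The format check and conversion can be sketched as a simple dispatch. The function `to_text`, the format labels, and the two recognizer callables are placeholders; a real system would plug in actual speech recognition and optical character recognition engines, which this embodiment does not name.

```python
def to_text(requirement, fmt, speech_to_text, image_to_text):
    """Convert a query requirement to text according to its format.

    fmt: one of "text", "voice", or "picture" (labels are illustrative).
    speech_to_text, image_to_text: caller-supplied recognizer callables.
    """
    if fmt == "text":
        return requirement
    if fmt == "voice":
        return speech_to_text(requirement)  # speech recognition
    if fmt == "picture":
        return image_to_text(requirement)   # optical character recognition
    raise ValueError(f"unsupported query format: {fmt}")


# Stub recognizers stand in for real engines in this usage example.
text = to_text(b"...", "voice", lambda audio: "listen English", lambda image: "")
print(text)
```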
Accordingly, in order to facilitate understanding of the operation of performing word segmentation and part-of-speech tagging on the query requirement of the corpus product to obtain M words, a specific implementation manner is provided in this embodiment, which is substantially as follows:
according to punctuation marks in the corpus product query requirement in the text format, carrying out sentence segmentation on the corpus product query requirement in the text format to obtain a sentence to be segmented;
performing reverse maximum matching segmentation on the sentence to be segmented, and determining the M words according to a user-defined dictionary;
and performing part-of-speech tagging on the M words according to preset part-of-speech standard information.
It should be understood that the above is only a specific implementation manner, and the technical solution of the present invention is not limited in any way, and in practical applications, those skilled in the art can perform setting according to needs, and the present invention is not limited herein.
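Reverse maximum matching scans a sentence from its end, each time taking the longest dictionary word that ends at the current position. The sketch below is a minimal version under stated assumptions: the dictionary is a plain Python set, unmatched single characters are emitted as one-character words, and the function name and sample dictionary are illustrative.

```python
def reverse_maximum_matching(sentence, dictionary, max_len=None):
    """Segment a sentence by reverse maximum matching against a dictionary."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    words = []
    end = len(sentence)
    while end > 0:
        # Try the longest candidate ending at `end`, shrinking until a match.
        for size in range(min(max_len, end), 0, -1):
            candidate = sentence[end - size:end]
            if size == 1 or candidate in dictionary:
                words.append(candidate)  # a single character is the fallback
                end -= size
                break
    words.reverse()  # words were collected from right to left
    return words


# "我想听英文小王子" means "I want to listen to the English Little Prince".
user_dictionary = {"我", "想", "听", "英文", "小王子", "王子"}
segments = reverse_maximum_matching("我想听英文小王子", user_dictionary)
print(segments)
```

Because the scan starts from the end and prefers longer matches, "小王子" is kept whole rather than being split into "小" and "王子".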
In addition, to facilitate understanding of the operation of the generating module 5005 for generating the corpus product required by the user, a specific implementation is given below, which is roughly as follows:
firstly, screening out corpus information with the most characteristics according to the characteristics corresponding to all corpus information, and taking the corpus information as an initial corpus product;
then, determining features to be integrated according to the feature information and the features corresponding to the initial corpus products;
then, the corpus information corresponding to the features to be integrated is extracted from the corpus information except the initial corpus product;
and finally, combining the extracted corpus information with the initial corpus product to obtain a corpus product with all the characteristics in the characteristic information.
It should be understood that the above is only a specific implementation manner, and the technical solution of the present invention is not limited in any way, and in a specific application, a person skilled in the art may set the implementation manner as needed, and the present invention is not limited thereto.
From the above description, it is easy to see that the corpus product recommendation apparatus provided in this embodiment obtains the corpus product query requirement provided by the user from the corpus product query request triggered by the user, determines the feature information corresponding to the corpus product required by the user according to the N keywords extracted from the query requirement, searches the corpus for corpus information matching any feature in the determined feature information, and finally processes the retrieved corpus information according to the determined feature information and the features corresponding to the retrieved corpus information to obtain a corpus product having all the features in the feature information. The corpus product finally pushed therefore meets the actual needs of the user, which greatly improves the recommendation accuracy of corpus products.
In addition, it is worth mentioning that in practical application, the corpus product generated according to the corpus product query requirement provided by the user may need to be charged, so that when the obtained corpus product is recommended to the user, it can be firstly determined whether the corpus product needs to be charged.
Correspondingly, if the fact that the corpus product does not need to be charged is determined, the corpus product is directly pushed to the user; if the fact that the corpus products need to be charged is determined, a charging notice can be issued to the user, then feedback made by the user is monitored, if an instruction of agreeing to charge deduction made by the user is received, the cost needed by the corpus products is deducted from a payment account preset by the user, and then the corpus products are pushed to the user.
Through the operation mode, the user can determine whether to pay to obtain the corpus product according to actual conditions, and the user experience is greatly improved while the recommendation accuracy of the corpus product is ensured.
Further, to better improve user experience, when the instruction fed back by the user is a refusal to pay, a way of obtaining the corpus product free of charge can be recommended to the user so as to retain as many users of the corpus as possible, for example, sharing information about the corpus to a preset number of chat groups or inviting a preset number of new users. This avoids user loss and also promotes the corpus.
In addition, in order to better maintain and manage the corpus information in the corpus and further enable the corpus finished product synthesized according to the corpus information in the corpus to better fit the user requirements, the feedback information submitted by the user can be further received after the corpus product is pushed to the user, and then the corpus information in the corpus is maintained and managed according to the feedback information.
It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.
In addition, the technical details that are not described in detail in this embodiment may refer to the recommendation method for corpus products provided in any embodiment of the present invention, and are not described herein again.
Based on the first embodiment of the apparatus for recommending corpus products, a second embodiment of the apparatus for recommending corpus products is provided.
In this embodiment, the apparatus for recommending corpus products further includes: and a detection module.
The detection module is configured to detect whether the feature information includes a feature that identifies a category to which the corpus product belongs, identifies a language format of the corpus product, and identifies a multimedia style of the corpus product.
Correspondingly, if the characteristic information is determined to contain the characteristics of identifying the category of the corpus product, identifying the language format of the corpus product and identifying the multimedia style of the corpus product through detection, the searching module is triggered to execute the operation of searching the corpus information which accords with any characteristic in the characteristic information from the corpus according to the characteristic information.
Otherwise (that is, when one or more of the above features are missing), the searching module is triggered to perform the following steps:
firstly, acquiring a historical query record of the user in a preset period;
then, analyzing the historical query records by utilizing a big data analysis technology to determine the query requirement of the user at the current moment;
then, taking the query requirement at the current moment as a first element, and taking the N keywords as a second element;
and finally, determining the category of the corpus product, the language format of the corpus product and the multimedia style characteristic of the corpus product according to the first element and the second element.
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and in a specific application, a person skilled in the art may set the technical solution as needed, and the present invention is not limited in this respect.
It is not difficult to find out through the above description that the apparatus for recommending corpus products according to this embodiment detects whether the feature information includes the feature for identifying the category to which the corpus product belongs, the language format of the corpus product, and the multimedia style of the corpus product before searching the corpus information according to any feature in the feature information from the corpus, and then determines whether to perform the search operation of the corpus information according to the feature information determined at the beginning or to obtain the parameter information again to determine the above feature, and then performs the search operation of the corpus information, thereby effectively ensuring the accuracy of the feature information used for searching the corpus information, and enabling the subsequently obtained corpus products to better meet the actual requirements of the user.
It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.
In addition, the technical details that are not described in detail in this embodiment may refer to the recommendation method for corpus products provided in any embodiment of the present invention, and are not described herein again.
Further, it is to be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g. a Read Only Memory (ROM)/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present invention or directly or indirectly applied to other related technical fields are also included in the scope of the present invention.

Claims (9)

1. A method for recommending corpus products, the method comprising:
receiving a corpus product query request triggered by a user, and acquiring a corpus product query requirement provided by the user according to the corpus product query request;
performing keyword extraction processing on the corpus product query requirement to obtain N keywords, wherein N is an integer greater than or equal to 1;
determining feature information corresponding to the corpus product required by the user according to the N keywords;
searching corpus information which accords with any feature in the feature information from a corpus according to the feature information;
processing each corpus information according to the feature information and the feature corresponding to each corpus information to obtain a corpus product with all features in the feature information, and pushing the corpus product to the user;
wherein, the step of processing each corpus information according to the feature information and the feature corresponding to each corpus information to obtain the corpus product having all the features in the feature information comprises:
selecting, according to the features corresponding to each corpus information, the corpus information having the most features, and taking that corpus information as an initial corpus product;
determining features to be integrated according to the feature information and the features corresponding to the initial corpus product;
extracting the corpus information corresponding to the features to be integrated from the corpus information other than the initial corpus product;
and combining the extracted corpus information with the initial corpus product to obtain a corpus product with all the features in the feature information.
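By way of non-limiting illustration, the combining steps of claim 1 may be sketched as follows. The function name `assemble_corpus_product`, the representation of each corpus information as a set of feature labels keyed by a hypothetical identifier, and the choice to count only features that belong to the feature information are assumptions made for this sketch; the claim does not prescribe any of them.

```python
from typing import Dict, List, Set


def assemble_corpus_product(required: Set[str],
                            candidates: Dict[str, Set[str]]) -> List[str]:
    """Return ids of corpus entries that together cover the required features.

    `required` is the feature information; `candidates` maps a hypothetical
    corpus-entry id to the features that entry matches.
    """
    if not candidates:
        return []
    # Initial corpus product: the entry matching the most required features.
    initial = max(candidates, key=lambda cid: len(candidates[cid] & required))
    product = [initial]
    # Features to be integrated: required features the initial product lacks.
    missing = required - candidates[initial]
    for cid, feats in candidates.items():
        if not missing:
            break
        # Pull in any other entry that supplies a still-missing feature.
        if cid != initial and feats & missing:
            product.append(cid)
            missing -= feats
    return product


if __name__ == "__main__":
    found = {"a": {"finance", "english"}, "b": {"audio"}, "c": {"english"}}
    print(assemble_corpus_product({"finance", "english", "audio"}, found))
```

If some required feature is matched by no candidate, this sketch simply returns a product that lacks it; how the claimed method behaves in that case is not stated in the text.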
2. The method according to claim 1, wherein the step of performing keyword extraction processing on the corpus product query requirement to obtain N keywords comprises:
performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, wherein M is an integer greater than or equal to N;
calculating the weight value of each of the M words according to a preset part-of-speech weight distribution standard;
traversing the M words, comparing the weight value of the currently traversed word with a preset weight threshold, and selecting the words whose weight values are greater than the weight threshold to obtain the N keywords.
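By way of non-limiting illustration, the weighting and threshold comparison of claim 2 may be sketched as follows, reading the claim as keeping the words whose weight exceeds the threshold. The part-of-speech labels, the weight values in `POS_WEIGHTS`, and the default threshold are hypothetical; the claim states only that a weight distribution standard and a threshold are preset.

```python
from typing import Dict, List, Tuple

# Hypothetical part-of-speech weights; the claim only says they are preset.
POS_WEIGHTS: Dict[str, float] = {
    "noun": 0.8,
    "verb": 0.6,
    "adjective": 0.5,
    "particle": 0.1,
}


def select_keywords(tagged_words: List[Tuple[str, str]],
                    threshold: float = 0.4) -> List[str]:
    """Keep the words whose part-of-speech weight exceeds the threshold."""
    keywords: List[str] = []
    for word, pos in tagged_words:
        # Unknown parts of speech get weight 0.0 and are never selected.
        weight = POS_WEIGHTS.get(pos, 0.0)
        if weight > threshold and word not in keywords:
            keywords.append(word)
    return keywords


if __name__ == "__main__":
    tagged = [("financial", "adjective"), ("the", "particle"), ("corpus", "noun")]
    print(select_keywords(tagged))
```

Here the M tagged words are the input and the returned list holds the N keywords, so N can never exceed M.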
3. The method as claimed in claim 2, wherein before the step of performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, the method further comprises:
determining a format of the corpus product query requirement;
if the corpus product query requirement is in a voice format, converting the corpus product query requirement in the voice format into a corpus product query requirement in a text format by utilizing a voice recognition technology;
if the corpus product query requirement is in a picture format, converting the corpus product query requirement in the picture format into a corpus product query requirement in a text format by using an optical character recognition technology;
the step of performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words comprises the following steps:
according to punctuation marks in the corpus product query requirement in the text format, carrying out sentence segmentation on the corpus product query requirement in the text format to obtain a sentence to be segmented;
performing reverse maximum matching segmentation on the sentence to be segmented, and determining the M words according to a user-defined dictionary;
and performing part-of-speech tagging on the M words according to preset part-of-speech standard information.
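By way of non-limiting illustration, the sentence splitting and dictionary-based segmentation of claim 3 may be sketched as follows, using the common reverse maximum matching scheme (scan from the end of the sentence and take the longest dictionary word that ends at the current position). The punctuation set, the maximum word length `max_len`, and the sample dictionary are assumptions for this sketch.

```python
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    """Split text on common Chinese and Western punctuation marks."""
    parts = re.split(r"[。！？；，,.!?;]+", text)
    return [p.strip() for p in parts if p.strip()]


def reverse_max_match(sentence: str, dictionary: Set[str],
                      max_len: int = 4) -> List[str]:
    """Segment a sentence from right to left, longest dictionary match first."""
    words: List[str] = []
    end = len(sentence)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the window from the left until it is a dictionary word
        # or only a single character remains.
        while start < end - 1 and sentence[start:end] not in dictionary:
            start += 1
        words.append(sentence[start:end])
        end = start
    words.reverse()
    return words


if __name__ == "__main__":
    user_dict = {"金融", "语料", "产品"}
    for s in split_sentences("金融语料产品。"):
        print(reverse_max_match(s, user_dict))
```

Characters that match no dictionary entry fall out as single-character words, which a later weighting step could discard.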
4. The method according to claim 2, wherein before the step of searching corpus information corresponding to any feature in the feature information from a corpus according to the feature information, the method further comprises:
detecting whether the feature information contains features identifying the category of the corpus product, the language format of the corpus product and the multimedia style of the corpus product;
if the feature information contains the features identifying the category of the corpus product, the language format of the corpus product and the multimedia style of the corpus product, executing the step of searching corpus information which accords with any feature in the feature information from a corpus according to the feature information;
otherwise, executing the following steps:
acquiring a historical query record of the user in a preset period;
analyzing the historical query record by utilizing a big data analysis technology to determine the query requirement of the user at the current moment;
taking the query requirement at the current moment as a first element, and taking the N keywords as a second element;
and determining the features identifying the category of the corpus product, the language format of the corpus product and the multimedia style of the corpus product according to the first element and the second element.
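By way of non-limiting illustration, the completeness check and fallback of claim 4 may be sketched as follows. The claim does not specify the big data analysis technique; this sketch substitutes a plain frequency count over the historical query records as a stand-in. The key names in `REQUIRED_KINDS` and the record layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical keys for the three kinds of feature named in the claim.
REQUIRED_KINDS = ("category", "language", "media_style")


def complete_features(features: Dict[str, str],
                      history: List[Dict[str, str]]) -> Dict[str, str]:
    """Fill in missing feature kinds from the user's historical queries.

    Stand-in for the unspecified analysis step: each missing kind takes the
    value that occurs most often in the history records.
    """
    completed = dict(features)
    for kind in REQUIRED_KINDS:
        if kind in completed:
            continue  # the keywords already determined this feature
        past = Counter(rec[kind] for rec in history if kind in rec)
        if past:
            completed[kind] = past.most_common(1)[0][0]
    return completed


if __name__ == "__main__":
    history = [
        {"language": "english", "media_style": "audio"},
        {"language": "english", "media_style": "text"},
        {"language": "chinese", "media_style": "audio"},
    ]
    print(complete_features({"category": "finance"}, history))
```

Features derived from the keywords take precedence over the history in this sketch; the claim combines the two elements without stating which one prevails.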
5. The method of any of claims 1 to 4, wherein prior to the step of pushing the corpus product to the user, the method further comprises:
determining whether the corpus product needs to be charged for;
if the corpus product does not need to be charged for, executing the step of pushing the corpus product to the user;
and if the corpus product needs to be charged for, issuing a charging notice to the user, deducting the cost of the corpus product from a payment account preset by the user after receiving the user's instruction agreeing to the deduction, and pushing the corpus product to the user.
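By way of non-limiting illustration, the charging decision of claim 5 may be sketched as follows. The `Account` class, the `user_agrees` callback standing in for the charging notice and the user's reply, and the insufficient-balance check are assumptions for this sketch; the last of these is not part of the claim.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Account:
    """Hypothetical preset payment account."""
    balance: float


def settle_and_decide(price: float, account: Account,
                      user_agrees: Callable[[float], bool]) -> bool:
    """Return True if the corpus product should be pushed to the user."""
    if price <= 0:
        return True  # free product: push directly
    # Issue the charging notice and wait for the user's decision.
    if not user_agrees(price):
        return False
    # Assumption (not in the claim): insufficient balance blocks delivery.
    if account.balance < price:
        return False
    account.balance -= price
    return True


if __name__ == "__main__":
    acct = Account(balance=10.0)
    print(settle_and_decide(4.0, acct, lambda p: True), acct.balance)
```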
6. The method according to any of the claims 1 to 4, wherein after the step of pushing the corpus product to the user, the method further comprises:
and receiving feedback information submitted by the user, and maintaining the corpus information in the corpus according to the feedback information.
7. An apparatus for recommending corpus products, the apparatus comprising:
an acquisition module, configured to receive a corpus product query request triggered by a user and acquire a corpus product query requirement provided by the user according to the corpus product query request;
an extraction module, configured to perform keyword extraction processing on the corpus product query requirement to obtain N keywords, wherein N is an integer greater than or equal to 1;
a determining module, configured to determine feature information corresponding to the corpus product required by the user according to the N keywords;
a searching module, configured to search a corpus for corpus information which accords with any feature in the feature information according to the feature information;
a generating module, configured to process each corpus information according to the feature information and the feature corresponding to each corpus information to obtain a corpus product with all features in the feature information, and push the corpus product to the user;
wherein the generating module is further configured to: select, according to the features corresponding to each corpus information, the corpus information having the most features, and take that corpus information as an initial corpus product; determine features to be integrated according to the feature information and the features corresponding to the initial corpus product; extract the corpus information corresponding to the features to be integrated from the corpus information other than the initial corpus product; and combine the extracted corpus information with the initial corpus product to obtain a corpus product with all the features in the feature information.
8. A corpus product recommendation device, characterized in that the device comprises: a memory, a processor and a corpus product recommendation program stored on the memory and executable on the processor, the corpus product recommendation program being configured to implement the steps of the corpus product recommendation method according to any one of claims 1 to 6.
9. A storage medium having stored thereon a corpus product recommendation program, which when executed by a processor implements the steps of the corpus product recommendation method according to any one of claims 1 to 6.
CN201910433178.8A 2019-05-21 2019-05-21 Corpus product recommendation method, apparatus, device and storage medium Active CN110297880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910433178.8A CN110297880B (en) 2019-05-21 2019-05-21 Corpus product recommendation method, apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910433178.8A CN110297880B (en) 2019-05-21 2019-05-21 Corpus product recommendation method, apparatus, device and storage medium

Publications (2)

Publication Number Publication Date
CN110297880A (en) 2019-10-01
CN110297880B (en) 2023-04-18

Family

ID=68027101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910433178.8A Active CN110297880B (en) 2019-05-21 2019-05-21 Corpus product recommendation method, apparatus, device and storage medium

Country Status (1)

Country Link
CN (1) CN110297880B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968800B (en) * 2019-11-26 2023-05-02 北京明略软件系统有限公司 Information recommendation method and device, electronic equipment and readable storage medium
CN111209363B (en) * 2019-12-25 2024-02-09 华为技术有限公司 Corpus data processing method, corpus data processing device, server and storage medium
CN113111155B (en) * 2020-01-10 2024-04-19 阿里巴巴集团控股有限公司 Information display method, device, equipment and storage medium
CN112183089A (en) * 2020-09-25 2021-01-05 中国建设银行股份有限公司 Corpus analysis method and device, electronic equipment and storage medium
CN114385781B (en) * 2021-11-30 2022-09-27 南京数睿数据科技有限公司 Interface file recommendation method, device, equipment and medium based on statement model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0756933A (en) * 1993-06-24 1995-03-03 Xerox Corp Method for retrieval of document
CN103530385A (en) * 2013-10-18 2014-01-22 北京奇虎科技有限公司 Method and device for searching for information based on vertical searching channels
CN107391690A (en) * 2017-07-25 2017-11-24 李小明 A kind of method for handling documentation & info
CN109325182A (en) * 2018-10-12 2019-02-12 平安科技(深圳)有限公司 Dialogue-based information-pushing method, device, computer equipment and storage medium
WO2019049089A1 (en) * 2017-09-11 2019-03-14 Indian Institute Of Technology, Delhi Method, system and apparatus for multilingual and multimodal keyword search in a mixlingual speech corpus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7904453B2 (en) * 2002-10-17 2011-03-08 Poltorak Alexander I Apparatus and method for analyzing patent claim validity
US7680853B2 (en) * 2006-04-10 2010-03-16 Microsoft Corporation Clickable snippets in audio/video search results

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
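A personalized query recommendation algorithm based on LDA modeling of client browsing history; Wang Guihua et al.; Journal of Sichuan University (Natural Science Edition); 2015-12-31 (No. 004); pp. 467-472 *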

Also Published As

Publication number Publication date
CN110297880A (en) 2019-10-01

Similar Documents

Publication Publication Date Title
CN110297880B (en) Corpus product recommendation method, apparatus, device and storage medium
CN106649818B (en) Application search intention identification method and device, application search method and server
US9864741B2 (en) Automated collective term and phrase index
CA2638558C (en) Topic word generation method and system
CN107967250B (en) Information processing method and device
WO2020233386A1 (en) Intelligent question-answering method and device employing aiml, computer apparatus, and storage medium
CN109508458B (en) Legal entity identification method and device
CN111324771B (en) Video tag determination method and device, electronic equipment and storage medium
CN110929498A (en) Short text similarity calculation method and device and readable storage medium
CN114064851A (en) Multi-machine retrieval method and system for government office documents
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
CN111625621B (en) Document retrieval method and device, electronic equipment and storage medium
CN114266256A (en) Method and system for extracting new words in field
CN110795942B (en) Keyword determination method and device based on semantic recognition and storage medium
CN114222000B (en) Information pushing method, device, computer equipment and storage medium
CN111259645A (en) Referee document structuring method and device
CN109065015B (en) Data acquisition method, device and equipment and readable storage medium
CN110990003A (en) API recommendation method based on word embedding technology
CN108345694B (en) Document retrieval method and system based on theme database
JP5345987B2 (en) Document search apparatus, document search method, and document search program
CN116628173B (en) Intelligent customer service information generation system and method based on keyword extraction
CN110688559A (en) Retrieval method and device
CN110705285A (en) Government affair text subject word bank construction method, device, server and readable storage medium
JP2004206391A (en) Document information analyzing apparatus
CN109298796B (en) Word association method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant