CN110297880B

CN110297880B - Corpus product recommendation method, apparatus, device and storage medium

Info

Publication number: CN110297880B
Application number: CN201910433178.8A
Authority: CN
Inventors: 韩亚洲
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2023-04-18
Anticipated expiration: 2039-05-21
Also published as: CN110297880A

Abstract

The invention belongs to the technical field of big data analysis and discloses a corpus product recommendation method, device, equipment and storage medium. The method comprises the following steps: receiving a corpus product query request triggered by a user, and acquiring a corpus product query requirement provided by the user according to the corpus product query request; carrying out keyword extraction processing on the query requirement of the corpus products to obtain N keywords, wherein N is an integer greater than or equal to 1; determining characteristic information corresponding to the corpus products required by the user according to the N key words; searching corpus information which accords with any feature in the feature information from a corpus according to the feature information; and processing each corpus information according to the characteristic information and the characteristics corresponding to each corpus information to obtain a corpus product with all the characteristics in the characteristic information, and pushing the corpus product to a user. By the method, the corpus products recommended for the users are in accordance with the actual requirements of the users, and the recommendation accuracy of the corpus products is greatly improved.

Description

Corpus product recommendation method, apparatus, device and storage medium

Technical Field

The invention relates to the technical field of big data analysis, in particular to a corpus product recommendation method, a corpus product recommendation device, corpus product recommendation equipment and a storage medium.

Background

A traditional corpus is a large-scale electronic text library that has been scientifically sampled and processed. With the development of the times, current corpora are no longer limited to storing text; they can also store various types of corpus information such as pictures, audio, and video.

Although the corpus information stored in existing corpora is varied and huge in quantity, conventional corpus query methods cannot comprehensively identify the query requirements of the user. As a result, the screened corpus information does not meet the actual requirements of the user, and the recommendation accuracy of corpus products is low.

Therefore, it is highly desirable to provide a method for recommending corpus products to users according to the actual requirements of the users, so as to improve the accuracy of recommending corpus products.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The invention mainly aims to provide a method, a device, equipment and a storage medium for recommending a corpus product, aiming at recommending the corpus product for a user according to the actual requirement of the user so as to improve the recommendation accuracy of the corpus product.

In order to achieve the above object, the present invention provides a method for recommending corpus products, comprising the following steps:

receiving a corpus product query request triggered by a user, and acquiring a corpus product query requirement provided by the user according to the corpus product query request;

extracting keywords from the query requirement of the corpus product to obtain N keywords, wherein N is an integer greater than or equal to 1;

determining characteristic information corresponding to the corpus products required by the user according to the N key words;

searching corpus information which accords with any feature in the feature information from a corpus according to the feature information;

and processing each corpus information according to the feature information and the feature corresponding to each corpus information to obtain a corpus product with all the features in the feature information, and pushing the corpus product to the user.

Preferably, the step of extracting keywords from the query requirement of the corpus product to obtain N keywords includes:

performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, wherein M is an integer greater than or equal to N;

calculating the weight value of each word in the M words according to a preset part-of-speech weight distribution standard;

traversing the M words, comparing the weight value of the currently traversed word with a preset weight threshold value, and selecting the words whose weight values are larger than the weight threshold value to obtain the N keywords.

Preferably, before the step of performing word segmentation and part-of-speech tagging on the query requirement of the corpus product to obtain M words, the method further comprises:

determining a format of the corpus product query requirement;

if the corpus product query requirement is in a voice format, converting the corpus product query requirement in the voice format into a corpus product query requirement in a text format by utilizing a voice recognition technology;

if the corpus product query requirement is in a picture format, converting the corpus product query requirement in the picture format into a corpus product query requirement in a text format by using an optical character recognition technology;

the step of performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words comprises the following steps:

according to punctuation marks in the corpus product query requirement in the text format, carrying out sentence segmentation on the corpus product query requirement in the text format to obtain a sentence to be segmented;

performing maximum reverse matching segmentation on the sentence to be segmented, and determining the M words according to a user-defined dictionary;

and performing part-of-speech tagging on the M words according to preset part-of-speech standard information.

Preferably, before the step of searching corpus information corresponding to any feature in the feature information from a corpus according to the feature information, the method further includes:

detecting whether the characteristic information contains characteristics for identifying the category of the corpus product, identifying the language format of the corpus product and identifying the multimedia style of the corpus product;

if the feature information contains the features of identifying the category of the corpus product, identifying the language format of the corpus product and identifying the multimedia style of the corpus product, executing the following steps: searching corpus information which accords with any feature in the feature information from a corpus according to the feature information;

otherwise, executing the following steps:

acquiring a historical query record of the user in a preset period;

analyzing the historical query records by utilizing a big data analysis technology to determine the query requirement of the user at the current moment;

taking the query requirement at the current moment as a first element, and taking the N keywords as a second element;

and determining the category of the corpus product, the language format of the corpus product and the multimedia style characteristic of the corpus product according to the first element and the second element.
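The fallback branch above can be illustrated with a minimal sketch. It assumes that features are held in a dictionary, that the three required feature names are `category`, `language` and `multimedia_format`, and that "analyzing the historical query records" is approximated by taking the most frequent historical value. All of these names and the frequency heuristic are assumptions made for illustration; the text does not specify the analysis technique.

```python
from collections import Counter

# Hypothetical names for the three features the method checks for.
REQUIRED_FEATURES = ("category", "language", "multimedia_format")


def complete_features(features, history):
    """Fill in missing required features from the user's query history.

    `features` holds what was derived from the N keywords (second element);
    `history` is a list of feature dictionaries from earlier queries
    (used here to estimate the first element). The most frequent historical
    value is an assumed stand-in for the unspecified big data analysis.
    """
    completed = dict(features)
    for name in REQUIRED_FEATURES:
        if name in completed:
            continue  # the keywords already identified this feature
        counts = Counter(record[name] for record in history if name in record)
        if counts:
            completed[name] = counts.most_common(1)[0][0]
    return completed


history = [
    {"multimedia_format": "audio"},
    {"multimedia_format": "audio"},
    {"multimedia_format": "video"},
]
print(complete_features({"category": "fairy tale", "language": "english"}, history))
```

Keyword-derived features take priority in this sketch, so the history only fills gaps and never overrides what the user asked for.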

Preferably, the step of processing each corpus information according to the feature information and the feature corresponding to each corpus information to obtain a corpus product having all features in the feature information includes:

according to the characteristics corresponding to the corpus information, screening out the corpus information with the most characteristics, and taking the corpus information as an initial corpus product;

determining features to be integrated according to the feature information and features corresponding to the initial corpus products;

extracting the corpus information corresponding to the features to be integrated from the corpus information except the initial corpus product;

and combining the extracted corpus information with the initial corpus product to obtain a corpus product with all the characteristics in the characteristic information.

Preferably, before the step of pushing the corpus product to the user, the method further comprises:

judging whether the corpus product needs to be charged or not;

if the corpus product does not need to be charged, executing the step of pushing the corpus product to the user;

and if the corpus products need to be charged, issuing a charging notice to the user, deducting the cost required by the corpus products from a payment account preset by the user after receiving an instruction of agreeing to deduction made by the user, and pushing the corpus products to the user.

Preferably, after the step of pushing the corpus product to the user, the method further comprises:

and receiving feedback information submitted by the user, and maintaining the corpus information in the corpus according to the feedback information.

In addition, in order to achieve the above object, the present invention further provides a corpus product recommendation apparatus, including:

the acquisition module is used for receiving a corpus product query request triggered by a user and acquiring a corpus product query demand provided by the user according to the corpus product query request;

the extraction module is used for extracting keywords from the query requirement of the corpus product to obtain N keywords, wherein N is an integer greater than or equal to 1;

the determining module is used for determining the characteristic information corresponding to the corpus products required by the user according to the N key words;

the searching module is used for searching the corpus information which accords with any feature in the feature information from the corpus according to the feature information;

and the generating module is used for processing each corpus information according to the characteristic information and the characteristics corresponding to each corpus information to obtain a corpus product with all the characteristics in the characteristic information, and pushing the corpus product to the user.

In addition, in order to achieve the above object, the present invention further provides a recommendation device for corpus products, including: a memory, a processor, and a recommendation program of the corpus product that is stored in the memory and executable on the processor, wherein the recommendation program of the corpus product is configured to implement the steps of the recommendation method of the corpus product as described above.

In addition, in order to achieve the above object, the present invention further provides a storage medium, where a recommendation program of a corpus product is stored, and the recommendation program of the corpus product, when executed by a processor, implements the steps of the recommendation method of the corpus product as described above.

According to the recommendation scheme of the corpus products, the corpus product query requirement provided by the user is extracted from the corpus product query request triggered by the user, the feature information corresponding to the corpus product required by the user is determined according to the N key words extracted from the corpus product query requirement, the corpus information which accords with any feature in the determined feature information is searched out from the corpus according to the determined feature information, and finally, the queried corpus is processed according to the determined feature information and the features corresponding to the queried corpus information, so that the corpus products with the feature information can be obtained, the finally screened corpus information is the corpus information meeting the actual requirements of the user, and the recommendation accuracy of the corpus products is greatly improved.

Drawings

FIG. 1 is a schematic structural diagram of a recommendation device for a corpus product of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a method for recommending corpus products according to a first embodiment of the present invention;

FIG. 3 is a flowchart illustrating a specific implementation of step S20 in the corpus product recommendation method according to the present invention;

FIG. 4 is a flowchart illustrating a method for recommending corpus products according to a second embodiment of the present invention;

FIG. 5 is a block diagram illustrating a first exemplary embodiment of a corpus product recommendation device according to the present invention.

The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

Referring to fig. 1, fig. 1 is a schematic structural diagram of a recommendation device for a corpus product of a hardware operating environment according to an embodiment of the present invention.

As shown in fig. 1, the apparatus for recommending corpus products may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the aforementioned processor 1001.

Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the recommendation device for a corpus product, which may include more or fewer components than those shown, a combination of some components, or a different arrangement of components.

As shown in fig. 1, a memory 1005, which is a storage medium, may include an operating system, a network communication module, a user interface module, and a recommendation program for a corpus product.

In the apparatus for recommending corpus products shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 of the apparatus for recommending a corpus product according to the present invention may be disposed in the apparatus for recommending a corpus product, and the apparatus for recommending a corpus product calls the recommendation program of a corpus product stored in the memory 1005 through the processor 1001, and executes the method for recommending a corpus product according to the embodiment of the present invention.

An embodiment of the present invention provides a method for recommending a corpus product, and referring to fig. 2, fig. 2 is a schematic flow diagram of a first embodiment of the method for recommending a corpus product according to the present invention.

In this embodiment, the method for recommending a corpus product includes the following steps:

Step S10, receiving a corpus product query request triggered by a user, and acquiring a corpus product query requirement provided by the user according to the corpus product query request.

Specifically, the execution main body of the embodiment may be any terminal device used by a user for performing a corpus product query operation, such as a smart phone, a tablet computer, a personal computer, and the like, which are not listed one by one here, and are not limited thereto.

Correspondingly, the corpus product query request may be triggered as follows: the user opens a corpus query application (App) that is provided by a corpus transaction platform and installed on the terminal device, and the request is generated when the user clicks a function key in the corpus query App, such as a text input box, a voice input key, or a picture input key.

Correspondingly, the obtained corpus product query requirement may be the information input by the user when operating the function key.

Step S20, carrying out keyword extraction processing on the corpus product query requirement to obtain N keywords.

It should be understood that in practical applications, the corpus product query requirement provided by the user may contain one word, several words, a sentence, or more information. Therefore, after the keyword extraction processing is performed on the corpus product query requirement, at least one keyword is obtained; that is, the value of N should be an integer greater than or equal to 1.

In addition, in order to facilitate understanding of the operation of extracting the keywords from the query requirement of the corpus product to obtain N keywords, a specific extraction method is provided in this embodiment, and the general implementation steps are shown in fig. 3 and described in detail below with reference to fig. 3.

Substep S201, performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words.

It should be understood that, since the finally determined N keywords are selected from the obtained M words, the value of M cannot be smaller than the value of N in practical applications; that is, M should be an integer greater than or equal to N.

In addition, in this embodiment, the word segmentation and part-of-speech tagging processing performed on the corpus product query requirement in the substep S201 specifically includes:

Firstly, according to punctuation marks in the corpus product query requirement, such as commas and periods, the corpus product query requirement is split into sentences to obtain the sentences to be segmented.

For example, suppose the corpus product query requirement provided by the user is "Hello, I want to listen to the English Little Prince." The system traverses the content character by character. When the currently traversed character is a comma, the system performs a sentence split and takes the content before the comma as one sentence to be segmented (called the first sentence to be segmented). The system then continues to traverse the subsequent content; when the currently traversed character is a period, the content between the comma and the period is taken as another sentence to be segmented (called the second sentence to be segmented).
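The punctuation-based sentence split can be sketched as follows. This is a minimal illustration that assumes a small, fixed set of Chinese and English delimiters; the delimiter set and the function name `split_sentences` are not taken from the text.

```python
import re

# Assumed delimiter set: common English and Chinese sentence punctuation.
_SENTENCE_DELIMITERS = re.compile(r"[,.!?;，。！？；]+")


def split_sentences(text: str) -> list[str]:
    """Split text at punctuation marks and drop empty fragments."""
    parts = _SENTENCE_DELIMITERS.split(text)
    return [part.strip() for part in parts if part.strip()]


# English rendering of the example query used in this embodiment.
print(split_sentences("Hello, I want to listen to the English Little Prince."))
# -> ['Hello', 'I want to listen to the English Little Prince']
```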

And then, performing maximum reverse matching segmentation on the sentence to be segmented, and determining the M words according to a user-defined dictionary.

Specifically, the term "maximum reverse matching segmentation" refers to segmentation from right to left when segmenting a sentence to be segmented.

The custom dictionary consists of existing words and phrases collected and recorded in advance from various big data platforms and dictionaries, and basically covers the various forms of existing words that may appear.

For convenience of understanding, a maximum reverse matching segmentation mode is adopted here to segment the second to-be-segmented sentence obtained in the above example.

Suppose that the words recorded in the custom dictionary D are: D = {"I", "look", "read", "listen", "Chinese", "English", "Ten Thousand Reasons", "Little Prince", ...}.

When the maximum reverse matching segmentation is performed on the second sentence to be segmented (S = "I want to listen to the English Little Prince"), a maximum segmentation length is first defined, for example 6, and segmentation then proceeds from right to left:

(1) The candidate word W1 taken out of S is "I want to listen to English";

(2) The custom dictionary D is searched and the candidate word W1 is not found in it, so the leftmost character of W1 is removed to obtain the candidate word W2, "want to listen to English";

(3) The custom dictionary D is searched and the candidate word W2 is not found in it, so the leftmost character of W2 is removed to obtain the candidate word W3, "listen to English";

(4) The custom dictionary D is searched and the candidate word W3 is not found in it, so the leftmost character of W3 is removed to obtain the candidate word W4, "English";

(5) The custom dictionary D is searched and the candidate word W4 is found in it; at this point W4 is split off from S, and S becomes "I want to listen to the Little Prince";

(6) According to the maximum segmentation length of 6, the content of S is intercepted again to obtain the candidate word W5, "I want to listen to the Little Prince";

(7) The operations in steps (1) to (6) are repeated until all the content in S has been segmented.

According to the above segmentation operation, the words segmented from the second sentence to be segmented, "I want to listen to the English Little Prince", are: "I", "listen", "English", and "Little Prince".
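The right-to-left matching loop can be sketched in a few lines. Because the example here is rendered in English, the sketch treats whitespace-separated English words as the matching units in place of Chinese characters and joins them with a space; for Chinese text the units would be characters joined with an empty string. The function name, the `sep` parameter and the choice to discard single units that are not in the dictionary (which mirrors the four-word result of the example) are assumptions.

```python
def reverse_max_match(units, dictionary, max_len=6, sep=" "):
    """Segment `units` from right to left, preferring the longest dictionary match."""
    words = []
    end = len(units)
    while end > 0:
        start = max(0, end - max_len)
        # Drop the leftmost unit until the candidate is in the dictionary
        # or only a single unit is left.
        while start < end - 1 and sep.join(units[start:end]) not in dictionary:
            start += 1
        candidate = sep.join(units[start:end])
        if candidate in dictionary:
            words.append(candidate)
        # Assumption: a single unit missing from the dictionary is skipped.
        end = start
    words.reverse()  # restore left-to-right order
    return words


custom_dictionary = {"i", "look", "read", "listen", "chinese", "english", "little prince"}
sentence = "i want to listen to the english little prince"
print(reverse_max_match(sentence.split(), custom_dictionary))
# -> ['i', 'listen', 'english', 'little prince']
```

Scanning from the right means that when two dictionary words overlap, the one ending later in the sentence wins, which is the usual reason for preferring the reverse direction on Chinese text.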

It should be understood that the above-mentioned is only a specific word segmentation manner, and the technical solution of the present invention is not limited in any way, and in practical applications, those skilled in the art can perform the setting according to the needs, and the present invention is not limited herein.

In addition, it is worth mentioning that in practical application, the administrator of the corpus can update the custom dictionary according to the historical query record of the user.

And finally, performing part-of-speech tagging on the M words according to preset part-of-speech standard information.

It should be noted that the part-of-speech standard information in this embodiment specifically refers to Chinese part-of-speech standard information, which specifies which types of words are nouns, proper nouns, verbs, adjectives, time words, and so on; these are not listed one by one here.

Still taking the 4 words obtained by the above segmentation as an example, the result of performing part-of-speech tagging on the 4 words according to the part-of-speech standard information may be as follows: "I" <pronoun>, "listen" <verb>, "English" <adjective>, "Little Prince" <noun>.

It should be understood that the above description is given only by way of illustration, and the technical solution of the present invention is not limited thereto.
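As a minimal sketch, part-of-speech tagging against preset standard information can be modelled as a lookup table. The table contents, the `unknown` fallback tag and the function name are assumptions for illustration; a production system would more likely use a trained tagger.

```python
# Hypothetical excerpt of the preset part-of-speech standard information.
POS_LEXICON = {
    "i": "pronoun",
    "listen": "verb",
    "english": "adjective",
    "little prince": "noun",
}


def tag_words(words, lexicon=POS_LEXICON, default="unknown"):
    """Pair each segmented word with its part of speech."""
    return [(word, lexicon.get(word, default)) for word in words]


print(tag_words(["i", "listen", "english", "little prince"]))
```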

In addition, it is worth mentioning that in practical applications, the query requirement of the corpus product provided by the user may be different in format due to different operation keys corresponding to the triggered corpus product query request.

For example, when the key operated by the user is a text input box, the query requirement of the obtained corpus product is specifically in a text format.

For example, when the key operated by the user is a voice input key, the query requirement of the obtained corpus product is specifically in a voice format.

For example, when the button operated by the user is a picture input button, the query requirement of the obtained corpus product is specifically in a picture format.

Word segmentation and part-of-speech tagging of the corpus product query requirement are performed on text. Therefore, to ensure that the word segmentation and part-of-speech tagging can be performed smoothly on the corpus product query requirement to obtain the M words, the format of the corpus product query requirement may be determined before substep S201 is performed, and the requirement may then be adapted according to its format.

For example, if it is determined that the corpus product query requirement is in a voice format, the corpus product query requirement in the voice format is converted into a corpus product query requirement in a text format by using a voice recognition technology, and then the substep S201 is executed; if the query requirement of the corpus product is determined to be in a picture format, the query requirement of the corpus product in the picture format is converted into the query requirement of the corpus product in a text format by using an Optical Character Recognition (OCR) technology, and then the substep S201 is executed, and if the query requirement of the corpus product is in the text format, the substep S201 is directly executed.
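The format check described above amounts to a dispatch on the input format. The sketch below assumes the format is already known as one of the strings `"text"`, `"voice"` or `"picture"`, and takes the speech recognition and OCR engines as injected callables, since the text does not name specific engines. The function and parameter names are hypothetical.

```python
from typing import Callable, Union


def to_text(
    requirement: Union[str, bytes],
    fmt: str,
    speech_to_text: Callable[[bytes], str],
    image_to_text: Callable[[bytes], str],
) -> str:
    """Convert a query requirement to text before substep S201 runs."""
    if fmt == "text":
        return requirement  # already text, pass through unchanged
    if fmt == "voice":
        return speech_to_text(requirement)  # speech recognition
    if fmt == "picture":
        return image_to_text(requirement)  # optical character recognition
    raise ValueError(f"unsupported query format: {fmt}")


# Stand-in recognizers used only to demonstrate the dispatch.
fake_asr = lambda audio: "text from speech"
fake_ocr = lambda image: "text from picture"
print(to_text(b"\x00\x01", "voice", fake_asr, fake_ocr))
```

Injecting the recognizers keeps the dispatch testable without committing to a particular speech or OCR library.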

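That is, the operation in the sub-step S201 is substantially: splitting the corpus product query requirement in the text format into sentences to be segmented according to its punctuation marks; performing maximum reverse matching segmentation on the sentences to be segmented according to a custom dictionary to obtain the M words; and performing part-of-speech tagging on the M words according to the preset part-of-speech standard information.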

Further, in order to ensure that the subsequently determined feature information has a higher reference value, before the keyword extraction operation is performed, a text preprocessing operation may be performed on the corpus product query requirement in the text format.

For example, stop words are removed, that is, words in the query requirement that have no actual meaning, such as the modal particles "ne", "mo" and "o".

For example, invalid special characters, such as emoticons, various punctuation marks, and the like, are removed.
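A minimal sketch of this text preprocessing is shown below. The stop-word list is a hypothetical English stand-in, and "invalid special characters" is approximated as anything that is neither a word character nor whitespace; both choices are assumptions. Because this sketch also strips punctuation, it would have to run after any punctuation-based sentence splitting.

```python
import re

# Hypothetical stop-word list; a real list would be language specific.
STOP_WORDS = {"um", "oh", "well"}

# Anything that is not a word character or whitespace (punctuation, emoticons).
_INVALID_CHARS = re.compile(r"[^\w\s]")


def preprocess(text: str) -> str:
    """Remove invalid special characters and stop words from a text query."""
    cleaned = _INVALID_CHARS.sub(" ", text)
    kept = [word for word in cleaned.split() if word.lower() not in STOP_WORDS]
    return " ".join(kept)


print(preprocess("Oh, I want to listen :) to English!"))
# -> I want to listen to English
```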

Correspondingly, before the query requirement of the corpus product in the voice format is converted into the query requirement of the corpus product in the text format, a series of preprocessing operations such as filtering, interference sound removal and the like can be performed on the query requirement of the corpus product in the voice format, so that the converted text information is more accurate.

Similarly, before the query requirement of the corpus product in the picture format is converted into the query requirement of the corpus product in the text format, a series of preprocessing operations, such as grayscale processing, denoising and other operations, can be performed on the query requirement of the corpus product in the picture format to ensure that the converted text information is more accurate.

Substep S202, calculating a weight value of each word in the M words according to a preset part-of-speech weight distribution standard.

It should be understood that, in an actual query process, pronouns, interjections, conjunctions, particles and the like usually do not help the query, so the weights assigned to such words should be low. Higher weights should be assigned to verbs that can represent the multimedia format of the corpus product required by the user (for example, "listen" may indicate that the multimedia format of the corpus product is audio, "see" video, and "read" text), to adjectives that can represent the language format of the corpus product (for example, "English" and "Chinese"), and to nouns that can represent the category of the corpus product.

Substep S203, traversing the M words, comparing the weight value of the currently traversed word with a preset weight threshold value, and selecting the words whose weight values are larger than the weight threshold value to obtain the N keywords.
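Substeps S202 and S203 can be sketched together as a weight lookup followed by a threshold filter. The weight values and the threshold of 0.5 are illustrative assumptions; the text only states that nouns, verbs and adjectives should be weighted higher than pronouns, interjections and conjunctions.

```python
# Assumed part-of-speech weight distribution standard (values are illustrative).
POS_WEIGHTS = {
    "noun": 0.9,
    "verb": 0.8,
    "adjective": 0.8,
    "pronoun": 0.1,
    "interjection": 0.1,
    "conjunction": 0.1,
}


def extract_keywords(tagged_words, weights=POS_WEIGHTS, threshold=0.5):
    """Keep the words whose part-of-speech weight exceeds the threshold."""
    keywords = []
    for word, pos in tagged_words:
        weight = weights.get(pos, 0.0)  # unknown parts of speech get weight 0
        if weight > threshold:
            keywords.append(word)
    return keywords


tagged = [
    ("i", "pronoun"),
    ("listen", "verb"),
    ("english", "adjective"),
    ("little prince", "noun"),
]
print(extract_keywords(tagged))
# -> ['listen', 'english', 'little prince']
```

Here M is 4 and N is 3, which illustrates that the keywords are a subset of the segmented words.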

It should be understood that the above is only a specific implementation manner for extracting the keywords from the query requirement of the corpus product, and the technical solution of the present invention is not limited at all, and in practical applications, those skilled in the art can set the keywords as needed, and the present invention is not limited herein.

Step S30, determining the characteristic information corresponding to the corpus products required by the user according to the N keywords.

Specifically, the feature information corresponding to the corpus product is a key feature that can identify the corpus product.

For example, if the keywords obtained by the above extraction operation are "listen", "English", and "Little Prince", it can be determined from the keyword "listen" that the corpus product required by the user should be audio data, from the keyword "English" that the corpus product needs to be the English version, and from the keyword "Little Prince" that the corpus product belongs to the fairy tale category.

It should be understood that, in practical applications, in order to determine the feature information corresponding to the corpus products according to the keywords, the corresponding relationships between the different keywords and the features of the different corpus products may be pre-constructed, and then determined according to the pre-constructed mapping relationships.
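A pre-constructed mapping of this kind can be sketched as a dictionary from keyword to a feature name and value. The feature names (`multimedia_format`, `language`, `category`) and the table entries are assumptions based on the examples in this embodiment.

```python
# Hypothetical pre-constructed mapping: keyword -> (feature name, feature value).
KEYWORD_FEATURES = {
    "listen": ("multimedia_format", "audio"),
    "see": ("multimedia_format", "video"),
    "read": ("multimedia_format", "text"),
    "english": ("language", "english"),
    "chinese": ("language", "chinese"),
    "little prince": ("category", "fairy tale"),
}


def determine_features(keywords, mapping=KEYWORD_FEATURES):
    """Translate keywords into the feature information of the wanted product."""
    features = {}
    for keyword in keywords:
        if keyword in mapping:
            name, value = mapping[keyword]
            features[name] = value
    return features


print(determine_features(["listen", "english", "little prince"]))
```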

It should be understood that the above description is only for illustrative purposes, and the technical solution of the present invention is not limited in any way, and in practical applications, those skilled in the art can make settings according to the needs, and the present invention is not limited herein.

Step S40, searching corpus information which accords with any feature in the feature information from the corpus according to the feature information.

Specifically, the corpus described in this embodiment is a pre-constructed corpus capable of storing a plurality of types of corpus information such as text, picture, audio, video, and the like.

Furthermore, in order to ensure that a multidimensional query, that is, a query on a plurality of features, can be performed on the corpus on the basis of the determined feature information, the corpus constructed in this embodiment uses Elasticsearch (a search server based on a full-text search engine, abbreviated as ES) as its core, supplemented by MongoDB (a database based on distributed file storage) and MySQL (a relational database management system).

Specifically, in the ES, the last three digits of the identification number (hereinafter referred to as "ID") of the corpus information collected from each big data platform are used as index (index), the ID of the corpus information is used as type (type), and a plurality of index names such as corpus name, corpus description, corpus tag, language direction, price, sales volume, etc. are created in the data table.
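The index and type assignment described above can be sketched as a small routing function. It assumes that corpus IDs are strings ending in at least three digits, which the text implies but does not state; the function name is hypothetical.

```python
def es_location(corpus_id: str) -> tuple[str, str]:
    """Return the (index, type) pair under which a corpus record is stored.

    The index is the last three digits of the ID and the type is the ID
    itself, following the scheme described in this embodiment.
    """
    suffix = corpus_id[-3:]
    if len(suffix) < 3 or not suffix.isdigit():
        raise ValueError(f"corpus ID must end in three digits: {corpus_id!r}")
    return suffix, corpus_id


print(es_location("20190521007"))
# -> ('007', '20190521007')
```

Using the ID suffix as the index spreads records across at most 1000 indexes.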

Then, MongoDB is used to store the specific corpus information, and a correspondence is established between the index of the ES and each piece of corpus information in MongoDB.

Meanwhile, the original data of the corpus information (i.e. without any processing such as the above classification or tag addition) is stored in MySQL, in one-to-one correspondence with the index information in the ES.
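The storage layout described above can be sketched as follows. This is an illustration under stated assumptions: plain dictionaries stand in for the ES, MongoDB and MySQL stores, and the function names and the example ID format are hypothetical. Only the rule "last three digits of the ID as index, full ID as type" is taken from the description.

```python
# In-memory stand-ins for the three stores (assumption: a real deployment
# would use Elasticsearch, MongoDB and MySQL clients instead of these dicts).
es_docs, mongo_docs, mysql_rows = {}, {}, {}


def es_location(corpus_id):
    """Derive the ES index and type from a corpus ID as described above."""
    if len(corpus_id) < 3 or not corpus_id[-3:].isdigit():
        raise ValueError("corpus ID must end in three digits")
    return {"index": corpus_id[-3:], "type": corpus_id}


def store(corpus_id, searchable_fields, content, raw_record):
    """Write one piece of corpus information to all three stand-in stores."""
    loc = es_location(corpus_id)
    # Searchable fields (name, description, tags, ...) go to the ES stand-in.
    es_docs[(loc["index"], loc["type"])] = dict(searchable_fields, corpus_id=corpus_id)
    # The processed corpus content goes to the MongoDB stand-in.
    mongo_docs[corpus_id] = content
    # The unprocessed original record goes to the MySQL stand-in.
    mysql_rows[corpus_id] = raw_record
    return loc
```

Keying all three stores by the same corpus ID is what keeps the ES index entry, the stored content and the original record in one-to-one correspondence.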

Therefore, when corpus information is queried from the corpus according to the determined feature information, the features in the feature information are directly substituted into a query statement written in the DSL (domain-specific language) of ES, a general-purpose query language for large data that is used for retrieval and analysis of massive machine data.

For example, a fuzzy search with matchQuery (…), a prefix search (…), or filters such as termFilter, wizardFilter, etc.
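As a rough illustration of such a query statement, the sketch below builds an ES query DSL body as a Python dictionary. The `bool`, `should`, `match`, `prefix`, `term` and `minimum_should_match` keys are standard ES query DSL; the field names `corpus_name` and `language_direction` and the function name are assumptions based on the index fields listed above, not names given by this embodiment.

```python
import json


def build_any_feature_query(name=None, name_prefix=None, language=None):
    """Build a query body that matches documents having ANY of the given features."""
    should = []
    if name:
        # Full-text match on the (assumed) corpus name field.
        should.append({"match": {"corpus_name": name}})
    if name_prefix:
        # Prefix search on the same field.
        should.append({"prefix": {"corpus_name": name_prefix}})
    if language:
        # Exact term match on the (assumed) language direction field.
        should.append({"term": {"language_direction": language}})
    # At least one "should" clause must match, i.e. any single feature suffices.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    print(json.dumps(build_any_feature_query(name="queen", language="en"), indent=2))
```

Using `should` clauses with `minimum_should_match` set to 1 is one way to express "conforms to any feature"; requiring all features at query time would instead use `must` clauses.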

For example, when the subject name "queen" of the determined corpus product to be queried is used as query information, the corpus information queried from the corpus may be corpus information related to the "queen" in any language version and any multimedia format.

Step S50: processing each piece of corpus information according to the feature information and the features corresponding to each piece of corpus information to obtain a corpus product having all the features in the feature information, and pushing the corpus product to the user.

Specifically, in practical applications, the operation of obtaining a corpus product having all the features in the feature information can be roughly implemented by the following sub-steps:

firstly, according to the corresponding characteristics of each corpus information, screening out the corpus information with the most characteristics, and taking the corpus information as an initial corpus product;

then, determining features to be integrated according to the feature information and the features corresponding to the initial corpus products;

then, the corpus information corresponding to the features to be integrated is extracted from the corpus information except the initial corpus product;

and finally, combining the extracted corpus information with the initial corpus product to obtain a corpus product with all the characteristics in the characteristic information.
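The four sub-steps above can be sketched as follows. This is a minimal illustration, assuming that each piece of corpus information is represented as a dictionary with an `id` and a set of `features`, and that "the most features" means the largest overlap with the required features; the actual merging of content is outside the scope of the sketch.

```python
def build_product(required, candidates):
    """Pick an initial product and add other corpus items to cover missing features."""
    if not candidates:
        return None
    required = set(required)
    # Sub-step 1: the candidate covering the most required features is the initial product.
    initial = max(candidates, key=lambda c: len(required & c["features"]))
    covered = required & initial["features"]
    parts = [initial["id"]]
    # Sub-step 2: the features still to be integrated.
    missing = required - covered
    # Sub-step 3: extract other corpus items that supply a missing feature.
    for candidate in candidates:
        if candidate is initial:
            continue
        gain = missing & candidate["features"]
        if gain:
            parts.append(candidate["id"])
            covered |= gain
            missing -= gain
    # Sub-step 4: the combined product; "missing" is empty when all features are covered.
    return {"parts": parts, "features": covered, "missing": missing}
```

If `missing` is still non-empty after the loop, the queried corpus information cannot supply every required feature, and further processing (such as the language conversion in the example below) would be needed.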

For ease of understanding, the above steps are exemplified below:

for example, suppose the corpus information directly queried from the corpus does not include all of the above feature information, and the queried corpus information with the most features is an English Little Prince speech that contains only audio information, together with a Chinese-version Little Prince text novel.

The processing may specifically be to translate the Chinese-version Little Prince text novel into a corresponding English version;

and then to combine the English Little Prince audio with the English-version text novel, calibrating them so that the played audio and the English-version text stay in sync, and highlighting the corresponding words during voice playback for the user's convenience.

The above description is only an example, and the technical solution of the present invention is not limited at all.
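The synchronized highlighting in the example above could be realized, for instance, by looking up the word that corresponds to the current playback time. The sketch below assumes that a per-word list of start times (in seconds) is already available from the calibration step; that data format and the function name are hypothetical.

```python
from bisect import bisect_right


def word_at(start_times, words, playback_time):
    """Return the word to highlight at playback_time, or None before the first word.

    start_times must be sorted ascending and aligned one-to-one with words.
    """
    index = bisect_right(start_times, playback_time) - 1
    return words[index] if index >= 0 else None
```

A binary search keeps the lookup cheap even for a long text, since it is repeated continuously during playback.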

In addition, it is worth mentioning that when corpus information in multiple formats is combined to obtain the corpus product required by the user, Tika (an open-source toolkit from Apache for extracting document content) can be used, whose existing parser libraries detect and extract metadata and structured content from documents of different formats (such as HTML, PDF, and DOC).

It is easy to see from the above description that the corpus product recommendation method provided in this embodiment extracts the corpus product query requirement provided by the user from the corpus product query request triggered by the user, determines the feature information corresponding to the corpus product required by the user according to the N keywords extracted from that query requirement, searches for corpus information that matches any feature in the determined feature information, and finally processes the queried corpus information according to the determined feature information and the features corresponding to the queried corpus information to obtain a corpus product having all the features in the feature information. The corpus product finally obtained therefore meets the actual needs of the user, which greatly improves the recommendation accuracy of corpus products.

In addition, it is worth mentioning that in practical application, the corpus product generated according to the corpus product query requirement provided by the user may need to be charged, so that when the obtained corpus product is recommended to the user, it can be firstly determined whether the corpus product needs to be charged.

Correspondingly, if it is determined that the corpus product does not need to be charged for, the corpus product is pushed directly to the user. If it is determined that the corpus product needs to be charged for, a charging notice can be issued to the user and the user's feedback monitored; if an instruction agreeing to the deduction is received from the user, the fee for the corpus product is deducted from a payment account preset by the user, and the corpus product is then pushed to the user.
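The charging decision described above can be sketched as a small control flow. The callbacks `ask_user`, `deduct` and `push` are hypothetical placeholders for the charging notice, the account deduction and the push operation; their names and signatures are assumptions for illustration.

```python
def deliver(product, price, ask_user, deduct, push):
    """Push the product directly if free; otherwise push only after consent and deduction."""
    if price <= 0:
        # No charge is needed: push directly.
        push(product)
        return "pushed_free"
    # A charge is needed: issue the notice and wait for the user's decision.
    if ask_user(product, price) and deduct(price):
        push(product)
        return "pushed_paid"
    # The user declined, or the deduction failed: do not push.
    return "not_pushed"
```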

Through this mode of operation, the user can decide, according to the actual situation, whether to pay to obtain the corpus product, so the user experience is greatly improved while the recommendation accuracy of the corpus product is ensured.

Further, in order to better improve user experience, when the instruction fed back by the user is a refusal to pay, a way of obtaining the corpus product free of charge can be recommended to the user in order to retain as many users of the corpus as possible, for example sharing information about the corpus to a preset number of chat groups or inviting a preset number of new users. This avoids user loss and also promotes the corpus.

In addition, in order to better maintain and manage the corpus information in the corpus, and to make the corpus product synthesized from that corpus information better fit the user's requirements, feedback information submitted by the user can be received after the corpus product is pushed to the user, and the corpus information in the corpus is then maintained and managed according to the feedback information.

Referring to fig. 4, fig. 4 is a flowchart illustrating a second embodiment of a corpus product recommendation method according to the present invention.

Based on the first embodiment, before step S40, the method for recommending a corpus product in this embodiment further includes:

Step S00: detecting whether the feature information contains features that identify the category of the corpus product, the language format of the corpus product, and the multimedia style of the corpus product. If detection shows that the feature information contains features identifying the category, the language format, and the multimedia style of the corpus product, step S40 is executed; otherwise, step S01 is executed.

Step S01: acquiring a historical query record of the user in a preset period, and determining the feature information according to the historical query record and the N keywords.

In practical applications, the operation in step S01 can be specifically realized through the following sub-steps:

(1) Acquiring the historical query record of the user in a preset period.

Specifically, the historical query records mainly record the type, feature information, and the like of the corpus products that the user has queried before (for example, in the last month), so that the user's preference can be determined according to the historical query records.

In addition, in this embodiment, the acquired historical query record is limited to the content of a preset period, for example, the last week, so that the information in the acquired historical query record has a higher reference value.

(2) Analyzing the historical query records by utilizing a big data analysis technology to determine the query requirement of the user at the current moment.

Specifically, analyzing the historical query records with the big data analysis technology means counting which keywords in the historical query records are used with high frequency, and which categories, language formats, and multimedia formats the corpus products recently and frequently searched by the user belong to.

(3) Taking the query requirement at the current moment as a first element, and taking the N keywords as a second element.

(4) Determining the category of the corpus product, the language format of the corpus product and the multimedia style feature of the corpus product according to the first element and the second element.
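Steps S00 and S01 can be sketched together as "check which of the three feature kinds are missing, then fill them from the most frequent values in the user's recent history". This is only an illustration: the feature kind names, the dictionary representation of history records, and the choice to let keyword-derived features take precedence over history are all assumptions.

```python
from collections import Counter

# Hypothetical names for the three feature kinds checked in step S00.
REQUIRED_KINDS = ("category", "language", "media")


def complete_features(features, history):
    """Fill any missing required feature kind with its most frequent value in history.

    features: dict derived from the N keywords (second element).
    history:  list of dicts describing queries in the preset period (first element).
    """
    completed = dict(features)
    for kind in REQUIRED_KINDS:
        if kind in completed:
            continue  # Step S00: this kind is already present.
        counts = Counter(record[kind] for record in history if kind in record)
        if counts:
            # Step S01: take the value the user queried most often.
            completed[kind] = counts.most_common(1)[0][0]
    return completed
```

A kind that appears neither in the keywords nor in the history stays absent in the result, so a caller can still detect that the feature information is incomplete.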

Further, in practical application, in order to make the finally determined feature information of the corpus product more accurate, that is, to make the corpus product recommended according to that feature information better meet the user's requirement, biometric feature information of the user, preferably face feature information and voiceprint feature information, can also be obtained when the historical query record is obtained. The gender and approximate age of the user can then be determined by analyzing the biometric feature information, and the content that users of that gender and age range pay attention to can be screened out, so that the corpus product finally recommended to the user better meets the user's requirement.

It should be noted that, in practical applications, most users do not fill in complete personal information when using the corpus, so the user's actual age, gender, and the like cannot be obtained from the personal information.

Further, in practical application, in order to determine the age and gender of the user from the acquired biometric information conveniently, quickly, and accurately, a big data analysis model can be constructed in advance using big data analysis technology assisted by a machine learning algorithm. Then, after the biometric information is acquired, it is input directly into the analysis model to obtain the age and gender of the user.

The big data analysis model may be constructed roughly as follows:

firstly, acquiring biometric information of users of known gender and age from each big data platform;

then, the biometric information of users of known gender and age is used as sample data and input into a big data analysis training model for training, until the model accurately outputs the age and gender of the corresponding user when the sample data is input, at which point the training can be finished.

Accordingly, the big data analysis training model at this point is the required big data analysis model.

In addition, in practical application, the selected machine learning algorithm is preferably a convolutional neural network algorithm.

Because the convolutional neural network algorithm is mature, in a specific implementation, a person skilled in the art can consult the relevant literature on the convolutional neural network algorithm, and details are not described here.

For example, when the query requirement of the corpus product provided by the user is only two words of "novel speech", after analyzing the biometric information of the user by using big data analysis technology, it is determined that the user is a female aged about 30 years old.

In addition, according to the acquired historical query records of the user, it is found that the user frequently queries fantasy-type animation novels.

Therefore, according to this information, the corpus product that the user needs to query is a fantasy-type animation novel suitable for women aged about 30.

Accordingly, the determined feature information may be: 30 years old, female, fantasy, animation, novel.

It should be noted that, the foregoing is only an example, and the technical solution of the present invention is not limited at all, and in the specific implementation, a person skilled in the art may perform setting according to needs, and details are not described here.

According to the corpus product recommendation method of this embodiment, before the corpus is searched for corpus information that matches any feature in the feature information, it is detected whether the feature information contains features identifying the category of the corpus product, the language format of the corpus product, and the multimedia style of the corpus product. Based on the detection result, either the search is performed with the feature information determined at the beginning, or parameter information is acquired again to determine the above features before the search is performed. This effectively guarantees the accuracy of the feature information used to search for corpus information, so that the subsequently obtained corpus product better meets the actual requirements of the user.

In addition, an embodiment of the present invention further provides a storage medium, where a recommendation program of a corpus product is stored on the storage medium, and the recommendation program of the corpus product, when executed by a processor, implements the steps of the recommendation method of the corpus product as described above.

Referring to fig. 5, fig. 5 is a block diagram illustrating a first embodiment of a corpus product recommendation device according to the present invention.

As shown in fig. 5, the apparatus for recommending corpus products according to the embodiment of the present invention includes: an obtaining module 5001, an extracting module 5002, a determining module 5003, a searching module 5004 and a generating module 5005.

The obtaining module 5001 is configured to receive a corpus product query request triggered by a user, and obtain a corpus product query requirement provided by the user according to the corpus product query request; an extraction module 5002, configured to perform keyword extraction processing on the corpus product query requirement to obtain N keywords, where N is an integer greater than or equal to 1; the determining module 5003 is configured to determine, according to the N keywords, feature information corresponding to the corpus product required by the user; the searching module 5004 is configured to search corpus information conforming to any feature in the feature information from a corpus according to the feature information; the generating module 5005 is configured to process each corpus information according to the feature information and features corresponding to each corpus information, obtain corpus products having all features in the feature information, and push the corpus products to the user.

To facilitate understanding of the operation of the extraction module 5002 in extracting keywords from the corpus product query request, a specific implementation is given below, which is roughly as follows:

firstly, performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words;

then, calculating a weight value of each word in the M words according to a preset part-of-speech weight distribution standard;

and finally, traversing the M words, comparing the weight value of the currently traversed word with a preset weight threshold value, and screening out the words whose weight values are larger than the weight threshold value to obtain the N keywords.

It should be understood that, in practical applications, M should be an integer greater than or equal to N.
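The weight-threshold step can be sketched as follows, taking words that have already been segmented and tagged with a part of speech as input. The part-of-speech labels, the weight values and the threshold are hypothetical; the embodiment only states that a preset part-of-speech weight distribution standard and a preset threshold are used.

```python
# Hypothetical part-of-speech weights and threshold, for illustration only.
POS_WEIGHTS = {"noun": 0.8, "verb": 0.5, "adjective": 0.4, "particle": 0.1}
WEIGHT_THRESHOLD = 0.45


def extract_keywords(tagged_words, weights=POS_WEIGHTS, threshold=WEIGHT_THRESHOLD):
    """Keep the words whose part-of-speech weight is larger than the threshold.

    tagged_words: list of (word, part_of_speech) pairs from segmentation and tagging.
    Words with an unknown part of speech get weight 0.0 and are dropped.
    """
    return [word for word, pos in tagged_words if weights.get(pos, 0.0) > threshold]
```

Because the keywords are a subset of the tagged words, the number of keywords returned can never exceed the number of words passed in.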

In addition, it is worth mentioning that, in practical application, the corpus product query requirement provided by the user may differ in format, because the operation key corresponding to the corpus product query request triggered by the user may differ. To ensure that the extraction module 5002 can smoothly perform word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, the extraction module 5002 is further configured, before performing the above operations, to determine the format of the corpus product query requirement.

Correspondingly, if the corpus product query requirement is determined to be in a voice format, converting the corpus product query requirement in the voice format into a corpus product query requirement in a text format by using a voice recognition technology; and if the corpus product query requirement is determined to be in a picture format, converting the corpus product query requirement in the picture format into a corpus product query requirement in a text format by using an optical character recognition technology.
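The format handling above amounts to a dispatch on the detected format. In the sketch below, `speech_to_text` and `image_to_text` are hypothetical callables standing in for whatever voice recognition and optical character recognition components are used; the format labels are likewise assumptions.

```python
def to_text(requirement, fmt, speech_to_text, image_to_text):
    """Normalize a query requirement to text according to its detected format."""
    if fmt == "text":
        return requirement
    if fmt == "voice":
        # Voice input: convert with the supplied speech recognition callable.
        return speech_to_text(requirement)
    if fmt == "picture":
        # Picture input: convert with the supplied OCR callable.
        return image_to_text(requirement)
    raise ValueError(f"unsupported query requirement format: {fmt}")
```

Passing the recognizers in as parameters keeps the dispatch independent of any particular recognition library.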

Accordingly, in order to facilitate understanding of the operation of performing word segmentation and part-of-speech tagging on the query requirement of the corpus product to obtain M words, a specific implementation manner is provided in this embodiment, which is substantially as follows:

It should be understood that the above is only a specific implementation manner, and the technical solution of the present invention is not limited in any way, and in practical applications, those skilled in the art can perform setting according to needs, and the present invention is not limited herein.

In addition, to facilitate understanding of the operation of the generating module 5005 for generating the corpus product required by the user, a specific implementation is given below, which is roughly as follows:

firstly, screening out corpus information with the most characteristics according to the characteristics corresponding to all corpus information, and taking the corpus information as an initial corpus product;

It should be understood that the above is only a specific implementation manner, and the technical solution of the present invention is not limited in any way, and in a specific application, a person skilled in the art may set the implementation manner as needed, and the present invention is not limited thereto.

From the above description, it is not difficult to see that the apparatus for recommending corpus products provided in this embodiment extracts the corpus product query requirement provided by the user from the corpus product query request triggered by the user, determines the feature information corresponding to the corpus product required by the user according to the N keywords extracted from that query requirement, searches for corpus information that matches any feature in the determined feature information, and finally processes the queried corpus information according to the determined feature information and the features corresponding to the queried corpus information to obtain a corpus product having all the features in the feature information. The corpus product finally obtained therefore meets the actual needs of the user, which greatly improves the recommendation accuracy of corpus products.

Through the operation mode, the user can determine whether to pay to obtain the corpus product according to actual conditions, and the user experience is greatly improved while the recommendation accuracy of the corpus product is ensured.

In addition, in order to better maintain and manage the corpus information in the corpus and further enable the corpus finished product synthesized according to the corpus information in the corpus to better fit the user requirements, the feedback information submitted by the user can be further received after the corpus product is pushed to the user, and then the corpus information in the corpus is maintained and managed according to the feedback information.

It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.

In addition, the technical details that are not described in detail in this embodiment may refer to the recommendation method for corpus products provided in any embodiment of the present invention, and are not described herein again.

Based on the first embodiment of the apparatus for recommending corpus products, a second embodiment of the apparatus for recommending corpus products is provided.

In this embodiment, the apparatus for recommending corpus products further includes a detection module.

The detection module is configured to detect whether the feature information includes a feature that identifies a category to which the corpus product belongs, identifies a language format of the corpus product, and identifies a multimedia style of the corpus product.

Correspondingly, if the characteristic information is determined to contain the characteristics of identifying the category of the corpus product, identifying the language format of the corpus product and identifying the multimedia style of the corpus product through detection, the searching module is triggered to execute the operation of searching the corpus information which accords with any characteristic in the characteristic information from the corpus according to the characteristic information.

Otherwise (that is, when one or more of the above features are missing), the search module is triggered to perform the following steps:

firstly, acquiring a historical query record of the user in a preset period;

then, analyzing the historical query records by utilizing a big data analysis technology to determine the query requirement of the user at the current moment;

then, taking the query requirement at the current moment as a first element, and taking the N keywords as a second element;

and finally, determining the category of the corpus product, the language format of the corpus product and the multimedia style characteristic of the corpus product according to the first element and the second element.

It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and in a specific application, a person skilled in the art may set the technical solution as needed, and the present invention is not limited in this respect.

It is not difficult to see from the above description that the apparatus for recommending corpus products according to this embodiment detects, before searching the corpus for corpus information that matches any feature in the feature information, whether the feature information includes features identifying the category to which the corpus product belongs, the language format of the corpus product, and the multimedia style of the corpus product. Based on the detection result, it either performs the search with the feature information determined at the beginning, or acquires parameter information again to determine the above features and then performs the search. This effectively ensures the accuracy of the feature information used for searching the corpus information, so that the subsequently obtained corpus products better meet the actual requirements of the user.

Further, it is to be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g. a Read Only Memory (ROM)/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present invention or directly or indirectly applied to other related technical fields are also included in the scope of the present invention.

Claims

1. A method for recommending corpus products, the method comprising:

performing keyword extraction processing on the query requirement of the corpus product to obtain N keywords, wherein N is an integer greater than or equal to 1;

processing each corpus information according to the feature information and the feature corresponding to each corpus information to obtain a corpus product with all features in the feature information, and pushing the corpus product to the user;

wherein, the step of processing each corpus information according to the feature information and the feature corresponding to each corpus information to obtain the corpus product having all the features in the feature information comprises:

according to the characteristics corresponding to each corpus information, screening out the corpus information with the most characteristics, and taking the corpus information as an initial corpus product;

2. The method according to claim 1, wherein said step of extracting keywords from said corpus product query request to obtain N keywords comprises:

traversing the M words, comparing the weight value of the traversed current word with a preset weight threshold value, and filtering out the words with the weight values larger than the weight threshold value to obtain the N keywords.

3. The method as claimed in claim 2, wherein before the step of performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, the method further comprises:

determining a format of the corpus product query requirement;

4. The method according to claim 2, wherein before the step of searching corpus information corresponding to any feature in the feature information from a corpus according to the feature information, the method further comprises:

otherwise, executing the following steps:

acquiring a historical query record of the user in a preset period;

analyzing the historical query record by utilizing a big data analysis technology to determine the query requirement of the user at the current moment;

5. The method of any of claims 1 to 4, wherein prior to the step of pushing the corpus product to the user, the method further comprises:

judging whether the corpus product needs to be charged or not;

6. The method according to any of the claims 1 to 4, wherein after the step of pushing the corpus product to the user, the method further comprises:

7. An apparatus for recommending corpus products, the apparatus comprising:

the acquisition module is used for receiving a corpus product query request triggered by a user and acquiring a corpus product query requirement provided by the user according to the corpus product query request;

the extraction module is used for extracting keywords from the corpus product query requirement to obtain N keywords, wherein N is an integer greater than or equal to 1;

the generating module is used for processing each corpus information according to the feature information and the feature corresponding to each corpus information to obtain a corpus product with all features in the feature information, and pushing the corpus product to the user;

the generating module is further configured to screen out corpus information with the most features according to the features corresponding to the corpus information, and use the corpus information as an initial corpus product; determining features to be integrated according to the feature information and features corresponding to the initial corpus products; extracting the corpus information corresponding to the features to be integrated from the corpus information except the initial corpus product; and combining the extracted corpus information with the initial corpus product to obtain a corpus product with all the characteristics in the characteristic information.

8. A corpus product recommendation apparatus, characterized in that the apparatus comprises: a memory, a processor and a corpus product recommendation program stored on the memory and executable on the processor, the corpus product recommendation program being configured to implement the steps of the corpus product recommendation method according to any one of claims 1 to 6.

9. A storage medium having stored thereon a corpus product recommendation program, which when executed by a processor implements the steps of the corpus product recommendation method according to any one of claims 1 to 6.