CN110297880A - Corpus product recommendation method, apparatus, device, and storage medium - Google Patents


Info

Publication number
CN110297880A
Authority
CN
China
Prior art keywords
corpus
product
information
user
characteristic information
Legal status
Granted
Application number
CN201910433178.8A
Other languages
Chinese (zh)
Other versions
CN110297880B (en)
Inventor
韩亚洲
Current Assignee
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201910433178.8A
Publication of CN110297880A
Application granted
Publication of CN110297880B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of big data analysis and discloses a corpus product recommendation method, apparatus, device, and storage medium. The method comprises: receiving a corpus product query request triggered by a user, and obtaining, according to the query request, the corpus product query demand provided by the user; performing keyword extraction on the corpus product query demand to obtain N keywords, N being an integer greater than or equal to 1; determining, according to the N keywords, the characteristic information corresponding to the corpus product the user needs; searching a corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information; and processing each piece of corpus information according to the characteristic information and the features corresponding to that corpus information, to obtain a corpus product having all the features in the characteristic information, and pushing the corpus product to the user. In this way, the corpus product recommended to the user meets the user's actual needs, which greatly improves the recommendation accuracy of corpus products.

Description

Corpus product recommendation method, apparatus, device, and storage medium
Technical field
The present invention relates to the technical field of big data analysis, and in particular to a corpus product recommendation method, apparatus, device, and storage medium.
Background art
A traditional corpus is a large-scale electronic text library that has been scientifically sampled and processed. Over time, corpora have stopped being limited to text-type corpus information; they can now also store corpus information of various other types, such as pictures, audio, and video.
Although existing corpora store a great quantity and variety of corpus information, their query methods cannot fully identify users' query demands. As a result, the corpus information screened out does not meet users' actual needs, and the recommendation accuracy of corpus products is low.
There is therefore an urgent need for a method that recommends corpus products to users according to their actual needs, so as to improve the recommendation accuracy of corpus products.
The above content is only intended to assist in understanding the technical solution of the present invention and does not constitute an admission that it is prior art.
Summary of the invention
The main purpose of the present invention is to provide a corpus product recommendation method, apparatus, device, and storage medium that recommend corpus products to users according to their actual needs, so as to improve the recommendation accuracy of corpus products.
To achieve the above object, the present invention provides a corpus product recommendation method comprising the following steps:
receiving a corpus product query request triggered by a user, and obtaining, according to the query request, the corpus product query demand provided by the user;
performing keyword extraction on the corpus product query demand to obtain N keywords, N being an integer greater than or equal to 1;
determining, according to the N keywords, the characteristic information corresponding to the corpus product the user needs;
searching the corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information;
processing each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information, to obtain a corpus product having all the features in the characteristic information, and pushing the corpus product to the user.
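The patent does not say how the corpus is searched in the fourth step. A minimal sketch, assuming each piece of corpus information is stored with a set of feature labels and indexed by an inverted index (feature to item IDs), shows how "matches any feature" becomes a set union. The function names, item IDs, and feature labels are hypothetical.

```python
from collections import defaultdict

def build_feature_index(corpus):
    """Build an inverted index: feature -> IDs of corpus items carrying that feature."""
    index = defaultdict(set)
    for item_id, features in corpus.items():
        for feature in features:
            index[feature].add(item_id)
    return index

def find_candidates(index, wanted):
    """Return IDs of corpus items that match at least one wanted feature (a union)."""
    hits = set()
    for feature in wanted:
        hits |= index.get(feature, set())  # .get avoids inserting empty keys
    return hits

# Hypothetical corpus: item ID -> features of that piece of corpus information.
corpus = {
    "a1": {"audio", "English"},
    "t7": {"text", "fairy tale"},
    "v3": {"video", "Chinese"},
}
print(find_candidates(build_feature_index(corpus), {"audio", "English", "fairy tale"}))
```

Items matching only some features (here both `a1` and `t7`) are deliberately kept, because the last step assembles the final product from partial matches.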
Preferably, the step of performing keyword extraction on the corpus product query demand to obtain N keywords comprises:
performing word segmentation and part-of-speech tagging on the corpus product query demand to obtain M words, M being an integer greater than or equal to N;
calculating the weight value of each of the M words according to a preset part-of-speech weight assignment standard;
traversing the M words, comparing the weight value of the word currently traversed with a preset weight threshold, and selecting the words whose weight value is greater than the weight threshold to obtain the N keywords.
Preferably, before the step of performing word segmentation and part-of-speech tagging on the corpus product query demand to obtain M words, the method further comprises:
determining the format of the corpus product query demand;
if the corpus product query demand is in voice format, converting it into a text-format corpus product query demand by speech recognition;
if the corpus product query demand is in picture format, converting it into a text-format corpus product query demand by optical character recognition;
wherein the step of performing word segmentation and part-of-speech tagging on the corpus product query demand to obtain M words comprises:
splitting the text-format corpus product query demand into sentences according to its punctuation marks, to obtain sentences to be segmented;
performing reverse maximum matching segmentation on the sentences to be segmented, and determining the M words according to a custom dictionary;
performing part-of-speech tagging on the M words according to preset part-of-speech standard information.
Preferably, before the step of searching the corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information, the method further comprises:
detecting whether the characteristic information includes features identifying the category to which the corpus product belongs, the language format of the corpus product, and the multimedia form of the corpus product;
if the characteristic information includes the features identifying the category, the language format, and the multimedia form of the corpus product, performing the step of searching the corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information;
otherwise, performing the following steps:
obtaining the user's historical query records within a preset period;
analyzing the historical query records using big data analysis techniques to determine the user's query demand at the current time;
taking the query demand at the current time as a first element and the N keywords as a second element;
determining, according to the first element and the second element, the features identifying the category, the language format, and the multimedia form of the corpus product.
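The patent leaves the "big data analysis" of the history records unspecified. The sketch below assumes the simplest possible stand-in: for each missing dimension (category, language format, multimedia form), take the most frequent value in the user's queries within the preset period, while any dimension already given by the N keywords is kept as-is. The 30-day period, the dictionary layout, the timestamps, and all names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# The three feature dimensions the patent checks for.
DIMENSIONS = ("category", "language", "media")

def complete_features(keyword_features, history, now, period=timedelta(days=30)):
    """Return characteristic information covering all three dimensions where possible.

    keyword_features: dict dimension -> value derived from the N keywords.
    history: list of (timestamp, dict dimension -> value) past queries of the user.
    """
    if all(dim in keyword_features for dim in DIMENSIONS):
        return dict(keyword_features)  # complete: go straight to the corpus search
    recent = [query for ts, query in history if now - ts <= period]
    features = {}
    for dim in DIMENSIONS:
        if dim in keyword_features:      # second element: what the keywords say
            features[dim] = keyword_features[dim]
            continue
        # First element (stand-in): most frequent value in the recent history.
        counts = Counter(query[dim] for query in recent if dim in query)
        if counts:
            features[dim] = counts.most_common(1)[0][0]
    return features

now = datetime(2019, 5, 23)
history = [(datetime(2019, 5, 20), {"media": "audio"}),
           (datetime(2019, 5, 21), {"media": "audio", "category": "fairy tale"}),
           (datetime(2019, 1, 2), {"media": "video"})]   # outside the 30-day period
print(complete_features({"language": "English", "category": "fairy tale"}, history, now))
# {'category': 'fairy tale', 'language': 'English', 'media': 'audio'}
```

Letting the keywords override the history reflects the idea that what the user typed now is the stronger signal; a real system could weigh the two elements differently.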
Preferably, the step of processing each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information, to obtain a corpus product having all the features in the characteristic information, comprises:
selecting, according to the features corresponding to each piece of corpus information, the corpus information having the most features, and taking it as an initial corpus product;
determining the features to be integrated according to the characteristic information and the features corresponding to the initial corpus product;
extracting, from the corpus information other than the initial corpus product, the corpus information corresponding to the features to be integrated;
combining the extracted corpus information with the initial corpus product to obtain a corpus product having all the features in the characteristic information.
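Read literally, these steps form a greedy covering procedure. The sketch below is one interpretation under stated assumptions: "most features" is taken to mean most features in common with the required characteristic information, and the remaining items are scanned in order, each kept if it supplies at least one still-missing feature. The patent does not say how to choose among several items offering the same feature, and "combining" is reduced here to listing item IDs; names and data are hypothetical.

```python
def assemble_product(required, items):
    """Greedy sketch: start from the best-matching item, then add items that
    supply still-missing required features.

    required: set of features in the characteristic information.
    items: non-empty dict item_id -> set of features of the retrieved corpus information
           (max() raises ValueError on an empty dict).
    Returns (item IDs making up the product, required features still uncovered).
    """
    # Initial corpus product: the item sharing the most features with `required`.
    initial = max(items, key=lambda i: len(items[i] & required))
    product = [initial]
    missing = required - items[initial]          # features to be integrated
    for item_id, feats in items.items():
        if not missing:
            break
        if item_id != initial and feats & missing:
            product.append(item_id)              # extract the matching corpus information
            missing -= feats
    return product, missing

items = {"a1": {"audio", "English"}, "t7": {"text", "fairy tale"}, "v3": {"video", "Chinese"}}
print(assemble_product({"audio", "English", "fairy tale"}, items))  # (['a1', 't7'], set())
```

Returning the uncovered features makes it visible when the corpus cannot satisfy every feature, a case the patent text does not address.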
Preferably, before the step of pushing the corpus product to the user, the method further comprises:
determining whether the corpus product is chargeable;
if the corpus product is free of charge, performing the operation of pushing the corpus product to the user;
if the corpus product is chargeable, sending a charge notice to the user and, after receiving the user's instruction agreeing to the deduction, deducting the fee for the corpus product from the payment account preset by the user and pushing the corpus product to the user.
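The charging branch can be sketched as below. Amounts are integer cents to avoid floating-point rounding. The account dictionary, the `user_agrees` callback standing in for the charge notice and the user's reply, the `push` callback, and the insufficient-balance error are all hypothetical, since the patent does not describe the payment interface or what happens when the balance is too low.

```python
class InsufficientFunds(Exception):
    """Raised when the preset payment account cannot cover the fee (assumed behaviour)."""

def deliver(product, price_cents, account, user_agrees, push):
    """Push `product` to the user, deducting the fee first if it is chargeable.

    Returns True if the product was pushed, False if the user declined the charge.
    """
    if price_cents > 0:                      # chargeable product
        if not user_agrees(product, price_cents):
            return False                     # user did not agree to the deduction
        if account["balance"] < price_cents:
            raise InsufficientFunds(product)
        account["balance"] -= price_cents    # deduct from the preset payment account
    push(product)                            # free, or paid for: push to the user
    return True

account = {"balance": 500}
pushed = []
deliver("english-audiobook", 200, account, lambda p, c: True, pushed.append)
print(account["balance"], pushed)  # 300 ['english-audiobook']
```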
Preferably, after the step of pushing the corpus product to the user, the method further comprises:
receiving feedback information submitted by the user, and maintaining the corpus information in the corpus according to the feedback information.
In addition, to achieve the above object, the present invention further provides a corpus product recommendation apparatus, comprising:
an obtaining module, configured to receive a corpus product query request triggered by a user and obtain, according to the query request, the corpus product query demand provided by the user;
an extraction module, configured to perform keyword extraction on the corpus product query demand to obtain N keywords, N being an integer greater than or equal to 1;
a determining module, configured to determine, according to the N keywords, the characteristic information corresponding to the corpus product the user needs;
a searching module, configured to search the corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information;
a generation module, configured to process each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information, to obtain a corpus product having all the features in the characteristic information, and to push the corpus product to the user.
In addition, to achieve the above object, the present invention further provides a corpus product recommendation device, comprising a memory, a processor, and a corpus product recommendation program stored in the memory and executable on the processor, the program being configured to implement the steps of the corpus product recommendation method described above.
In addition, to achieve the above object, the present invention further provides a storage medium storing a corpus product recommendation program which, when executed by a processor, implements the steps of the corpus product recommendation method described above.
In the corpus product recommendation scheme provided by the present invention, the corpus product query demand provided by the user is extracted from the query request the user triggers; the characteristic information of the corpus product the user needs is then determined from the N keywords extracted from that demand; corpus information matching any feature in the determined characteristic information is found in the corpus; and finally the retrieved corpus information is processed according to the determined characteristic information and the features of each retrieved item, yielding a corpus product that has all of those features. The corpus information finally selected therefore meets the user's actual needs, which substantially improves the recommendation accuracy of corpus products.
Description of the drawings
Fig. 1 is a schematic structural diagram of the corpus product recommendation device in the hardware operating environment involved in the embodiments of the present invention;
Fig. 2 is a schematic flowchart of a first embodiment of the corpus product recommendation method of the present invention;
Fig. 3 is a schematic flowchart of a specific implementation of step S20 in the corpus product recommendation method of the present invention;
Fig. 4 is a schematic flowchart of a second embodiment of the corpus product recommendation method of the present invention;
Fig. 5 is a structural block diagram of a first embodiment of the corpus product recommendation apparatus of the present invention.
The realization of the object, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of the corpus product recommendation device in the hardware operating environment involved in the embodiments of the present invention.
As shown in Fig. 1, the corpus product recommendation device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the structure shown in Fig. 1 does not limit the corpus product recommendation device, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.
As shown in Fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and a corpus product recommendation program.
In the corpus product recommendation device shown in Fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 are arranged in the corpus product recommendation device, which calls, through the processor 1001, the corpus product recommendation program stored in the memory 1005 and executes the corpus product recommendation method provided by the embodiments of the present invention.
An embodiment of the present invention provides a corpus product recommendation method. Referring to Fig. 2, Fig. 2 is a schematic flowchart of a first embodiment of the corpus product recommendation method of the present invention.
In this embodiment, the corpus product recommendation method includes the following steps:
Step S10: receive a corpus product query request triggered by a user, and obtain, according to the query request, the corpus product query demand provided by the user.
Specifically, the executing entity of this embodiment may be any terminal device used by the user performing the corpus product query, such as a smartphone, tablet computer, or personal computer; these are not enumerated exhaustively here, and no limitation is imposed.
Correspondingly, the corpus product query request may be triggered when the user opens, on the terminal device, the corpus query application (App) provided by an installed corpus trading platform and then operates a function button in the App, such as the text input box, the voice input key, or the picture input key of the corpus query App.
Correspondingly, the corpus product query demand obtained may be the information the user enters when operating the above function button.
Step S20: perform keyword extraction on the corpus product query demand to obtain N keywords.
It should be understood that, in practical applications, the corpus product query demand provided by the user contains at least one word, and may be several words, a sentence, or more. Therefore, keyword extraction on the corpus product query demand yields at least one keyword; that is, N is an integer greater than or equal to 1.
To make the operation of extracting N keywords from the corpus product query demand easier to understand, this embodiment provides a specific extraction method, whose main implementation steps are shown in Fig. 3 and described below with reference to Fig. 3.
Sub-step S201: perform word segmentation and part-of-speech tagging on the corpus product query demand to obtain M words.
It should be understood that, since the N keywords finally determined are selected from the M words obtained, in practical applications M cannot be less than N; that is, M is an integer greater than or equal to N.
In this embodiment, the word segmentation and part-of-speech tagging performed on the corpus product query demand in sub-step S201 are specifically as follows:
First, the corpus product query demand is split into sentences according to its punctuation marks, such as commas and full stops, to obtain the sentences to be segmented.
For example, suppose the corpus product query demand provided by the user is "Hello, I want to listen to the English Little Prince." The system traverses the content of this sentence. When the character currently traversed is ",", it splits there and takes the content before the "," as one sentence to be segmented (the first sentence to be segmented). It then continues traversing, and when it reaches ".", it splits again, taking the content between the previous punctuation mark "," and the current punctuation mark "." as another sentence to be segmented (the second sentence to be segmented).
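Traversing character by character and splitting at each punctuation mark is equivalent to splitting on a character class. A minimal Python sketch, assuming the delimiter set below (full-width Chinese and ASCII commas, full stops, exclamation and question marks, semicolons), which the patent does not enumerate:

```python
import re

# Assumed delimiter set: full-width Chinese and ASCII sentence punctuation.
DELIMITERS = "，。！？；,.!?;"
_SPLIT = re.compile("[" + re.escape(DELIMITERS) + "]")

def split_sentences(text):
    """Split a text-format query at punctuation and drop empty fragments."""
    return [part.strip() for part in _SPLIT.split(text) if part.strip()]

print(split_sentences("Hello, I want to listen to the English Little Prince."))
# ['Hello', 'I want to listen to the English Little Prince']
```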
Then, reverse maximum matching segmentation is performed on the sentences to be segmented, and the M words are determined according to the custom dictionary.
Specifically, "reverse maximum matching segmentation" means that the sentence to be segmented is cut starting from the right end and proceeding to the left.
The custom dictionary mentioned above consists of existing phrases and dictionaries collected and entered in advance from various big data platforms; it covers essentially all the word forms likely to occur.
For ease of understanding, the second sentence to be segmented obtained in the example above is segmented here using reverse maximum matching.
Assume that the words recorded in the custom dictionary D are: D = {"I", "watch", "read", "listen", "Chinese", "English", "100,000 Whys", "Little Prince", ...}.
When reverse maximum matching is performed on the second sentence to be segmented, S = {"I want to listen English Little Prince"}, a maximum segmentation length (for example, 6 characters of the original Chinese text) is first defined, and segmentation then proceeds from right to left:
(1) The candidate word W1 taken from S is "I want to listen English";
(2) D is searched; W1 is not in D, so the leftmost character of W1 is removed, giving candidate word W2 "want to listen English";
(3) D is searched; W2 is not in D, so the leftmost character of W2 is removed, giving candidate word W3 "listen English";
(4) D is searched; W3 is not in D, so the leftmost character of W3 is removed, giving candidate word W4 "English";
(5) D is searched; W4 is in D, so W4 is split off from S, and S becomes "I want to listen Little Prince";
(6) The content of S is intercepted again according to the segmentation length 6, giving candidate word W5 "I want to listen Little Prince";
(7) Steps (1) to (6) are repeated until all the content in S has been segmented.
Through the above segmentation, the words obtained from the second sentence to be segmented, "I want to listen English Little Prince", are: I, listen, English, Little Prince.
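Below is a minimal sketch of textbook reverse maximum matching over Chinese characters. The Chinese strings are an assumed reconstruction of the original example (我想听英文小王子 for "I want to listen English Little Prince", and the dictionary D); the walk-through above is rendered in English, so its intermediate candidates do not map one-to-one onto characters. Because this sketch scans strictly from the right, it matches 小王子 before 英文, but it arrives at the same four words. Characters not found in D (here 想, "want") are dropped by default to reproduce that result; `keep_unknown=True` keeps them as single-character words instead, a common alternative.

```python
def reverse_max_match(sentence, dictionary, max_len=6, keep_unknown=False):
    """Segment `sentence` by reverse (right-to-left) maximum matching."""
    words = []
    end = len(sentence)
    while end > 0:
        start = max(0, end - max_len)            # widest window ending at `end`
        # Drop the leftmost character until the window is a dictionary word
        # or only one character is left.
        while start < end - 1 and sentence[start:end] not in dictionary:
            start += 1
        piece = sentence[start:end]
        if piece in dictionary or keep_unknown:
            words.append(piece)
        end = start                              # continue with the text to the left
    words.reverse()                              # words were collected right to left
    return words

# Assumed Chinese form of D: I, watch, read, listen, Chinese, English, 100,000 Whys, Little Prince.
D = {"我", "看", "读", "听", "中文", "英文", "十万个为什么", "小王子"}
print(reverse_max_match("我想听英文小王子", D))  # ['我', '听', '英文', '小王子']
```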
It should be understood that the above is only one specific word segmentation method and does not limit the technical solution of the present invention; in practical applications, those skilled in the art may configure it as needed, and no limitation is imposed here.
It should also be noted that, in practical applications, the administrators of the corpus may update the custom dictionary according to users' historical query records.
Finally, part-of-speech tagging is performed on the M words according to the preset part-of-speech standard information.
It should be noted that the part-of-speech standard information in this embodiment specifically refers to Chinese part-of-speech standard information, which specifies which classes of words are nouns, pronouns, verbs, adjectives, time words, and so on; these are not enumerated exhaustively here.
Still taking the four words obtained by the above segmentation as an example, the result of tagging them according to the part-of-speech standard information may be: "I" <pronoun>, "listen" <verb>, "English" <adjective>, "Little Prince" <noun>.
It should be understood that the above is only one tagging form and does not limit the technical solution of the present invention; in practical applications, those skilled in the art may configure it as needed, and no limitation is imposed here.
It should also be noted that, in practical applications, the format of the corpus product query demand provided by the user varies with the operation button corresponding to the query request the user triggers.
For example, when the user operates the text input box, the corpus product query demand obtained is in text format.
When the user operates the voice input key, the corpus product query demand obtained is in voice format.
When the user operates the picture input key, the corpus product query demand obtained is in picture format.
However, the word segmentation and part-of-speech tagging described above operate on text. Therefore, to ensure that the operation of performing word segmentation and part-of-speech tagging on the corpus product query demand to obtain M words can be executed smoothly, the format of the corpus product query demand may be determined before sub-step S201 is executed, and the processing then adapted to that format.
For example, if the corpus product query demand is determined to be in voice format, it is first converted into a text-format corpus product query demand by speech recognition, and sub-step S201 is then executed; if it is determined to be in picture format, it is first converted into a text-format corpus product query demand by optical character recognition (OCR), and sub-step S201 is then executed; if it is in text format, sub-step S201 is executed directly.
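The format check amounts to a dispatch on the input type. A minimal sketch follows; the patent names no specific speech-recognition or OCR engine, so the converters are passed in as callables, and the format labels, function names, and stub outputs are hypothetical.

```python
from typing import Callable, Dict, Union

Converter = Callable[[bytes], str]

def normalize_query(payload: Union[str, bytes], fmt: str,
                    converters: Dict[str, Converter]) -> str:
    """Return the query demand as text so that sub-step S201 can run on it."""
    if fmt == "text":
        return payload  # already text: run S201 directly
    if fmt not in converters:
        raise ValueError(f"unsupported query format: {fmt}")
    return converters[fmt](payload)  # e.g. "voice" -> ASR, "picture" -> OCR

# Stub converters for illustration only; a real system would call ASR/OCR engines.
stubs = {"voice": lambda audio: "listen English", "picture": lambda image: "Little Prince"}
print(normalize_query(b"\x00\x01", "voice", stubs))  # listen English
```

Injecting the converters keeps the dispatch logic independent of any particular recognition service.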
That is, the operation in sub-step S201 is substantially as follows:
splitting the text-format corpus product query demand into sentences according to its punctuation marks, to obtain sentences to be segmented;
performing reverse maximum matching segmentation on the sentences to be segmented according to the custom dictionary, to obtain the M words;
performing part-of-speech tagging on the M words according to the preset part-of-speech standard information.
Further, to give the characteristic information determined later a higher reference value, text preprocessing may first be performed on the text-format corpus product query demand before keyword extraction.
For example, stop words, that is, words in the text with no practical meaning, are removed.
As another example, invalid special characters, such as emoticons and various punctuation marks, are removed.
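A minimal sketch of this text preprocessing, applied here to already-segmented tokens (an assumption; the patent does not fix the order relative to segmentation). The stop-word list is hypothetical, since the original examples are not recoverable. "Invalid special characters" are approximated as tokens made only of Unicode punctuation (category P*) or symbol (category S*) characters, which covers emoji (category So).

```python
import unicodedata

# Hypothetical stop-word list (common Chinese function words).
STOP_WORDS = {"的", "了", "吗", "呢", "啊"}

def is_special(token):
    """True if every character is Unicode punctuation (P*) or a symbol (S*, incl. emoji)."""
    return all(unicodedata.category(ch)[0] in "PS" for ch in token)

def preprocess(tokens):
    """Remove stop words and special-character tokens before keyword extraction."""
    return [t for t in tokens if t not in STOP_WORDS and not is_special(t)]

print(preprocess(["我", "的", "听", "😀", "，", "英文"]))  # ['我', '听', '英文']
```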
Correspondingly, before a voice-format corpus product query demand is converted into a text-format one, a series of preprocessing operations, such as filtering and removal of interfering sounds, may likewise be performed on it so that the converted text is more accurate.
Similarly, before a picture-format corpus product query demand is converted into a text-format one, a series of preprocessing operations, such as grayscale conversion and denoising, may likewise be performed on it so that the converted text is more accurate.
Sub-step S202 calculates the weight of each word in the M word according to preset part of speech weight distribution standard Value.
It should be understood that usual pronoun, interjection, conjunction, onomatopoeia etc. are that do not have to inquiry during actual queries Much help, thus more matchmakers low, and that the corpus product of user's needs can be embodied should be handed over for the weight of this kind of word distribution (for example " listening ", it is considered that the multimedia form of corpus product is audio, " seeing " is then video to the verb of physique formula, and " reading " is then Text), the adjective (such as " English ", " Chinese ") of corpus Product Language format can be embodied, corpus production can be embodied The title of product generic then distributes higher weight for it.
Sub-step S203: traverse the M words, compare the weight value of the currently traversed word with a preset weight threshold, and keep the words whose weight value is greater than the weight threshold, obtaining the N keywords.
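Sub-steps S202 and S203 can be illustrated with the following sketch. The part-of-speech weights, the word-specific boosts and the threshold of 0.5 are all hypothetical values chosen for illustration; the source only states which kinds of words should receive low or high weights. The input is the list of M (word, part-of-speech) pairs and the output is the N keywords, so N is at most M by construction.

```python
# Hypothetical part-of-speech weight standard (values are illustrative).
POS_WEIGHTS = {
    "pronoun": 0.1, "interjection": 0.1, "conjunction": 0.1, "onomatopoeia": 0.1,
    "verb": 0.4, "adjective": 0.4, "noun": 0.6,
}
# Hypothetical boosts for words that reveal the media form or the language.
WORD_BOOSTS = {"listen": 0.4, "watch": 0.4, "read": 0.4, "English": 0.4, "Chinese": 0.4}
DEFAULT_POS_WEIGHT = 0.3


def word_weight(word: str, pos: str) -> float:
    """S202: weight = part-of-speech weight plus any word-specific boost."""
    return POS_WEIGHTS.get(pos, DEFAULT_POS_WEIGHT) + WORD_BOOSTS.get(word, 0.0)


def extract_keywords(tagged_words: list[tuple[str, str]], threshold: float = 0.5) -> list[str]:
    """S203: traverse the M (word, pos) pairs and keep those above the threshold."""
    return [w for w, pos in tagged_words if word_weight(w, pos) > threshold]
```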
It should be understood that the above is only one specific implementation for extracting keywords from the corpus product query demand and does not limit the technical solution of the present invention. In practical applications, those skilled in the art can configure it as needed, which is not limited here.
Step S30: determine, according to the N keywords, the characteristic information corresponding to the corpus product the user needs.
Specifically, the characteristic information corresponding to the corpus product consists of the key features that can identify the corpus product.
For example, if the extraction operation above yields the keywords "listen", "English" and "The Little Prince", then the keyword "listen" indicates that the corpus product the user needs should be audio data, the keyword "English" indicates that the corpus product needs to be an English edition, and the keyword "The Little Prince" indicates that the corpus product belongs to the fairy-tale category.
It should be understood that in practical applications, to make it easier to determine the characteristic information of the corpus product from the keywords, correspondences between different keywords and the features of different corpus products can be built in advance, and the characteristic information is then determined from these pre-built mapping relations.
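A pre-built keyword-to-feature mapping of the kind just described might look like the following. The dictionary contents and the dimension names (`media`, `language`, `category`) are hypothetical and only reproduce the "listen" / "English" / "The Little Prince" example above.

```python
# Hypothetical pre-built mapping: keyword -> (feature dimension, feature value).
KEYWORD_FEATURES = {
    "listen": ("media", "audio"),
    "watch": ("media", "video"),
    "read": ("media", "text"),
    "English": ("language", "English"),
    "Chinese": ("language", "Chinese"),
    "The Little Prince": ("category", "fairy tale"),
}


def keywords_to_features(keywords: list[str]) -> dict[str, str]:
    """Look each keyword up in the mapping; unmapped keywords are ignored here."""
    features = {}
    for keyword in keywords:
        if keyword in KEYWORD_FEATURES:
            dimension, value = KEYWORD_FEATURES[keyword]
            features[dimension] = value
    return features
```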
It should be understood that the above is given by way of example only and does not limit the technical solution of the present invention. In practical applications, those skilled in the art can configure it as needed, which is not limited here.
Step S40: according to the characteristic information, search the corpus for corpus information that satisfies any feature in the characteristic information.
Specifically, the corpus in this embodiment is built in advance and can store multiple types of corpus information, such as text, pictures, audio and video.
In addition, to support multi-dimensional queries of the corpus according to the determined characteristic information, i.e., queries on multiple features, the corpus built in this embodiment uses ElasticSearch (a search server based on a full-text search engine, hereinafter ES) as its core, supplemented by MongoDB (a database based on distributed document storage) and MySQL (a relational database management system).
Specifically, in ES, the last three characters of the identifier (hereinafter ID) of each piece of corpus information collected from the various big data platforms are used as the index, the ID of the corpus information is used as the type, and multiple index fields such as corpus title, corpus description, corpus label, language direction, price and sales volume are created in the data table.
Then, the specific corpus information is stored in MongoDB, and a correspondence is established between the ES index and each piece of corpus information in MongoDB.
Meanwhile, the raw data of the corpus information (i.e., data that has not undergone any processing such as the classification and labeling described above) is stored in MySQL and mapped one-to-one to the index information in ES.
Thus, when corpus information is queried from the corpus according to the determined characteristic information, the features in the characteristic information are substituted directly into a query statement written in the DSL of ES (a domain-specific language, the query language ES provides for retrieving and analyzing massive volumes of machine data).
Examples include the fuzzy search matchQuery(...), the prefix search prefixQuery(...), and filters such as TermFilter and wizardFilter.
For example, when the subject name "The Little Prince" of the corpus product the user needs is used as the query information, the corpus information retrieved from the corpus may be any corpus information related to "The Little Prince", in any language version and any multimedia form.
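An "any feature" query of this kind can be expressed in the ES Query DSL as a `bool` query whose `should` clauses each carry one feature, with `minimum_should_match` set to 1. The sketch below only builds the request body as a Python dict; the field names `labels`, `title` and `title.keyword` are hypothetical (the last assumes a keyword sub-field such as dynamic mapping creates for string fields).

```python
from typing import Optional


def build_any_feature_query(features: dict, title: Optional[str] = None) -> dict:
    """Build a Query DSL body matching documents that satisfy ANY given feature."""
    should = [{"match": {"labels": value}} for value in features.values()]
    if title:
        should.append({"match": {"title": title}})          # full-text match on the title
        should.append({"prefix": {"title.keyword": title}})  # prefix match on the exact title
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}
```

The resulting dict would be sent as the body of a `_search` request against the corpus index.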
Step S50: process each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information, obtain a corpus product having all the features in the characteristic information, and push the corpus product to the user.
Specifically, in practical applications, obtaining the corpus product having all the features in the characteristic information can be implemented roughly through the following sub-steps:
First, according to the features corresponding to each piece of corpus information, select the corpus information having the most features and use it as the initial corpus product;
Then, determine the features to be integrated (i.e., the features in the characteristic information that the initial corpus product lacks) according to the characteristic information and the features corresponding to the initial corpus product;
Then, extract the corpus information corresponding to the features to be integrated from the corpus information other than the initial corpus product;
Finally, combine the extracted corpus information with the initial corpus product to obtain a corpus product having all the features in the characteristic information.
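The four sub-steps can be read as a greedy set-cover procedure. The following sketch models each piece of corpus information as a set of features; that representation, and the greedy choice of supplements, are assumptions made for illustration. The actual combination of the selected items (for example translation and alignment, as in the illustration that follows) is outside the sketch.

```python
def assemble_product(required: set, candidates: dict):
    """Greedy sketch of the S50 sub-steps: returns (initial_id, supplement_ids, unmet)."""
    if not candidates:
        return None, [], set(required)
    # 1. The item covering the most required features becomes the initial product.
    initial = max(candidates, key=lambda cid: len(candidates[cid] & required))
    # 2. Features to be integrated = required features the initial item lacks.
    pending = set(required) - candidates[initial]
    remaining = {cid: feats for cid, feats in candidates.items() if cid != initial}
    supplements = []
    # 3. Repeatedly take the other item that covers the most pending features.
    while pending and remaining:
        best = max(remaining, key=lambda cid: len(remaining[cid] & pending))
        gained = remaining.pop(best) & pending
        if not gained:
            break
        supplements.append(best)
        pending -= gained
    # 4. Combining the items (translation, alignment, ...) is left to the caller.
    return initial, supplements, pending
```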
For ease of understanding, the above steps are illustrated below.
Suppose that none of the corpus information retrieved directly from the corpus has all of the features above: the retrieved item with the most features is an English audio recording of The Little Prince, which contains audio only, and there is also a Chinese text edition of the novel The Little Prince.
The processing may then specifically be as follows: perform language conversion on the Chinese text edition of The Little Prince to obtain the corresponding English edition;
Then, combine the English audio of The Little Prince with the English text edition and align them so that the audio being played and the English text stay in sync; to make it easier for the user to follow, the corresponding text can be highlighted during voice playback.
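The synchronized highlighting can be sketched as follows, assuming the calibration step has already produced a start time for each English sentence (how that alignment is obtained is not specified in the text). At any playback time, a binary search over the start times finds the sentence to highlight; brackets stand in for the visual highlight.

```python
from bisect import bisect_right


def current_sentence(start_times: list[float], t: float) -> int:
    """Index of the sentence being spoken at playback time t (-1 before the first)."""
    return bisect_right(start_times, t) - 1


def render_with_highlight(segments: list[tuple[float, str]], t: float) -> str:
    """Wrap the sentence being spoken in brackets to stand in for highlighting."""
    starts = [start for start, _ in segments]
    idx = current_sentence(starts, t)
    return " ".join(f"[{text}]" if i == idx else text for i, (_, text) in enumerate(segments))
```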
It should be noted that the above is only an example and does not limit the technical solution of the present invention.
In addition, it should be noted that when corpus information in multiple formats is combined into the corpus product the user needs, Tika (a toolkit released by Apache for extracting document content) can be used; it uses existing parsing libraries to detect and extract metadata and structured content from documents in different formats (such as HTML, PDF and Doc).
From the above description it is easy to see that, in the corpus product recommendation method provided by this embodiment, the corpus product query demand provided by the user is extracted from the corpus product query request the user triggers; the characteristic information of the corpus product the user needs is determined from the N keywords extracted from that query demand; corpus information satisfying any of the determined features is then found in the corpus; and finally the retrieved corpus information is processed according to the determined characteristic information and the features of each retrieved item, yielding a corpus product that has the above characteristic information. The corpus information finally selected therefore meets the user's actual needs, which greatly improves the recommendation accuracy of corpus products.
In addition, it should be noted that in practical applications, the corpus product generated from the user's query demand may be chargeable, so before a corpus product is recommended to the user, it can first be determined whether the corpus product requires payment.
Correspondingly, if the corpus product does not require payment, it is pushed directly to the user. If it does, a payment notice can first be sent to the user and the user's feedback monitored; if an instruction from the user agreeing to the charge is received, the fee for the corpus product is first deducted from the user's preset payment account, and the corpus product is then pushed to the user.
In this way, the user can decide, according to their actual situation, whether to pay for the corpus product, which greatly improves the user experience while ensuring the accuracy of corpus product recommendation.
Further, to improve the user experience still more, when the user's feedback instruction declines the charge, the user can be offered ways to obtain the corpus product for free, such as sharing information about the corpus to a predetermined number of chat groups or inviting a predetermined number of new users, so as to retain as many corpus users as possible. This both avoids user churn and helps promote the corpus.
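The charging flow above can be summarized in a short sketch. Prices are in integer cents to avoid floating-point rounding, `ask_user` is a hypothetical callback standing in for sending the payment notice and waiting for feedback, and the returned action strings are illustrative names. Insufficient balance and payment failures are not covered by the text and are not handled here.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    AGREE = "agree"
    DECLINE = "decline"


def deliver(price_cents: int, balance_cents: int, ask_user: Callable[[str], Decision]):
    """Return (action, new_balance) following the charging flow described above."""
    if price_cents == 0:
        return "push", balance_cents  # free product: push directly
    decision = ask_user(f"This corpus product costs {price_cents} cents. Pay?")
    if decision is Decision.AGREE:
        return "push", balance_cents - price_cents  # deduct first, then push
    # Declined: offer free routes (share to chat groups, invite new users).
    return "offer_free_options", balance_cents
```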
In addition, to better maintain and manage the corpus information in the corpus, so that corpus products synthesized from it fit user needs more closely, the feedback information submitted by the user can further be received after the corpus product is pushed to the user, and the corpus information in the corpus can then be maintained and managed according to that feedback.
Referring to Fig. 4, Fig. 4 is a schematic flowchart of a second embodiment of the corpus product recommendation method of the present invention.
Based on the first embodiment above, the corpus product recommendation method of this embodiment further includes, before step S40:
Step S00: detect whether the characteristic information includes features identifying the category of the corpus product, the language format of the corpus product, and the multimedia form of the corpus product. If detection shows that the characteristic information includes all three features, step S40 is executed; otherwise, step S01 is executed.
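The check in step S00 amounts to testing whether three feature dimensions are present. In this sketch the characteristic information is a hypothetical feature dict keyed by `category`, `language` and `media`; those names are illustrative, not taken from the source.

```python
# Feature dimensions that step S00 requires (names are illustrative).
REQUIRED_DIMENSIONS = ("category", "language", "media")


def next_step(features: dict) -> tuple[str, list[str]]:
    """Return ("S40", []) if all dimensions are present, else ("S01", missing)."""
    missing = [dim for dim in REQUIRED_DIMENSIONS if not features.get(dim)]
    return ("S01", missing) if missing else ("S40", [])
```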
Step S01: obtain the user's historical query records within a predetermined period, and determine the characteristic information according to the historical query records and the N keywords.
In practical applications, the operation described in step S01 can be implemented through the following sub-steps:
(1) Obtain the user's historical query records within the predetermined period.
Specifically, the historical query records mainly record the types, characteristic information and the like of corpus products the user has queried before (for example, in the last month), so the user's preferences can be determined from them.
In addition, because this embodiment limits the historical query records obtained to a predetermined period, such as the most recent week, the information in the retrieved records has more reference value.
(2) Analyze the historical query records using big data analysis technology to determine the user's query demand at the current moment.
Specifically, the big data analysis of the historical query records here mainly counts which keywords are used most frequently in the records, as well as the categories, language formats and multimedia forms of the corpus products the user has searched for frequently in the recent period.
(3) Take the query demand at the current moment as the first element and the N keywords as the second element.
(4) According to the first element and the second element, determine the features identifying the category, the language format and the multimedia form of the corpus product.
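Sub-steps (2) to (4) can be illustrated with a frequency count over the recent history. This sketch assumes each history record is a feature dict and that a feature stated explicitly in the current keywords should override a preference inferred from history; the text does not specify how conflicts between the first and second elements are resolved, so that precedence is an assumption.

```python
from collections import Counter

DIMENSIONS = ("category", "language", "media")


def infer_from_history(history: list[dict]) -> dict:
    """Sub-step (2): most frequent value per dimension across recent queries."""
    inferred = {}
    for dim in DIMENSIONS:
        counts = Counter(record[dim] for record in history if record.get(dim))
        if counts:
            inferred[dim] = counts.most_common(1)[0][0]
    return inferred


def merge_features(first_element: dict, second_element: dict) -> dict:
    """Sub-steps (3)-(4): history-derived demand filled in, keyword features take precedence."""
    return {**first_element, **second_element}
```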
Further, in practical applications, to make the finally determined characteristic information of the corpus product more accurate, i.e., to make the corpus products recommended according to it better match user needs, the user's biometric information, preferably facial feature information and voiceprint feature information, can also be obtained when the historical query records are obtained. Analyzing the biometric information makes it possible to determine the user's gender and approximate age, so that content of interest to users of that age range and gender can be filtered out and the corpus product finally recommended better matches the user's needs.
It should be noted that in practical applications most users do not fill in complete personal information when using the corpus, so the user's actual age, gender and the like often cannot be obtained from personal information. This embodiment determines this information directly from the user's biometric information, which not only yields relatively accurate information but also greatly improves convenience for the user.
Further, in practical applications, to determine the user's age and gender from the acquired biometric information conveniently, quickly and accurately, a big data analysis model can be built in advance using big data analysis technology assisted by a machine learning algorithm. After the biometric information is obtained, it is input directly into the analysis model to obtain the user's age and gender.
The big data analysis model can be built roughly as follows:
First, obtain the biometric information of users of known gender and age from the various big data platforms;
Then, use the biometric information with known gender and age as sample data and input it into a big data analysis training model for training; training is complete when, given the training sample data as input, the model can accurately output the age and gender of the corresponding users.
Correspondingly, the big data analysis training model at that point is the required big data analysis model.
In addition, in practical applications, the machine learning algorithm selected is preferably a convolutional neural network algorithm.
Since convolutional neural network algorithms are relatively mature, those skilled in the art can consult the relevant literature on them for a concrete implementation, so details are not repeated here.
For example, suppose the corpus product query demand provided by the user consists only of the word "novel", and analysis of the user's biometric information with big data analysis technology determines that the user is a woman of about 30.
In addition, the acquired historical query records show that the user often queries fantasy-type animation novels.
From this information it can be determined that the corpus product the user needs is a fantasy-type animation novel suitable for a woman of about 30.
Correspondingly, the determined characteristic information may be: 30 years old, female, fantasy, animation, novel.
It should be noted that the above is only an example and does not limit the technical solution of the present invention. In a specific implementation, those skilled in the art can configure it as needed, and details are not repeated here.
From the above description it is easy to see that, in the corpus product recommendation method provided by this embodiment, before corpus information satisfying any feature in the characteristic information is searched for in the corpus, the method detects whether the characteristic information includes the features identifying the category, the language format and the multimedia form of the corpus product. It then decides either to search for corpus information using the characteristic information determined at the outset, or to obtain additional information to determine the above features before searching. This effectively ensures the accuracy of the characteristic information used for the search, so that the corpus product subsequently obtained better matches the user's actual needs.
In addition, an embodiment of the present invention further provides a storage medium storing a corpus product recommendation program which, when executed by a processor, implements the steps of the corpus product recommendation method described above.
Referring to Fig. 5, Fig. 5 is a structural block diagram of a first embodiment of the corpus product recommendation device of the present invention.
As shown in Fig. 5, the corpus product recommendation device proposed by the embodiment of the present invention includes: an acquisition module 5001, an extraction module 5002, a determination module 5003, a searching module 5004 and a generation module 5005.
The acquisition module 5001 is configured to receive a corpus product query request triggered by a user and obtain, according to the request, the corpus product query demand provided by the user; the extraction module 5002 is configured to perform keyword extraction on the corpus product query demand to obtain N keywords, where N is an integer greater than or equal to 1; the determination module 5003 is configured to determine, according to the N keywords, the characteristic information corresponding to the corpus product the user needs; the searching module 5004 is configured to search the corpus, according to the characteristic information, for corpus information that satisfies any feature in the characteristic information; and the generation module 5005 is configured to process each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information, obtain a corpus product having all the features in the characteristic information, and push the corpus product to the user.
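The module structure of Fig. 5 can be mirrored by a small pipeline class whose five stages are injected as callables. This is an architectural sketch only; the class and attribute names are hypothetical, and pushing the result to the user is left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CorpusRecommender:
    """Hypothetical wiring of modules 5001-5005 as injected callables."""
    acquire: Callable[[dict], str]         # 5001: query request -> query demand
    extract: Callable[[str], list]         # 5002: query demand -> N keywords
    determine: Callable[[list], dict]      # 5003: keywords -> characteristic information
    search: Callable[[dict], list]         # 5004: features -> matching corpus information
    generate: Callable[[dict, list], Any]  # 5005: features + items -> corpus product

    def recommend(self, request: dict) -> Any:
        demand = self.acquire(request)
        keywords = self.extract(demand)
        features = self.determine(keywords)
        items = self.search(features)
        return self.generate(features, items)
```

Injecting the stages keeps each module replaceable, for example swapping the keyword extractor without touching the search stage.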
To make it easier to understand how the extraction module 5002 extracts keywords from the corpus product query demand, a specific implementation is given below, roughly as follows:
First, perform word segmentation and part-of-speech tagging on the corpus product query demand to obtain M words;
Then, calculate the weight value of each of the M words according to a preset part-of-speech weight distribution standard;
Finally, traverse the M words, compare the weight value of the currently traversed word with a preset weight threshold, and keep the words whose weight value is greater than the weight threshold, obtaining the N keywords.
It should be understood that in practical applications, M is an integer greater than or equal to N, since the N keywords are selected from the M words.
In addition, it should be noted that in practical applications, the corpus product query demand provided by the user may come in different formats depending on which operation button was used to trigger the corpus product query request. Therefore, so that the extraction module 5002 can segment and part-of-speech tag the corpus product query demand to obtain the M words, the extraction module 5002 is further configured, before performing the above operations, to determine the format of the corpus product query demand.
Correspondingly, if the corpus product query demand is in voice format, speech recognition technology is used to convert it into a text-format corpus product query demand; if it is in picture format, optical character recognition technology is used to convert it into a text-format corpus product query demand.
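The format check and conversion can be sketched as a small dispatcher. The speech-recognition and OCR engines are not named in the text, so they are passed in as hypothetical callables `asr` and `ocr`; the format labels are likewise illustrative.

```python
from typing import Callable, Union

Demand = Union[str, bytes]


def to_text_demand(demand: Demand, fmt: str,
                   asr: Callable[[bytes], str],
                   ocr: Callable[[bytes], str]) -> str:
    """Normalize a query demand to text; asr/ocr are injected recognizers."""
    if fmt == "text":
        return demand          # already text
    if fmt == "voice":
        return asr(demand)     # speech recognition
    if fmt == "image":
        return ocr(demand)     # optical character recognition
    raise ValueError(f"unsupported query demand format: {fmt}")
```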
Correspondingly, to make it easier to understand the above operation of performing word segmentation and part-of-speech tagging on the corpus product query demand to obtain the M words, this embodiment provides a specific implementation, roughly as follows:
Split the text-format corpus product query demand into sentences according to its punctuation marks, obtaining sentences to be segmented;
Perform reverse maximum matching segmentation on the sentences to be segmented, determining the M words according to a custom dictionary;
Perform part-of-speech tagging on the M words according to preset part-of-speech standard information.
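The three steps above can be sketched in Python as follows. The custom dictionary and the part-of-speech table are hypothetical examples; reverse maximum matching scans each sentence from the end and takes the longest dictionary word that fits, falling back to a single character when nothing longer matches. Words not in the part-of-speech table are tagged `unknown` here, which is an assumption.

```python
import re

# Sentence-level punctuation (Chinese and ASCII) used for splitting.
SENTENCE_PUNCT = re.compile(r"[。！？；，,.!?;]+")


def split_sentences(text: str) -> list[str]:
    return [s for s in SENTENCE_PUNCT.split(text) if s]


def reverse_max_match(sentence: str, dictionary: set, max_len: int) -> list[str]:
    """Reverse maximum matching: scan from the end, take the longest dictionary word."""
    words = []
    end = len(sentence)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = sentence[end - size:end]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                end -= size
                break
    words.reverse()
    return words


def segment_and_tag(text: str, dictionary: set, pos_dict: dict) -> list[tuple[str, str]]:
    """Return the M (word, part-of-speech) pairs for a text-format query demand."""
    max_len = max((len(w) for w in dictionary), default=1)
    words = [w for s in split_sentences(text) for w in reverse_max_match(s, dictionary, max_len)]
    return [(w, pos_dict.get(w, "unknown")) for w in words]
```

With a dictionary containing 英文版 ("English edition") and 小王子 ("The Little Prince"), the query 我想听英文版的小王子 ("I want to listen to the English edition of The Little Prince") segments into 我 / 想 / 听 / 英文版 / 的 / 小王子.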
It should be understood that the above is only one specific implementation and does not limit the technical solution of the present invention. In practical applications, those skilled in the art can configure it as needed, which is not limited here.
In addition, to make it easier to understand how the generation module 5005 generates the corpus product the user needs, a specific implementation is given below, roughly as follows:
First, according to the features corresponding to each piece of corpus information, select the corpus information having the most features and use it as the initial corpus product;
Then, determine the features to be integrated (i.e., the features in the characteristic information that the initial corpus product lacks) according to the characteristic information and the features corresponding to the initial corpus product;
Then, extract the corpus information corresponding to the features to be integrated from the corpus information other than the initial corpus product;
Finally, combine the extracted corpus information with the initial corpus product to obtain a corpus product having all the features in the characteristic information.
It should be understood that the above is only one specific implementation and does not limit the technical solution of the present invention. In specific applications, those skilled in the art can configure it as needed, and the present invention does not limit this.
From the above description it is easy to see that, in the corpus product recommendation device provided by this embodiment, the corpus product query demand provided by the user is extracted from the corpus product query request the user triggers; the characteristic information of the corpus product the user needs is determined from the N keywords extracted from that query demand; corpus information satisfying any of the determined features is then found in the corpus; and finally the retrieved corpus information is processed according to the determined characteristic information and the features of each retrieved item, yielding a corpus product that has the above characteristic information. The corpus information finally selected therefore meets the user's actual needs, which greatly improves the recommendation accuracy of corpus products.
In addition, it should be noted that in practical applications, the corpus product generated from the user's query demand may be chargeable, so before a corpus product is recommended to the user, it can first be determined whether the corpus product requires payment.
Correspondingly, if the corpus product does not require payment, it is pushed directly to the user. If it does, a payment notice can first be sent to the user and the user's feedback monitored; if an instruction from the user agreeing to the charge is received, the fee for the corpus product is first deducted from the user's preset payment account, and the corpus product is then pushed to the user.
In this way, the user can decide, according to their actual situation, whether to pay for the corpus product, which greatly improves the user experience while ensuring the accuracy of corpus product recommendation.
Further, to improve the user experience still more, when the user's feedback instruction declines the charge, the user can be offered ways to obtain the corpus product for free, such as sharing information about the corpus to a predetermined number of chat groups or inviting a predetermined number of new users, so as to retain as many corpus users as possible. This both avoids user churn and helps promote the corpus.
In addition, to better maintain and manage the corpus information in the corpus, so that corpus products synthesized from it fit user needs more closely, the feedback information submitted by the user can further be received after the corpus product is pushed to the user, and the corpus information in the corpus can then be maintained and managed according to that feedback.
It should be noted that the workflow described above is only illustrative and does not limit the scope of protection of the present invention. In practical applications, those skilled in the art can select some or all of it according to actual needs to achieve the purpose of the solution of this embodiment, which is not limited here.
In addition, for technical details not described in detail in this embodiment, reference can be made to the corpus product recommendation method provided by any embodiment of the present invention, and details are not repeated here.
Based on the first embodiment of the corpus product recommendation device above, a second embodiment of the corpus product recommendation device of the present invention is proposed.
In this embodiment, the corpus product recommendation device further includes a detection module.
The detection module is configured to detect whether the characteristic information includes features identifying the category of the corpus product, the language format of the corpus product, and the multimedia form of the corpus product.
Correspondingly, if detection shows that the characteristic information includes the features identifying the category, the language format and the multimedia form of the corpus product, the searching module is triggered to search the corpus, according to the characteristic information, for corpus information that satisfies any feature in the characteristic information.
Otherwise (if the characteristic information includes none, or only some, of the above features), the searching module is triggered to perform the following steps:
First, obtain the user's historical query records within a predetermined period;
Then, analyze the historical query records using big data analysis technology to determine the user's query demand at the current moment;
Then, take the query demand at the current moment as the first element and the N keywords as the second element;
Finally, according to the first element and the second element, determine the features identifying the category, the language format and the multimedia form of the corpus product.
It should be understood that the above is only an example and does not limit the technical solution of the present invention. In specific applications, those skilled in the art can configure it as needed, and the present invention does not limit this.
From the above description it is easy to see that, in the corpus product recommendation device provided by this embodiment, before corpus information satisfying any feature in the characteristic information is searched for in the corpus, the device detects whether the characteristic information includes the features identifying the category, the language format and the multimedia form of the corpus product. It then decides either to search for corpus information using the characteristic information determined at the outset, or to obtain additional information to determine the above features before searching. This effectively ensures the accuracy of the characteristic information used for the search, so that the corpus product subsequently obtained better matches the user's actual needs.
It should be noted that the workflow described above is only illustrative and does not limit the scope of protection of the present invention. In practical applications, those skilled in the art can select some or all of it according to actual needs to achieve the purpose of the solution of this embodiment, which is not limited here.
In addition, for technical details not described in detail in this embodiment, reference can be made to the corpus product recommendation method provided by any embodiment of the present invention, and details are not repeated here.
In addition, it should be noted that, herein, the terms "include", "comprise" or its any other variant are intended to contain Lid non-exclusive inclusion, so that process, method, article or system including a series of elements are not only wanted including those Element, but also including other elements that are not explicitly listed, or further include for this process, method, article or system Intrinsic element.In the absence of more restrictions, the element limited by sentence "including a ...", it is not excluded that There is also other identical elements in process, method, article or system including the element.
The serial number of the above embodiments of the invention is only for description, does not represent the advantages or disadvantages of the embodiments.
Through the above description of the embodiments, those skilled in the art can be understood that above-described embodiment side Method can be realized by means of software and necessary general hardware platform, naturally it is also possible to by hardware, but in many cases The former is more preferably embodiment.Based on this understanding, technical solution of the present invention substantially in other words does the prior art The part contributed out can be embodied in the form of software products, which is stored in a storage medium In (such as read-only memory (Read Only Memory, ROM)/RAM, magnetic disk, CD), including some instructions are used so that one Terminal device (can be mobile phone, computer, server or the network equipment etc.) executes side described in each embodiment of the present invention Method.
The above is only a preferred embodiment of the invention and does not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the invention.

Claims (10)

1. A corpus product recommendation method, characterized in that the method comprises:
receiving a corpus product query request triggered by a user, and obtaining, according to the corpus product query request, a corpus product query demand provided by the user;
performing keyword extraction on the corpus product query demand to obtain N keywords, N being an integer greater than or equal to 1;
determining, according to the N keywords, characteristic information corresponding to the corpus product required by the user;
searching a corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information;
processing each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information, to obtain a corpus product that has all of the features in the characteristic information, and pushing the corpus product to the user.
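The search step of claim 1 can be illustrated with a minimal Python sketch. It assumes that each piece of corpus information is tagged with a set of feature labels and that "matches any feature" means a non-empty set intersection with the characteristic information. `CorpusInfo`, `search_corpus`, and the sample tags are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class CorpusInfo:
    """Hypothetical corpus entry tagged with the features it provides."""
    name: str
    features: set[str] = field(default_factory=set)


def search_corpus(corpus, characteristic_info):
    """Return, in corpus order, every entry that shares at least one
    feature with the required characteristic information."""
    return [item for item in corpus if item.features & characteristic_info]


corpus = [
    CorpusInfo("finance_qa_zh", {"finance", "chinese", "text"}),
    CorpusInfo("medical_audio_zh", {"medical", "chinese", "audio"}),
    CorpusInfo("finance_audio_en", {"finance", "english", "audio"}),
]
hits = search_corpus(corpus, {"finance", "english", "text"})
print([h.name for h in hits])  # ['finance_qa_zh', 'finance_audio_en']
```

Partial matches are kept on purpose. Under claim 1, no single entry needs to satisfy every feature, because the later processing step assembles the complete product.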
2. The method according to claim 1, characterized in that the step of performing keyword extraction on the corpus product query demand to obtain N keywords comprises:
performing word segmentation and part-of-speech tagging on the corpus product query demand to obtain M words, M being an integer greater than or equal to N;
calculating the weight value of each of the M words according to a preset part-of-speech weight distribution standard;
traversing the M words, comparing the weight value of the currently traversed word with a preset weight threshold, and selecting the words whose weight value is greater than the weight threshold, to obtain the N keywords.
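Claim 2 filters the M tagged words by part-of-speech weight. The Python sketch below shows that step under stated assumptions. The patent does not publish its weight table, so `POS_WEIGHTS`, the tag codes (`n` noun, `v` verb, `a` adjective), the default weight, and the 0.5 threshold are all hypothetical.

```python
# Hypothetical part-of-speech weight table; the patent does not give its values.
POS_WEIGHTS = {"n": 1.0, "v": 0.6, "a": 0.4}
DEFAULT_WEIGHT = 0.1  # assumed weight for any tag not listed above


def extract_keywords(tagged_words, threshold=0.5):
    """tagged_words: the M (word, pos_tag) pairs from segmentation.
    Returns the N words whose weight is strictly greater than the
    threshold, in their original order (so N <= M)."""
    keywords = []
    for word, pos in tagged_words:  # traverse the M words
        weight = POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)
        if weight > threshold:
            keywords.append(word)
    return keywords


tagged = [("need", "v"), ("English", "n"), ("financial", "a"),
          ("audio", "n"), ("the", "u")]
print(extract_keywords(tagged))  # ['need', 'English', 'audio']
```

Because the weight depends only on the tag, the threshold acts as a part-of-speech filter. A real system might also factor in term frequency, but the claim does not say so.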
3. The method according to claim 2, characterized in that, before the step of performing word segmentation and part-of-speech tagging on the corpus product query demand to obtain M words, the method further comprises:
determining the format of the corpus product query demand;
if the corpus product query demand is in speech format, converting it into a corpus product query demand in text format using speech recognition technology;
if the corpus product query demand is in image format, converting it into a corpus product query demand in text format using optical character recognition (OCR) technology;
wherein the step of performing word segmentation and part-of-speech tagging on the corpus product query demand to obtain M words comprises:
splitting the text-format corpus product query demand into sentences according to its punctuation marks, to obtain sentences to be segmented;
performing reverse maximum matching segmentation on the sentences to be segmented, and determining the M words according to a custom dictionary;
performing part-of-speech tagging on the M words according to preset part-of-speech standard information.
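The segmentation steps of claim 3 (sentence splitting on punctuation, reverse maximum matching against a custom dictionary, then part-of-speech tagging) can be sketched in Python as follows. The tiny `CUSTOM_DICT` and its tags are hypothetical, and tagging by dictionary lookup is a simplifying assumption that stands in for the patent's "preset part-of-speech standard information". The sample query 我需要英文金融语料 means "I need an English financial corpus".

```python
import re

# Hypothetical custom dictionary: word -> part-of-speech tag.
CUSTOM_DICT = {"我": "r", "需要": "v", "英文": "n",
               "金融": "n", "语料": "n", "金融语料": "n"}
MAX_WORD_LEN = max(len(w) for w in CUSTOM_DICT)


def split_sentences(text):
    """Split on common Chinese and ASCII sentence punctuation, dropping empties."""
    return [s for s in re.split(r"[，。！？；,.!?;]", text) if s]


def reverse_max_match(sentence):
    """Reverse maximum matching: scan from the end of the sentence, try the
    longest dictionary word first, and fall back to a single character."""
    words = []
    end = len(sentence)
    while end > 0:
        for size in range(min(MAX_WORD_LEN, end), 0, -1):
            candidate = sentence[end - size:end]
            if size == 1 or candidate in CUSTOM_DICT:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected right to left
    return words


def segment_and_tag(text):
    """Return (word, tag) pairs; words missing from the dictionary get 'x'."""
    return [(w, CUSTOM_DICT.get(w, "x"))
            for s in split_sentences(text) for w in reverse_max_match(s)]


print(segment_and_tag("我需要英文金融语料。"))
# [('我', 'r'), ('需要', 'v'), ('英文', 'n'), ('金融语料', 'n')]
```

Scanning from the right tends to resolve Chinese overlapping ambiguities slightly better than forward matching. In this example the longest entry, 金融语料, is preferred over the pair 金融 + 语料.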
4. The method according to claim 2, characterized in that, before the step of searching the corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information, the method further comprises:
detecting whether the characteristic information includes a feature identifying the category of the corpus product, a feature identifying the language format of the corpus product, and a feature identifying the multimedia style of the corpus product;
if the characteristic information includes the features identifying the category, the language format, and the multimedia style of the corpus product, performing the step of searching the corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information;
otherwise, performing the following steps:
obtaining the user's historical query records within a predetermined period;
analyzing the historical query records using big data analysis technology to determine the user's query demand at the current time;
taking the query demand at the current time as a first element and the N keywords as a second element;
determining, according to the first element and the second element, the features identifying the category, the language format, and the multimedia style of the corpus product.
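The fallback in claim 4 can be sketched in Python as follows. The claim names "big data analysis" without details, so this sketch substitutes a plain most-frequent-value count over the user's recent queries. It also represents the second element as features already derived from the N keywords. `REQUIRED_KINDS`, the dictionary schema, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical keys for the three feature kinds claim 4 checks for.
REQUIRED_KINDS = ("category", "language", "media")


def is_complete(features):
    """True if every required kind has a non-empty value."""
    return all(features.get(kind) for kind in REQUIRED_KINDS)


def infer_current_demand(history):
    """Stand-in for the 'big data analysis' step: for each feature kind,
    take the most frequent value in the user's queries from the period."""
    demand = {}
    for kind in REQUIRED_KINDS:
        counts = Counter(q[kind] for q in history if q.get(kind))
        if counts:
            demand[kind] = counts.most_common(1)[0][0]
    return demand


def complete_features(keyword_features, history):
    """keyword_features: features already derived from the N keywords."""
    if is_complete(keyword_features):
        return dict(keyword_features)  # proceed directly to the corpus search
    first_element = infer_current_demand(history)
    second_element = {k: v for k, v in keyword_features.items() if v}
    # Values from the keywords override values inferred from history.
    return {**first_element, **second_element}


history = [
    {"category": "finance", "language": "english", "media": "text"},
    {"category": "finance", "language": "chinese", "media": "audio"},
    {"category": "medical", "language": "english", "media": "text"},
]
print(complete_features({"category": "legal", "language": None}, history))
# {'category': 'legal', 'language': 'english', 'media': 'text'}
```

Letting explicit keywords win over history is a design choice of this sketch. The claim only says that both elements are used.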
5. The method according to any one of claims 1 to 4, characterized in that the step of processing each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information, to obtain a corpus product that has all of the features in the characteristic information, comprises:
selecting, according to the features corresponding to each piece of corpus information, the piece of corpus information that has the most features, and taking it as an initial corpus product;
determining features to be integrated according to the characteristic information and the features corresponding to the initial corpus product;
extracting, from the corpus information other than the initial corpus product, the corpus information corresponding to the features to be integrated;
combining the extracted corpus information with the initial corpus product to obtain a corpus product that has all of the features in the characteristic information.
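The four steps of claim 5 amount to a greedy assembly. The Python sketch below reads "has the most features" as "covers the most required features", which is an interpretation. It scans the remaining entries in corpus order, whereas the claim does not fix an order. Entry names and feature tags are hypothetical.

```python
def build_corpus_product(required, corpus):
    """required: set of required feature tags.
    corpus: dict name -> set of feature tags (the entries found by the search).
    Returns (initial entry, supplementary entries, features still uncovered)."""
    # Step 1: the entry covering the most required features is the initial product
    # (max() keeps the first entry on ties).
    initial = max(corpus, key=lambda name: len(corpus[name] & required))
    # Step 2: the required features it lacks are the features to be integrated.
    missing = required - corpus[initial]
    supplements = []
    # Step 3: take other entries that supply at least one missing feature.
    for name, feats in corpus.items():
        if name == initial or not (feats & missing):
            continue
        supplements.append(name)
        missing -= feats
        if not missing:
            break
    # Step 4: initial + supplements form the combined corpus product.
    return initial, supplements, missing


corpus = {
    "a": {"finance", "english", "text"},
    "b": {"audio", "english"},
    "c": {"transcript", "medical"},
    "d": {"finance"},
}
print(build_corpus_product({"finance", "english", "audio", "transcript"}, corpus))
# ('a', ['b', 'c'], set())
```

A non-empty third value means the corpus cannot supply every required feature. The claim assumes this case does not arise.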
6. The method according to any one of claims 1 to 4, characterized in that, before the step of pushing the corpus product to the user, the method further comprises:
determining whether the corpus product is chargeable;
if the corpus product is not chargeable, performing the step of pushing the corpus product to the user;
if the corpus product is chargeable, sending a charge notice to the user and, after receiving an instruction from the user agreeing to the deduction, deducting the fee for the corpus product from the user's preset payment account and pushing the corpus product to the user.
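The charging branch of claim 6 can be sketched in Python as follows, assuming amounts in integer cents and modelling the charge notice and the user's reply as a callback. `Account`, `InsufficientFunds`, `ask_consent`, and `push` are hypothetical. The claim does not describe the insufficient-balance case, so raising an exception there is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical preset payment account; amounts in integer cents."""
    balance_cents: int


class InsufficientFunds(Exception):
    """Assumed handling for a case the claim does not cover."""


def push_with_billing(product, price_cents, account, ask_consent, push):
    """A price of 0 means the product is free. ask_consent(product, price_cents)
    returns True if the user agrees to the deduction; push(product) delivers
    the product. Returns True if the product was pushed."""
    if price_cents <= 0:
        push(product)
        return True
    if not ask_consent(product, price_cents):
        return False  # user declined: no deduction, no push
    if account.balance_cents < price_cents:
        raise InsufficientFunds(
            f"balance {account.balance_cents} < price {price_cents}")
    account.balance_cents -= price_cents  # deduct only after consent
    push(product)
    return True


delivered = []
acct = Account(balance_cents=1000)
print(push_with_billing("finance_corpus", 300, acct, lambda p, c: True, delivered.append))
print(acct.balance_cents, delivered)  # 700 ['finance_corpus']
```

Integer cents avoid floating-point rounding in the deduction.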
7. The method according to any one of claims 1 to 4, characterized in that, after the step of pushing the corpus product to the user, the method further comprises:
receiving feedback information submitted by the user, and maintaining the corpus information in the corpus according to the feedback information.
8. A corpus product recommendation apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to receive a corpus product query request triggered by a user and obtain, according to the corpus product query request, a corpus product query demand provided by the user;
an extraction module, configured to perform keyword extraction on the corpus product query demand to obtain N keywords, N being an integer greater than or equal to 1;
a determining module, configured to determine, according to the N keywords, characteristic information corresponding to the corpus product required by the user;
a searching module, configured to search a corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information;
a generation module, configured to process each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information, to obtain a corpus product that has all of the features in the characteristic information, and to push the corpus product to the user.
9. A corpus product recommendation device, characterized in that the device comprises a memory, a processor, and a corpus product recommendation program stored in the memory and executable on the processor, the corpus product recommendation program being configured to implement the steps of the corpus product recommendation method according to any one of claims 1 to 7.
10. A storage medium, characterized in that a corpus product recommendation program is stored on the storage medium, and the corpus product recommendation program, when executed by a processor, implements the steps of the corpus product recommendation method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910433178.8A CN110297880B (en) 2019-05-21 2019-05-21 Corpus product recommendation method, apparatus, device and storage medium

Publications (2)

Publication Number Publication Date
CN110297880A true CN110297880A (en) 2019-10-01
CN110297880B CN110297880B (en) 2023-04-18

Family

ID=68027101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910433178.8A Active CN110297880B (en) 2019-05-21 2019-05-21 Corpus product recommendation method, apparatus, device and storage medium

Country Status (1)

Country Link
CN (1) CN110297880B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0756933A (en) * 1993-06-24 1995-03-03 Xerox Corp Method for retrieval of document
US20070255565A1 (en) * 2006-04-10 2007-11-01 Microsoft Corporation Clickable snippets in audio/video search results
US20100005094A1 (en) * 2002-10-17 2010-01-07 Poltorak Alexander I Apparatus and method for analyzing patent claim validity
CN103530385A (en) * 2013-10-18 2014-01-22 北京奇虎科技有限公司 Method and device for searching for information based on vertical searching channels
CN107391690A (en) * 2017-07-25 2017-11-24 李小明 A kind of method for handling documentation & info
CN109325182A (en) * 2018-10-12 2019-02-12 平安科技(深圳)有限公司 Dialogue-based information-pushing method, device, computer equipment and storage medium
WO2019049089A1 (en) * 2017-09-11 2019-03-14 Indian Institute Of Technology, Delhi Method, system and apparatus for multilingual and multimodal keyword search in a mixlingual speech corpus

Non-Patent Citations (1)

Title
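Wang Guihua et al.: "A personalized query recommendation algorithm based on LDA modeling of client browsing history", Journal of Sichuan University (Natural Science Edition) *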

Cited By (11)

Publication number Priority date Publication date Assignee Title
CN110968800A (en) * 2019-11-26 2020-04-07 北京明略软件系统有限公司 Information recommendation method and device, electronic equipment and readable storage medium
CN110968800B (en) * 2019-11-26 2023-05-02 北京明略软件系统有限公司 Information recommendation method and device, electronic equipment and readable storage medium
CN111209363A (en) * 2019-12-25 2020-05-29 华为技术有限公司 Corpus data processing method, apparatus, server and storage medium
CN111209363B (en) * 2019-12-25 2024-02-09 华为技术有限公司 Corpus data processing method, corpus data processing device, server and storage medium
CN113111155A (en) * 2020-01-10 2021-07-13 阿里巴巴集团控股有限公司 Information display method, device, equipment and storage medium
CN113111155B (en) * 2020-01-10 2024-04-19 阿里巴巴集团控股有限公司 Information display method, device, equipment and storage medium
CN112183089A (en) * 2020-09-25 2021-01-05 中国建设银行股份有限公司 Corpus analysis method and device, electronic equipment and storage medium
CN114385781A (en) * 2021-11-30 2022-04-22 北京凯睿数加科技有限公司 Interface file recommendation method, device, equipment and medium based on statement model
CN114385781B (en) * 2021-11-30 2022-09-27 南京数睿数据科技有限公司 Interface file recommendation method, device, equipment and medium based on statement model
CN117708308A (en) * 2024-02-06 2024-03-15 四川蓉城蕾茗科技有限公司 RAG natural language intelligent knowledge base management method and system
CN117708308B (en) * 2024-02-06 2024-05-14 四川蓉城蕾茗科技有限公司 RAG natural language intelligent knowledge base management method and system

Also Published As

Publication number Publication date
CN110297880B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN110297880A (en) Recommended method, device, equipment and the storage medium of corpus product
US11720572B2 (en) Method and system for content recommendation
CN108304375B (en) Information identification method and equipment, storage medium and terminal thereof
EP3534272A1 (en) Natural language question answering systems
CN103914548B (en) Information search method and device
US20150074112A1 (en) Multimedia Question Answering System and Method
CN111324771B (en) Video tag determination method and device, electronic equipment and storage medium
CN110297988A (en) Hot topic detection method based on weighting LDA and improvement Single-Pass clustering algorithm
WO2020233386A1 (en) Intelligent question-answering method and device employing aiml, computer apparatus, and storage medium
CN109960756A (en) Media event information inductive method
CN109255012B (en) Method and device for machine reading understanding and candidate data set size reduction
KR20070087398A (en) Method and system for classfying music theme using title of music
CN111414763A (en) Semantic disambiguation method, device, equipment and storage device for sign language calculation
Chien et al. Topic-based hierarchical segmentation
Dinarelli et al. Discriminative reranking for spoken language understanding
CN112861990A (en) Topic clustering method and device based on keywords and entities and computer-readable storage medium
CN109508441A (en) Data analysing method, device and electronic equipment
CN112036178A (en) Distribution network entity related semantic search method
CN109147793A (en) The processing method of voice data, apparatus and system
CN114443847A (en) Text classification method, text processing method, text classification device, text processing device, computer equipment and storage medium
CN110188189A (en) A kind of method that Knowledge based engineering adaptive event index cognitive model extracts documentation summary
CN113901173A (en) Retrieval method, retrieval device, electronic equipment and computer storage medium
US20220365956A1 (en) Method and apparatus for generating patent summary information, and electronic device and medium
CN112562684A (en) Voice recognition method and device and electronic equipment
CN108345694B (en) Document retrieval method and system based on theme database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant