CN110297880A - Method, device, equipment and storage medium for recommending corpus products - Google Patents
- Publication number
- CN110297880A, CN201910433178.8A
- Authority
- CN
- China
- Prior art keywords
- corpus
- product
- information
- user
- characteristic information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of big data analysis and discloses a method, device, equipment and storage medium for recommending corpus products. The method comprises: receiving a corpus product query request triggered by a user, and obtaining from the query request the corpus product query requirement provided by the user; performing keyword extraction on the query requirement to obtain N keywords, N being an integer greater than or equal to 1; determining, from the N keywords, the feature information corresponding to the corpus product the user needs; searching the corpus, according to the feature information, for corpus information that matches any feature in the feature information; and processing the retrieved corpus information according to the feature information and the features of each piece of corpus information, to obtain a corpus product that has all the features in the feature information, and pushing that corpus product to the user. In this way, the corpus product recommended to the user matches the user's actual needs, which greatly improves the recommendation accuracy of corpus products.
Description
Technical field
The present invention relates to the technical field of big data analysis, and in particular to a method, device, equipment and storage medium for recommending corpus products.
Background
A traditional corpus is a large-scale library of electronic texts that has been scientifically sampled and processed. As technology has developed, corpora are no longer limited to storing corpus information of the text type; they can also store corpus information of many other types, such as pictures, audio and video.
Existing corpora store corpus information of many kinds and in large quantities. However, existing corpus query methods cannot fully identify a user's query requirements, so the corpus information they return does not match the user's actual needs, and the recommendation accuracy of corpus products is low.
There is therefore an urgent need for a method that recommends corpus products to the user according to the user's actual needs, so as to improve the recommendation accuracy of corpus products.
The above is provided only to aid understanding of the technical solution of the present invention and is not an admission that it constitutes prior art.
Summary of the invention
The main purpose of the present invention is to provide a method, device, equipment and storage medium for recommending corpus products, with the aim of recommending corpus products to a user according to the user's actual needs and thereby improving the recommendation accuracy of corpus products.
To achieve the above purpose, the present invention provides a method for recommending corpus products, the method comprising the following steps:
Receiving a corpus product query request triggered by a user, and obtaining from the query request the corpus product query requirement provided by the user;
Performing keyword extraction on the corpus product query requirement to obtain N keywords, N being an integer greater than or equal to 1;
Determining, according to the N keywords, the feature information corresponding to the corpus product that the user needs;
Searching the corpus, according to the feature information, for corpus information that matches any feature in the feature information;
Processing the corpus information according to the feature information and the features corresponding to each piece of corpus information, to obtain a corpus product that has all the features in the feature information, and pushing the corpus product to the user.
Preferably, the step of performing keyword extraction on the corpus product query requirement to obtain N keywords comprises:
Performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, M being an integer greater than or equal to N;
Calculating the weight value of each of the M words according to a preset part-of-speech weight allocation standard;
Traversing the M words, comparing the weight value of the word currently traversed with a preset weight threshold, and selecting the words whose weight value is greater than the weight threshold, to obtain the N keywords.
Preferably, before the step of performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, the method further comprises:
Determining the format of the corpus product query requirement;
If the corpus product query requirement is in speech format, using speech recognition to convert it into a corpus product query requirement in text format;
If the corpus product query requirement is in picture format, using optical character recognition to convert it into a corpus product query requirement in text format;
Wherein the step of performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words comprises:
Splitting the text-format corpus product query requirement into sentences according to the punctuation marks it contains, to obtain sentences to be segmented;
Performing reverse maximum matching segmentation on the sentences to be segmented according to a custom dictionary, to determine the M words;
Performing part-of-speech tagging on the M words according to preset part-of-speech standard information.
Preferably, before the step of searching the corpus, according to the feature information, for corpus information that matches any feature in the feature information, the method further comprises:
Detecting whether the feature information includes a feature identifying the category to which the corpus product belongs, a feature identifying the language format of the corpus product, and a feature identifying the multimedia format of the corpus product;
If the feature information includes all three of these features, executing the step of searching the corpus, according to the feature information, for corpus information that matches any feature in the feature information;
Otherwise, executing the following steps:
Obtaining the user's historical query records within a preset period;
Analyzing the historical query records using big data analysis to determine the user's query requirement at the current time;
Taking the query requirement at the current time as a first element and the N keywords as a second element;
Determining, according to the first element and the second element, the features identifying the category, the language format and the multimedia format of the corpus product.
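The completeness check and fallback described above can be sketched as follows. This is only an illustration: the feature keys (`category`, `language`, `media`), the function names, and the rule that keyword-derived features take precedence over history-derived ones are all assumptions, since the text does not say how the first and second elements are combined.

```python
# Hypothetical keys for the three features the text requires.
REQUIRED_FEATURES = ("category", "language", "media")


def missing_features(features: dict) -> list:
    """Return the required feature keys that are absent or empty."""
    return [key for key in REQUIRED_FEATURES if not features.get(key)]


def complete_features(features: dict, history_hint: dict) -> dict:
    """Fill missing required features from a history-derived hint.

    `history_hint` stands in for the outcome of analysing the user's
    historical queries (the "first element"). Features already derived
    from the keywords (the "second element") are never overwritten;
    that precedence is an assumption of this sketch.
    """
    completed = dict(features)
    for key in missing_features(features):
        if key in history_hint:
            completed[key] = history_hint[key]
    return completed


features = {"media": "audio", "language": "English"}
hint = {"category": "fairy tale", "language": "Chinese"}
print(complete_features(features, hint))
# {'media': 'audio', 'language': 'English', 'category': 'fairy tale'}
```

If `missing_features` returns an empty list, the search step can run directly and no history lookup is needed.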
Preferably, the step of processing the corpus information according to the feature information and the features corresponding to each piece of corpus information, to obtain a corpus product that has all the features in the feature information, comprises:
Selecting, according to the features corresponding to each piece of corpus information, the piece of corpus information that has the most features, and taking it as the initial corpus product;
Determining the features to be integrated according to the feature information and the features corresponding to the initial corpus product;
Extracting the corpus information corresponding to the features to be integrated from the corpus information other than the initial corpus product;
Combining the extracted corpus information with the initial corpus product to obtain a corpus product that has all the features in the feature information.
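One way to read the four steps above is as a greedy selection over feature sets. The sketch below assumes each piece of corpus information is a dictionary with a `features` set; that record layout, the function name `build_product`, and the choice to represent the combined product as a list of items are illustrative assumptions, since the text does not define what combining corpus information involves.

```python
def build_product(required: set, items: list) -> list:
    """Greedy sketch: start from the item that covers the most required
    features, then add other items that supply features still missing."""
    if not items:
        return []
    # Initial corpus product: the item sharing the most required features.
    initial = max(items, key=lambda item: len(required & item["features"]))
    product = [initial]
    # Features to be integrated: required features the initial item lacks.
    to_integrate = required - initial["features"]
    for item in items:
        if item is initial:
            continue
        supplied = to_integrate & item["features"]
        if supplied:
            product.append(item)
            to_integrate -= supplied
    return product


items = [
    {"id": "a", "features": {"audio"}},
    {"id": "b", "features": {"English", "fairy tale"}},
    {"id": "c", "features": {"audio", "Chinese"}},
]
required = {"audio", "English", "fairy tale"}
print([item["id"] for item in build_product(required, items)])
# ['b', 'a']
```

In this sketch an item is added as soon as it supplies at least one missing feature; a real system might instead rank candidates or reject combinations that still leave features uncovered.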
Preferably, before the step of pushing the corpus product to the user, the method further comprises:
Judging whether the corpus product needs to be paid for;
If the corpus product does not need to be paid for, executing the step of pushing the corpus product to the user;
If the corpus product needs to be paid for, sending a charge notice to the user and, after receiving the user's instruction agreeing to the deduction, deducting the fee for the corpus product from the payment account preset by the user, and then pushing the corpus product to the user.
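The charging branch above can be sketched as a small decision function. The `price` field, the `user_agrees` callback and the insufficient-balance check are assumptions made for illustration; the text only describes the free case and the case where the user agrees to the deduction.

```python
def deliver(product: dict, balance: int, user_agrees) -> tuple:
    """Return (pushed, new_balance) for one delivery attempt."""
    price = product.get("price", 0)
    if price <= 0:
        return True, balance        # free product: push directly
    if not user_agrees(price):      # user declined the charge notice
        return False, balance
    if balance < price:             # not covered by the text; assumed here
        return False, balance
    return True, balance - price    # deduct the fee, then push


print(deliver({"price": 3}, 10, lambda price: True))
# (True, 7)
```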
Preferably, after the step of pushing the corpus product to the user, the method further comprises:
Receiving feedback information submitted by the user, and maintaining the corpus information in the corpus according to the feedback information.
In addition, to achieve the above purpose, the present invention also proposes a device for recommending corpus products, the device comprising:
An obtaining module, configured to receive a corpus product query request triggered by a user and obtain from the query request the corpus product query requirement provided by the user;
An extraction module, configured to perform keyword extraction on the corpus product query requirement to obtain N keywords, N being an integer greater than or equal to 1;
A determining module, configured to determine, according to the N keywords, the feature information corresponding to the corpus product that the user needs;
A searching module, configured to search the corpus, according to the feature information, for corpus information that matches any feature in the feature information;
A generation module, configured to process the corpus information according to the feature information and the features corresponding to each piece of corpus information, to obtain a corpus product that has all the features in the feature information, and to push the corpus product to the user.
In addition, to achieve the above purpose, the present invention also proposes equipment for recommending corpus products, the equipment comprising: a memory, a processor, and a corpus product recommendation program stored in the memory and executable on the processor, the corpus product recommendation program being configured to implement the steps of the corpus product recommendation method described above.
In addition, to achieve the above purpose, the present invention also proposes a storage medium on which a corpus product recommendation program is stored; when executed by a processor, the corpus product recommendation program implements the steps of the corpus product recommendation method described above.
In the corpus product recommendation scheme provided by the present invention, the corpus product query requirement provided by the user is extracted from the corpus product query request triggered by the user; the feature information corresponding to the corpus product the user needs is determined from the N keywords extracted from that query requirement; corpus information matching any feature in the determined feature information is then retrieved from the corpus; and finally the retrieved corpus information is processed according to the determined feature information and the features corresponding to each piece of retrieved corpus information, to obtain a corpus product that has the above feature information. As a result, the corpus information finally selected matches the user's actual needs, which greatly improves the recommendation accuracy of corpus products.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the corpus product recommendation equipment in the hardware operating environment involved in embodiments of the present invention;
Fig. 2 is a schematic flowchart of a first embodiment of the corpus product recommendation method of the present invention;
Fig. 3 is a schematic flowchart of a specific implementation of step S20 in the corpus product recommendation method of the present invention;
Fig. 4 is a schematic flowchart of a second embodiment of the corpus product recommendation method of the present invention;
Fig. 5 is a structural block diagram of a first embodiment of the corpus product recommendation device of the present invention.
The realization of the purpose, the functional characteristics and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of the corpus product recommendation equipment in the hardware operating environment involved in embodiments of the present invention.
As shown in Fig. 1, the corpus product recommendation equipment may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 connects these components and enables communication between them. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as disk storage. The memory 1005 may optionally also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the structure shown in Fig. 1 does not limit the corpus product recommendation equipment, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
As shown in Fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module and a corpus product recommendation program.
In the corpus product recommendation equipment shown in Fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 may be arranged in the corpus product recommendation equipment, which calls the corpus product recommendation program stored in the memory 1005 through the processor 1001 and executes the corpus product recommendation method provided by embodiments of the present invention.
An embodiment of the present invention provides a method for recommending corpus products. Referring to Fig. 2, Fig. 2 is a schematic flowchart of a first embodiment of the corpus product recommendation method of the present invention.
In this embodiment, the corpus product recommendation method comprises the following steps:
Step S10: receive a corpus product query request triggered by a user, and obtain from the query request the corpus product query requirement provided by the user.
Specifically, the executing entity of this embodiment may be any terminal device used by the user performing the corpus product query, such as a smartphone, tablet computer or personal computer; these are not enumerated one by one here, and no limitation is placed on this.
Correspondingly, the corpus product query request may be triggered as follows: the user opens the corpus query application (App) provided by the corpus trading platform and installed on the terminal device, and then operates a function button in the App, such as the text input box, the voice input button or the picture input button, which generates the request.
Correspondingly, the corpus product query requirement obtained may be the information the user entered when operating that function button.
Step S20: perform keyword extraction on the corpus product query requirement to obtain N keywords.
It should be understood that, in practice, the corpus product query requirement provided by the user will contain at least one word, several words, a sentence or more. Therefore, keyword extraction on the corpus product query requirement yields at least one keyword; that is, N is an integer greater than or equal to 1.
In addition, to make the keyword extraction operation easier to understand, this embodiment provides one specific extraction approach, whose general implementation steps are shown in Fig. 3 and described below with reference to Fig. 3.
Sub-step S201: perform word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words.
It should be understood that, because the N keywords finally determined are chosen from the M words obtained, the value of M cannot be smaller than the value of N in practice; that is, M is an integer greater than or equal to N.
In addition, in this embodiment, the word segmentation and part-of-speech tagging performed on the corpus product query requirement in sub-step S201 is specifically as follows:
First, the corpus product query requirement is split into sentences according to the punctuation marks it contains, such as commas and full stops, to obtain the sentences to be segmented.
For example, suppose the content of the corpus product query requirement provided by the user is "Hello, I want to listen to the English Little Prince." The system traverses the content character by character. When the current character is ",", it splits the sentence and takes the content before the "," as one sentence to be segmented (the first sentence to be segmented). It then continues traversing the remaining content; when it reaches ".", it splits again and takes the content between the previous punctuation mark "," and the current punctuation mark "." as another sentence to be segmented (the second sentence to be segmented).
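A minimal sketch of this punctuation-based sentence splitting is shown below. The set of splitting characters (ASCII and full-width commas, full stops and a few others) and the function name are assumptions; the text names only commas and full stops as examples.

```python
import re

# Punctuation treated as sentence boundaries (assumed set, ASCII and full-width).
_SPLIT_RE = re.compile(r"[,.!?;，。！？；]")


def split_sentences(text: str) -> list:
    """Split text at punctuation and drop empty fragments."""
    return [part.strip() for part in _SPLIT_RE.split(text) if part.strip()]


print(split_sentences("Hello, I want to listen to the English Little Prince."))
# ['Hello', 'I want to listen to the English Little Prince']
```

Using a single regular expression gives the same result as the character-by-character traversal in the example, as long as every boundary character is listed in the pattern.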
Then, reverse maximum matching segmentation is performed on the sentences to be segmented according to a custom dictionary, to determine the M words.
Specifically, "reverse maximum matching segmentation" means that, when a sentence to be segmented is cut, cutting proceeds from right to left.
The custom dictionary mentioned above is built in advance from existing phrases and dictionaries collected from the major data platforms, and it contains essentially all forms of words that are likely to occur.
For ease of understanding, the second sentence to be segmented from the example above is cut here using reverse maximum matching.
Assume that the words recorded in the custom dictionary D are: D = {"I", "see", "read", "listen", "Chinese", "English", "Ten Thousand Whys", "little prince", ...}.
When reverse maximum matching segmentation is performed on the second sentence to be segmented (S = {"I want to listen English little prince"}), a maximum segment length is defined first, for example 6, and cutting then proceeds from right to left:
(1) The candidate word W1 taken from S is "I want to listen English";
(2) The words recorded in the custom dictionary D are searched and the candidate word W1 is not found in D, so the leftmost character of W1 is removed, giving the candidate word W2 "want to listen English";
(3) The words recorded in D are searched and the candidate word W2 is not found in D, so the leftmost character of W2 is removed, giving the candidate word W3 "listen English";
(4) The words recorded in D are searched and the candidate word W3 is not found in D, so the leftmost character of W3 is removed, giving the candidate word W4 "English";
(5) The words recorded in D are searched and the candidate word W4 is found in D, so W4 is split off from S, and S becomes "I want to listen little prince";
(6) The content of S is cut again according to the segment length 6, giving the candidate word W5 "I want to listen little prince";
(7) The operations of steps (1) to (6) are repeated until all the content of S has been cut.
Following the above cutting operations, the words cut from the second sentence to be segmented "I want to listen English little prince" are: "I", "listen", "English", "little prince".
It should be understood that the above is only one specific segmentation method and does not limit the technical solution of the present invention in any way; in practice, those skilled in the art can configure this as needed, and no limitation is placed on it here.
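Reverse maximum matching can be sketched in a few lines. The Chinese sentence and dictionary entries below are illustrative stand-ins chosen to resemble the example (they are not taken from the patent's original text), and the function name and the handling of out-of-dictionary characters are assumptions.

```python
def reverse_max_match(sentence: str, dictionary: set, max_len: int = 6) -> list:
    """Segment `sentence` from right to left, preferring the longest
    dictionary word that ends at the current position."""
    words = []
    end = len(sentence)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the candidate from the left until it is a dictionary word
        # or only a single character remains.
        while start < end - 1 and sentence[start:end] not in dictionary:
            start += 1
        words.append(sentence[start:end])
        end = start
    words.reverse()
    return words


# Illustrative dictionary: I, see, read, listen, Chinese, English, Little Prince.
D = {"我", "看", "读", "听", "中文", "英文", "小王子"}
print(reverse_max_match("我想听英文小王子", D))
# ['我', '想', '听', '英文', '小王子']
```

This sketch keeps a single character that is not in the dictionary ("想", "want") as its own token. Filtering the result with `[w for w in words if w in D]` would drop it, which matches the four-word result given in the example above.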
In addition, it is worth noting that, in practice, the administrators of the corpus may also update the custom dictionary according to users' historical query records.
Finally, part-of-speech tagging is performed on the M words according to preset part-of-speech standard information.
It should be noted that the part-of-speech standard information in this embodiment refers specifically to Chinese part-of-speech standard information, which specifies which classes of words are nouns, which are pronouns, which are verbs, which are adjectives, which are time words, and so on; these are not enumerated one by one here.
Taking the 4 words obtained by the above cutting as an example, the result of tagging them according to the part-of-speech standard information may be: "I" <pronoun>, "listen" <verb>, "English" <adjective>, "little prince" <noun>.
It should be understood that the above is only one form of tagging and does not limit the technical solution of the present invention in any way; in practice, those skilled in the art can configure this as needed, and no limitation is placed on it here.
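As a rough illustration, tagging against a preset standard can be modelled as a table lookup. The table contents, the `unknown` fallback tag and the function name are assumptions; the part-of-speech standard referred to in the text is not reproduced in the source.

```python
# Hypothetical part-of-speech table standing in for the preset standard.
POS_TABLE = {
    "I": "pronoun",
    "listen": "verb",
    "English": "adjective",
    "little prince": "noun",
}


def tag_words(words: list, pos_table: dict, default: str = "unknown") -> list:
    """Pair each word with its part of speech, or `default` if unlisted."""
    return [(word, pos_table.get(word, default)) for word in words]


print(tag_words(["I", "listen", "English", "little prince"], POS_TABLE))
# [('I', 'pronoun'), ('listen', 'verb'), ('English', 'adjective'), ('little prince', 'noun')]
```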
In addition, it is worth noting that, in practice, the format of the corpus product query requirement provided by the user differs depending on which button was operated to trigger the corpus product query request.
For example, when the control operated by the user is the text input box, the corpus product query requirement obtained is in text format.
As another example, when the control operated by the user is the voice input button, the corpus product query requirement obtained is in speech format.
As a further example, when the control operated by the user is the picture input button, the corpus product query requirement obtained is in picture format.
The word segmentation and part-of-speech tagging described above are carried out on text. Therefore, to ensure that the operation of performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words runs smoothly, the format of the corpus product query requirement may be determined before sub-step S201 is executed, and the requirement then adapted according to its format.
For example, if the corpus product query requirement is determined to be in speech format, speech recognition is first used to convert it into a corpus product query requirement in text format, and sub-step S201 is then executed; if it is determined to be in picture format, optical character recognition (OCR) is first used to convert it into a corpus product query requirement in text format, and sub-step S201 is then executed; if it is in text format, sub-step S201 is executed directly.
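The format handling above amounts to a dispatch on the input format. In the sketch below, the format labels and the two recogniser callbacks are hypothetical placeholders; the text does not name any particular speech recognition or OCR engine, so none is assumed here.

```python
from typing import Callable


def to_text(payload, fmt: str,
            speech_to_text: Callable, image_to_text: Callable) -> str:
    """Convert a query requirement to text according to its format.

    `speech_to_text` and `image_to_text` are caller-supplied functions
    wrapping whatever speech recognition and OCR engines are available.
    """
    if fmt == "text":
        return payload
    if fmt == "speech":
        return speech_to_text(payload)
    if fmt == "picture":
        return image_to_text(payload)
    raise ValueError(f"unsupported query format: {fmt!r}")


# Stub recognisers, used only to show the call shape.
print(to_text(b"...", "speech", lambda audio: "recognised speech", lambda img: "recognised image"))
# recognised speech
```

Passing the recognisers in as parameters keeps the dispatch testable without any real audio or image processing.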
That is, the operation in sub-step S201 is essentially:
Splitting the text-format corpus product query requirement into sentences according to the punctuation marks it contains, to obtain sentences to be segmented;
Performing reverse maximum matching segmentation on the sentences to be segmented according to the custom dictionary, to obtain the M words;
Performing part-of-speech tagging on the M words according to preset part-of-speech standard information.
Further, to ensure that the feature information determined subsequently has higher reference value, text preprocessing may first be performed on the text-format corpus product query requirement before keyword extraction.
For example, removing stop words, that is, removing words in the text that carry no practical meaning.
As another example, removing invalid special characters, such as emoticons and punctuation marks.
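The two preprocessing operations can be sketched as a filter over tokens. The stop-word list and function name are assumptions, and the sketch works on already-separated tokens for simplicity, whereas the text applies preprocessing to the query text as a whole.

```python
import re

# Hypothetical stop-word list; a real list would be language-specific.
STOP_WORDS = {"the", "a", "of", "to"}


def preprocess(tokens: list) -> list:
    """Strip special characters from each token and drop stop words."""
    cleaned = []
    for token in tokens:
        # Keep only word characters and whitespace; this removes
        # punctuation and emoticons, which are not word characters.
        token = re.sub(r"[^\w\s]", "", token).strip()
        if token and token.lower() not in STOP_WORDS:
            cleaned.append(token)
    return cleaned


print(preprocess(["listen", "to", "the", "English!", "\U0001F642", "Little Prince"]))
# ['listen', 'English', 'Little Prince']
```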
Correspondingly, before the corpus product query requirement in speech format is converted into text format, a series of preprocessing operations may likewise be performed on it first, such as filtering and removing interfering sounds, to make the converted text more accurate.
Similarly, before the corpus product query requirement in picture format is converted into text format, a series of preprocessing operations may likewise be performed on it first, such as grayscale conversion and denoising, to make the converted text more accurate.
Sub-step S202: calculate the weight value of each of the M words according to a preset part-of-speech weight allocation standard.
It should be understood that, in actual queries, pronouns, interjections, conjunctions, onomatopoeia and the like usually contribute little to the query, so words of these kinds should be assigned low weights. Higher weights are assigned to verbs that indicate the multimedia format of the corpus product the user needs (for example, "listen" is taken to indicate that the multimedia format is audio, "see" indicates video, and "read" indicates text), to adjectives that indicate the language format of the corpus product (such as "English" or "Chinese"), and to names that indicate the category to which the corpus product belongs.
Sub-step S203: traverse the M words, compare the weight value of the word currently traversed with a preset weight threshold, and select the words whose weight value is greater than the weight threshold, to obtain the N keywords.
It should be understood that the above is only one specific implementation of extracting keywords from the corpus product query requirement and does not limit the technical solution of the present invention in any way; in practice, those skilled in the art can configure this as needed, and no limitation is placed on it here.
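Sub-steps S202 and S203 can be sketched together as a weight lookup followed by a threshold filter. The numeric weights and the threshold of 0.5 are invented for illustration; the text says only that some parts of speech get low weights and others higher ones.

```python
# Hypothetical weights per part of speech.
POS_WEIGHTS = {
    "pronoun": 0.1,
    "interjection": 0.1,
    "conjunction": 0.1,
    "verb": 0.8,
    "adjective": 0.8,
    "noun": 0.9,
}


def extract_keywords(tagged: list, weights: dict, threshold: float = 0.5) -> list:
    """Keep words whose part-of-speech weight exceeds the threshold.

    `tagged` is a list of (word, part_of_speech) pairs; parts of speech
    missing from `weights` are treated as weight 0.
    """
    return [word for word, pos in tagged if weights.get(pos, 0.0) > threshold]


tagged = [("I", "pronoun"), ("listen", "verb"),
          ("English", "adjective"), ("little prince", "noun")]
print(extract_keywords(tagged, POS_WEIGHTS))
# ['listen', 'English', 'little prince']
```

Here M = 4 tagged words yield N = 3 keywords, which is consistent with the keywords used in the feature-determination example for step S30.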
Step S30: determine, according to the N keywords, the characteristic information corresponding to the corpus product that the user needs.
Specifically, the characteristic information corresponding to the corpus product is the set of key features that can identify the corpus product.
For example, if the keywords obtained by the above extraction operation are "listen", "English" and "The Little Prince", then the keyword "listen" indicates that the corpus product the user needs should be audio data, the keyword "English" indicates that the corpus product needs to be the English edition, and the keyword "The Little Prince" indicates that the corpus product belongs to the fairy tale category.
It should be understood that, in practical applications, to make it easier to determine the characteristic information of the corpus product from the keywords, a correspondence between different keywords and the features of different corpus products can be constructed in advance, and the characteristic information is then determined from this pre-built mapping.
It should be understood that the above is given by way of example only and does not limit the technical solution of the present invention in any way. In practical applications, those skilled in the art can configure this as needed, and no limitation is imposed here.
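A pre-built keyword-to-feature mapping of the kind described above could look like the following sketch. The table `KEYWORD_TO_FEATURE`, the dimension names (`multimedia_format`, `language`, `category`) and the function `determine_features` are hypothetical names chosen for illustration.

```python
# Hypothetical pre-built mapping from a keyword to a (dimension, value) pair.
KEYWORD_TO_FEATURE = {
    "listen": ("multimedia_format", "audio"),
    "watch": ("multimedia_format", "video"),
    "read": ("multimedia_format", "text"),
    "English": ("language", "English"),
    "Chinese": ("language", "Chinese"),
    "The Little Prince": ("category", "fairy tale"),
}


def determine_features(keywords, mapping=KEYWORD_TO_FEATURE):
    """Translate keywords into characteristic information via the mapping."""
    features = {}
    for keyword in keywords:
        if keyword in mapping:          # keywords without a mapping are skipped
            dimension, value = mapping[keyword]
            features[dimension] = value
    return features


features = determine_features(["listen", "English", "The Little Prince"])
print(features)
```

A dictionary keeps the lookup constant-time per keyword; keywords that have no entry are simply ignored in this sketch.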
Step S40: according to the characteristic information, search the corpus for corpus information that satisfies any feature in the characteristic information.
Specifically, the corpus described in this embodiment is constructed in advance and can store corpus information of multiple types, such as text, pictures, audio and video.
In addition, to ensure that multi-dimensional queries, i.e. queries on multiple features, can be carried out in the corpus according to the determined characteristic information, the corpus constructed in this embodiment takes Elasticsearch (a search server based on a full-text search engine, abbreviated ES) as its core, supplemented by MongoDB (a database based on distributed document storage) and MySQL (a relational database management system).
Specifically, in ES, the last three characters of the identifier (hereinafter: ID) of each piece of corpus information collected from the big data platforms are used as the index, the ID of the corpus information is used as the type, and multiple index names such as corpus title, corpus description, corpus label, language direction, price and sales volume are created in the data table.
Then, the specific corpus information is stored in MongoDB, and a correspondence is established between the indexes of ES and each piece of corpus information in MongoDB.
Meanwhile, the raw data of the corpus information (i.e. the data before any processing such as the above classification and labelling) is stored in MySQL and put in one-to-one correspondence with the index information in ES.
Thus, when corpus information is queried from the corpus according to the determined characteristic information, the features in the characteristic information are substituted directly into a query statement written in the DSL (domain-specific language) of ES, a general-purpose big data query language used for the retrieval and analysis of massive machine data.
Examples include fuzzy search matchQuery (...), prefix search prefixQuery (...), and filters such as TermFilter and wizardFilter.
For example, when the subject name "The Little Prince" of the corpus product to be queried is used as the query information, the corpus information returned from the corpus can be corpus information related to "The Little Prince" in any language version and any multimedia format.
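As a rough sketch of substituting features into a query statement, the request body below uses the `bool`, `should`, `match` and `term` constructs of the Elasticsearch query DSL so that a document matching any one feature is returned. The field names (`title`, `language`, `media_format`) and the helper `build_any_feature_query` are assumptions; the description does not give the actual index mapping or queries.

```python
import json


def build_any_feature_query(features, text_fields=("title",)):
    """Build an ES request body that matches documents having ANY feature."""
    should = []
    for field, value in features.items():
        if field in text_fields:
            should.append({"match": {field: value}})  # analysed full-text match
        else:
            should.append({"term": {field: value}})   # exact-value match
    return {
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": 1,  # one satisfied feature is enough
            }
        }
    }


body = build_any_feature_query(
    {"title": "The Little Prince", "language": "English", "media_format": "audio"}
)
print(json.dumps(body, indent=2))
```

The resulting dictionary could be sent as the body of a search request; requiring only one `should` clause to match mirrors the "any feature" condition of step S40, while step S50 later narrows the results to a product with all features.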
Step S50: process each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information, obtain a corpus product that has all the features in the characteristic information, and push the corpus product to the user.
Specifically, in practical applications, the operation of obtaining a corpus product that has all the features in the characteristic information can be realized roughly through the following sub-steps:
First, according to the features corresponding to each piece of corpus information, select the piece of corpus information that has the most features and use it as the initial corpus product;
Then, according to the characteristic information and the features corresponding to the initial corpus product, determine the features to be integrated;
Then, from the corpus information other than the initial corpus product, extract the corpus information corresponding to the features to be integrated;
Finally, combine the extracted corpus information with the initial corpus product to obtain a corpus product that has all the features in the characteristic information.
To make this easier to understand, the above steps are illustrated below:
For example, suppose none of the corpus information queried directly from the corpus has all of the above characteristic information, and the corpus information with the most features is an English recording of The Little Prince that is audio only, alongside a Chinese text edition of the novel The Little Prince.
The processing can then be as follows: the Chinese text novel of The Little Prince is converted into another language to obtain the corresponding English edition;
Then, the audio of the English Little Prince is combined with the English text novel and the two are aligned, so that the audio content and the English text can be played in synchrony; to make it easier for the user to follow, the corresponding text can be highlighted during audio playback.
It should be noted that the above is only for example, not constituting any restriction to technical solution of the present invention.
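The four sub-steps of step S50 can be sketched with plain set operations. The record layout (`name`, `features`) and the function `assemble_product` are hypothetical, and the sketch only selects which pieces to combine; the actual conversion work in the example (translation, audio and text alignment) is outside its scope.

```python
def assemble_product(required, items):
    """Pick the pieces of corpus information to combine into one product.

    required: set of features the user needs.
    items:    list of dicts with a 'name' and a 'features' set.
    Returns (parts, still_missing).
    """
    # Sub-step 1: the item covering the most required features is the
    # initial corpus product.
    initial = max(items, key=lambda item: len(item["features"] & required))
    # Sub-step 2: features the initial product lacks must be integrated.
    missing = required - initial["features"]
    # Sub-step 3: extract other items that supply a feature to be integrated.
    parts = [initial]
    for item in items:
        if item is initial:
            continue
        provided = item["features"] & missing
        if provided:
            parts.append(item)
            missing = missing - provided
    # Sub-step 4: the caller combines `parts`; `missing` lists any gaps left.
    return parts, missing


items = [
    {"name": "English audio", "features": {"audio", "English"}},
    {"name": "Chinese text", "features": {"text", "Chinese"}},
]
parts, still_missing = assemble_product({"audio", "English", "text"}, items)
print([p["name"] for p in parts], still_missing)
```

Here the English audio becomes the initial product and the Chinese text supplies the missing `text` feature, which corresponds to the Little Prince example before the language conversion is applied.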
In addition, it is worth mentioning that, when corpus information in multiple formats is combined into the corpus product the user needs, Tika (a general-purpose tool released by Apache for extracting document content) can be used; it relies on existing parsing libraries to detect and extract metadata and structured content from documents in different formats (such as HTML, PDF and Doc).
It is not difficult to see from the above description that the method for recommending a corpus product provided in this embodiment extracts the corpus product query demand provided by the user from the corpus product inquiry request triggered by the user, determines the characteristic information corresponding to the corpus product the user needs according to the N keywords extracted from the corpus product query demand, searches the corpus for corpus information that satisfies any feature in the determined characteristic information, and finally processes the corpus information found according to the determined characteristic information and the features corresponding to each piece of corpus information found, so as to obtain a corpus product that has all the features in the characteristic information. The corpus product finally obtained therefore meets the user's actual needs, which greatly improves the accuracy of corpus product recommendation.
In addition, it is worth mentioning that, in practical applications, a corpus product generated according to the corpus product query demand provided by the user may need to be charged for, so when a corpus product is recommended to the user it can first be judged whether the corpus product needs to be charged for.
Accordingly, if it is determined that the corpus product does not need to be charged for, the corpus product is pushed directly to the user. If it is determined that the corpus product needs to be charged for, a charge notice can first be issued to the user and the user's feedback then monitored; if an instruction agreeing to the deduction is received from the user, the fee for the corpus product is first deducted from the user's preset payment account, and the corpus product is then pushed to the user.
With this mode of operation, users can decide according to their actual situation whether to pay to obtain the corpus product, which greatly improves the user experience while maintaining the accuracy of corpus product recommendation.
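The charging branch can be sketched as a small decision function. The `price` field, the `balance` argument and the function `deliver_product` are hypothetical, and the insufficient-balance check is an added assumption that the description does not mention.

```python
def deliver_product(product, balance, user_agrees):
    """Return (pushed, new_balance) for one recommendation.

    product:     dict with an optional 'price' (0 or absent means free).
    balance:     amount available in the user's preset payment account.
    user_agrees: the user's feedback on the charge notice.
    """
    price = product.get("price", 0)
    if price <= 0:
        return True, balance          # free product: push directly
    if not user_agrees:
        return False, balance         # user declined the charge notice
    if balance < price:
        return False, balance         # assumption: no push without funds
    return True, balance - price      # deduct the fee first, then push


print(deliver_product({"name": "audio book", "price": 5}, 20, True))
```

Returning the new balance instead of mutating an account object keeps the sketch free of side effects; a declined notice is the point at which the free-acquisition options described next could be offered.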
Further, to better improve the user experience, when the instruction fed back by the user is a refusal of the deduction, a way of obtaining the corpus product free of charge can be recommended to the user in order to retain as many users of the corpus as possible and avoid user churn, for example sharing information about the corpus to a predetermined number of chat groups or inviting a predetermined number of new users. This avoids losing the user and also promotes the corpus.
In addition, to better maintain and manage the corpus information in the corpus, so that corpus products synthesized from that corpus information fit user needs more closely, feedback information submitted by the user can also be received after the corpus product is pushed to the user, and the corpus information in the corpus can then be maintained and managed according to the feedback information.
Referring to Fig. 4, Fig. 4 is a schematic flowchart of the second embodiment of the method for recommending a corpus product of the present invention.
Based on the first embodiment above, the method for recommending a corpus product of this embodiment further includes, before step S40:
Step S00: detect whether the characteristic information includes a feature identifying the category to which the corpus product belongs, a feature identifying the language format of the corpus product, and a feature identifying the multimedia format of the corpus product. If detection determines that the characteristic information includes all three features, execute step S40; otherwise, execute step S01.
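The check in step S00 amounts to testing for three required feature dimensions. A minimal sketch, assuming the characteristic information is held in a dictionary keyed by the hypothetical dimension names below:

```python
# Hypothetical names for the three dimensions that step S00 checks.
REQUIRED_DIMENSIONS = ("category", "language", "multimedia_format")


def missing_dimensions(features):
    """List the required dimensions absent from the characteristic information."""
    return [d for d in REQUIRED_DIMENSIONS if not features.get(d)]


def next_step(features):
    """Return 'S40' when all three dimensions are present, otherwise 'S01'."""
    return "S40" if not missing_dimensions(features) else "S01"


print(next_step({"category": "novel"}))
```

Returning the list of missing dimensions, rather than a bare boolean, lets step S01 know exactly which features it still has to infer.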
Step S01: obtain the historical query record of the user within a predetermined period, and determine the characteristic information according to the historical query record and the N keywords.
In practical applications, the operation described in step S01 can be realized through the following sub-steps:
(1) Obtain the historical query record of the user within the predetermined period.
Specifically, the historical query record mainly records the types, characteristic information and so on of the corpus products the user has queried previously (for example, in the past month), so the user's preferences can be determined from the historical query record.
In addition, in this embodiment the historical query record obtained is limited to a predetermined period, such as the most recent week, so that the information in the record obtained is of greater reference value.
(2) Analyse the historical query record using big data analysis technology to determine the user's query demand at the current moment.
Specifically, analysing the historical query record with big data analysis technology here means counting which keywords are used most frequently in the historical query record, and which category, language format and multimedia format the corpus products recently searched by the user most often have.
(3) Take the query demand at the current moment as the first element and the N keywords as the second element.
(4) According to the first element and the second element, determine the features identifying the category to which the corpus product belongs, the language format of the corpus product, and the multimedia format of the corpus product.
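Sub-steps (1) to (4) can be sketched as a frequency count over past queries that fills in whatever the keywords did not supply. The record layout, the dimension names and the functions `infer_from_history` and `complete_features` are assumptions; a production system would presumably use a richer analysis than a simple mode.

```python
from collections import Counter

DIMENSIONS = ("category", "language", "multimedia_format")  # hypothetical names


def infer_from_history(history, dimension):
    """Return the most frequent value of a dimension in the history, or None."""
    counts = Counter(rec[dimension] for rec in history if rec.get(dimension))
    return counts.most_common(1)[0][0] if counts else None


def complete_features(keyword_features, history, dimensions=DIMENSIONS):
    """Keep keyword-derived features and fill the gaps from the history."""
    features = dict(keyword_features)   # second element: the N keywords
    for dimension in dimensions:
        if not features.get(dimension):
            value = infer_from_history(history, dimension)  # first element
            if value is not None:
                features[dimension] = value
    return features


history = [
    {"category": "novel", "language": "English", "multimedia_format": "audio"},
    {"category": "novel", "language": "English", "multimedia_format": "text"},
    {"category": "novel", "language": "Chinese", "multimedia_format": "audio"},
]
completed = complete_features({"category": "novel"}, history)
print(completed)
```

Features derived from the current keywords take precedence over the history in this sketch, on the assumption that an explicit request outweighs past habits.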
Further, in practical applications, to make the finally determined characteristic information of the corpus product more accurate, i.e. to make the corpus product recommended to the user according to the determined characteristic information better match the user's needs, the biometric information of the user, preferably facial feature information and voiceprint feature information, can also be obtained when the historical query record is obtained. By analysing the biometric information, the gender and approximate age of the user can be determined, and the content that users of this age range and gender pay attention to can then be selected, so that the corpus product finally recommended to the user better matches the user's needs.
It should be noted that, in practical applications, most users do not fill in complete personal information when using the corpus, so the user's actual age, gender and so on often cannot be obtained from the personal information. This embodiment determines this information directly from the biometric information of the user, which not only yields relatively accurate information but is also much more convenient for the user.
Further, in practical applications, to determine the age and gender of the user from the obtained biometric information conveniently, quickly and accurately, a big data analysis model can be constructed in advance using big data analysis technology supplemented by a machine learning algorithm. Then, once the biometric information has been obtained, it is input directly into the analysis model to obtain the age and gender of the user.
The big data analysis model can be constructed roughly as follows:
First, obtain from the big data platforms the biometric information of users whose gender and age are known;
Then, input the biometric information with known gender and age, as sample data, into a big data analysis training model for training, until the model accurately outputs the age and gender of the user corresponding to the sample data after the sample data is input, at which point training is complete.
Accordingly, the big data analysis training model at this point is the required big data analysis model.
In addition, in practical applications, the machine learning algorithm selected is preferably a convolutional neural network algorithm.
Since convolutional neural network algorithms are already relatively mature, those skilled in the art can consult the relevant literature on them for a specific implementation, and details are not repeated here.
For example, suppose the corpus product query demand provided by the user is only the word "novel", and analysis of the user's biometric information using big data analysis technology determines that the user is a woman of about 30.
In addition, the historical query record obtained for the user shows that the user often queries fantasy animation novels.
From this information it can be determined that the corpus product the user needs to query is a fantasy animation novel suitable for a female reader of about 30.
Accordingly, the determined characteristic information may be: 30 years old, female, fantasy, animation, novel.
It should be noted that the above is only an example and does not limit the technical solution of the present invention in any way. In a specific implementation, those skilled in the art can configure this as needed, and details are not repeated here.
It is not difficult to see from the above description that, before searching the corpus for corpus information that satisfies any feature in the characteristic information, the method for recommending a corpus product provided in this embodiment detects whether the characteristic information includes the features identifying the category, the language format and the multimedia format of the corpus product, and thereby decides whether to search for corpus information according to the characteristic information determined initially, or to obtain additional parameter information, determine the above features, and only then search for corpus information. This effectively guarantees the accuracy of the characteristic information used for the search, so that the corpus product subsequently obtained better matches the user's actual needs.
In addition, an embodiment of the present invention also proposes a storage medium on which a recommendation program for a corpus product is stored; when executed by a processor, the recommendation program implements the steps of the method for recommending a corpus product described above.
Referring to Fig. 5, Fig. 5 is a structural block diagram of the first embodiment of the apparatus for recommending a corpus product of the present invention.
As shown in Fig. 5, the apparatus for recommending a corpus product proposed by the embodiment of the present invention includes: an acquisition module 5001, an extraction module 5002, a determining module 5003, a searching module 5004 and a generation module 5005.
The acquisition module 5001 is configured to receive the corpus product inquiry request triggered by the user and to obtain, according to the corpus product inquiry request, the corpus product query demand provided by the user. The extraction module 5002 is configured to perform keyword extraction on the corpus product query demand to obtain N keywords, N being an integer greater than or equal to 1. The determining module 5003 is configured to determine, according to the N keywords, the characteristic information corresponding to the corpus product that the user needs. The searching module 5004 is configured to search the corpus, according to the characteristic information, for corpus information that satisfies any feature in the characteristic information. The generation module 5005 is configured to process each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information, obtain a corpus product that has all the features in the characteristic information, and push the corpus product to the user.
To make the operation by which the extraction module 5002 extracts keywords from the corpus product query demand easier to understand, a specific implementation is given below, roughly as follows:
First, perform word segmentation and part-of-speech tagging on the corpus product query demand to obtain M words;
Then, calculate the weight value of each of the M words according to a preset part-of-speech weight distribution standard;
Finally, traverse the M words, compare the weight value of the word currently traversed with a preset weight threshold, and select the words whose weight value is greater than the weight threshold, obtaining the N keywords.
It should be understood that, in practical applications, since the N keywords are selected from the M words, N should be an integer less than or equal to M.
In addition, it is worth mentioning that, in practical applications, the format of the corpus product query demand provided by the user may differ depending on the operation button that triggered the corpus product inquiry request. Therefore, to ensure that the extraction module 5002 can successfully perform word segmentation and part-of-speech tagging on the corpus product query demand and obtain the M words, the extraction module 5002 is further configured, before performing the above operations, to determine the format of the corpus product query demand.
Accordingly, if it is determined that the corpus product query demand is in voice format, speech recognition technology is used to convert the corpus product query demand in voice format into a corpus product query demand in text format; if it is determined that the corpus product query demand is in picture format, optical character recognition technology is used to convert the corpus product query demand in picture format into a corpus product query demand in text format.
Accordingly, to make the above operation of performing word segmentation and part-of-speech tagging on the corpus product query demand to obtain the M words easier to understand, this embodiment provides a specific implementation, roughly as follows:
According to the punctuation marks in the corpus product query demand in text format, split the corpus product query demand in text format into sentences to obtain the sentences to be segmented;
Perform reverse maximum matching segmentation on the sentences to be segmented against a custom dictionary to determine the M words;
Perform part-of-speech tagging on the M words according to preset part-of-speech standard information.
It should be understood that the above is only one specific implementation and does not limit the technical solution of the present invention in any way. In practical applications, those skilled in the art can configure this as needed, and no limitation is imposed here.
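Reverse maximum matching scans a sentence from its end and, at each position, takes the longest dictionary word that ends there, falling back to a single character when nothing matches. The sketch below illustrates that general technique; the function `backward_max_match` and the three-entry dictionary are illustrative assumptions, and sentence splitting and part-of-speech tagging are omitted.

```python
def backward_max_match(sentence, dictionary, max_len=None):
    """Segment a sentence by reverse (backward) maximum matching."""
    if max_len is None:
        # The longest dictionary entry bounds the candidate length.
        max_len = max((len(word) for word in dictionary), default=1)
    words = []
    end = len(sentence)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = sentence[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected from right to left
    return words


custom_dictionary = {"英文", "小王子", "王子"}
segmented = backward_max_match("我想听英文小王子", custom_dictionary)
print(segmented)
```

Because the scan starts from the right, the three-character entry "小王子" is preferred over the shorter "王子", which is the behaviour that distinguishes maximum matching from a shortest-match scan.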
In addition, to make the operation by which the generation module 5005 generates the corpus product that the user needs easier to understand, a specific implementation is given below, roughly as follows:
First, according to the features corresponding to each piece of corpus information, select the piece of corpus information that has the most features and use it as the initial corpus product;
Then, according to the characteristic information and the features corresponding to the initial corpus product, determine the features to be integrated;
Then, from the corpus information other than the initial corpus product, extract the corpus information corresponding to the features to be integrated;
Finally, combine the extracted corpus information with the initial corpus product to obtain a corpus product that has all the features in the characteristic information.
It should be understood that the above is only one specific implementation and does not limit the technical solution of the present invention in any way. In a specific application, those skilled in the art can configure this as needed, and the present invention does not limit this.
It is not difficult to see from the above description that the apparatus for recommending a corpus product provided in this embodiment extracts the corpus product query demand provided by the user from the corpus product inquiry request triggered by the user, determines the characteristic information corresponding to the corpus product the user needs according to the N keywords extracted from the corpus product query demand, searches the corpus for corpus information that satisfies any feature in the determined characteristic information, and finally processes the corpus information found according to the determined characteristic information and the features corresponding to each piece of corpus information found, so as to obtain a corpus product that has all the features in the characteristic information. The corpus product finally obtained therefore meets the user's actual needs, which greatly improves the accuracy of corpus product recommendation.
In addition, it is worth mentioning that, in practical applications, a corpus product generated according to the corpus product query demand provided by the user may need to be charged for, so when a corpus product is recommended to the user it can first be judged whether the corpus product needs to be charged for.
Accordingly, if it is determined that the corpus product does not need to be charged for, the corpus product is pushed directly to the user. If it is determined that the corpus product needs to be charged for, a charge notice can first be issued to the user and the user's feedback then monitored; if an instruction agreeing to the deduction is received from the user, the fee for the corpus product is first deducted from the user's preset payment account, and the corpus product is then pushed to the user.
With this mode of operation, users can decide according to their actual situation whether to pay to obtain the corpus product, which greatly improves the user experience while maintaining the accuracy of corpus product recommendation.
Further, to better improve the user experience, when the instruction fed back by the user is a refusal of the deduction, a way of obtaining the corpus product free of charge can be recommended to the user in order to retain as many users of the corpus as possible and avoid user churn, for example sharing information about the corpus to a predetermined number of chat groups or inviting a predetermined number of new users. This avoids losing the user and also promotes the corpus.
In addition, to better maintain and manage the corpus information in the corpus, so that corpus products synthesized from that corpus information fit user needs more closely, feedback information submitted by the user can also be received after the corpus product is pushed to the user, and the corpus information in the corpus can then be maintained and managed according to the feedback information.
It should be noted that the workflow described above is only illustrative and does not limit the scope of protection of the present invention. In practical applications, those skilled in the art can select some or all of it according to actual needs to achieve the purpose of the solution of this embodiment, and no limitation is imposed here.
In addition, for technical details not described in detail in this embodiment, reference may be made to the method for recommending a corpus product provided by any embodiment of the present invention, and details are not repeated here.
Based on the first embodiment of the apparatus for recommending a corpus product described above, a second embodiment of the apparatus for recommending a corpus product of the present invention is proposed.
In this embodiment, the apparatus for recommending a corpus product further includes a detection module.
The detection module is configured to detect whether the characteristic information includes the features identifying the category to which the corpus product belongs, the language format of the corpus product, and the multimedia format of the corpus product.
Accordingly, if detection determines that the characteristic information includes the features identifying the category, the language format and the multimedia format of the corpus product, the searching module is triggered to search the corpus, according to the characteristic information, for corpus information that satisfies any feature in the characteristic information.
Otherwise (i.e. one or more of the above features is missing), the searching module is triggered to execute the following steps:
First, obtain the historical query record of the user within a predetermined period;
Then, analyse the historical query record using big data analysis technology to determine the user's query demand at the current moment;
Then, take the query demand at the current moment as the first element and the N keywords as the second element;
Finally, according to the first element and the second element, determine the features identifying the category to which the corpus product belongs, the language format of the corpus product, and the multimedia format of the corpus product.
It should be understood that the above is only an example and does not limit the technical solution of the present invention in any way. In a specific application, those skilled in the art can configure this as needed, and the present invention does not limit this.
It is not difficult to see from the above description that, before searching the corpus for corpus information that satisfies any feature in the characteristic information, the apparatus for recommending a corpus product provided in this embodiment detects whether the characteristic information includes the features identifying the category, the language format and the multimedia format of the corpus product, and thereby decides whether to search for corpus information according to the characteristic information determined initially, or to obtain additional parameter information, determine the above features, and only then search for corpus information. This effectively guarantees the accuracy of the characteristic information used for the search, so that the corpus product subsequently obtained better matches the user's actual needs.
It should be noted that the workflow described above is only illustrative and does not limit the scope of protection of the present invention. In practical applications, those skilled in the art can select some or all of it according to actual needs to achieve the purpose of the solution of this embodiment, and no limitation is imposed here.
In addition, for technical details not described in detail in this embodiment, reference may be made to the method for recommending a corpus product provided by any embodiment of the present invention, and details are not repeated here.
In addition, it should be noted that, herein, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or system that includes the element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as read-only memory (Read Only Memory, ROM)/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the present invention.
Claims (10)
1. A method for recommending a corpus product, characterized in that the method comprises:
receiving a corpus product query request triggered by a user, and obtaining, according to the corpus product query request, a corpus product query requirement provided by the user;
performing keyword extraction on the corpus product query requirement to obtain N keywords, N being an integer greater than or equal to 1;
determining, according to the N keywords, characteristic information corresponding to the corpus product that the user needs;
searching a corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information;
processing each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information to obtain a corpus product having all the features in the characteristic information, and pushing the corpus product to the user.
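The lookup step of claim 1 (finding corpus information that matches any feature in the characteristic information) can be illustrated with a minimal Python sketch. The representation of features as sets of strings, the function name `find_candidates`, and the sample corpus are assumptions made for illustration; the patent does not prescribe a data structure.

```python
from typing import Dict, Set


def find_candidates(required: Set[str], corpus: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return every corpus entry that has at least one of the required features."""
    # A non-empty set intersection means the entry matches "any feature".
    return {name: feats for name, feats in corpus.items() if feats & required}


# Hypothetical corpus: entry name -> features of that entry.
corpus = {
    "doc1": {"english", "text"},
    "doc2": {"chinese", "text"},
    "doc3": {"japanese", "video"},
}
candidates = find_candidates({"english", "video"}, corpus)
```

Here `doc1` and `doc3` each match one required feature, so both become candidates for the later combination step, while `doc2` is skipped.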
2. The method according to claim 1, characterized in that the step of performing keyword extraction on the corpus product query requirement to obtain N keywords comprises:
performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, M being an integer greater than or equal to N;
calculating a weight value for each of the M words according to a preset part-of-speech weight allocation standard;
traversing the M words, comparing the weight value of the currently traversed word with a preset weight threshold, and selecting the words whose weight value is greater than the weight threshold to obtain the N keywords.
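The threshold filtering of claim 2 can be sketched as follows. This is only an illustration: the part-of-speech labels, the weight table `POS_WEIGHTS`, and the threshold value are hypothetical, since the claim states only that a preset weight allocation standard and a preset threshold exist.

```python
from typing import Dict, List, Tuple

# Hypothetical part-of-speech weights (assumption, not from the patent).
POS_WEIGHTS: Dict[str, float] = {
    "noun": 1.0,
    "verb": 0.6,
    "adjective": 0.5,
    "particle": 0.1,
}


def extract_keywords(
    tagged_words: List[Tuple[str, str]],
    threshold: float = 0.55,
    pos_weights: Dict[str, float] = POS_WEIGHTS,
) -> List[str]:
    """Keep the words whose part-of-speech weight exceeds the threshold."""
    keywords = []
    for word, pos in tagged_words:          # traverse every tagged word
        weight = pos_weights.get(pos, 0.0)  # unknown tags get weight 0
        if weight > threshold:              # strictly greater, as in the claim
            keywords.append(word)
    return keywords


tagged = [
    ("English", "noun"),
    ("need", "verb"),
    ("the", "particle"),
    ("video", "noun"),
    ("corpus", "noun"),
]
keywords = extract_keywords(tagged)
```

Raising the threshold to 0.6 would drop the verb as well, because the comparison is strict.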
3. The method according to claim 2, characterized in that before the step of performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words, the method further comprises:
determining the format of the corpus product query requirement;
if the corpus product query requirement is in a speech format, converting the speech-format corpus product query requirement into a text-format corpus product query requirement using speech recognition technology;
if the corpus product query requirement is in a picture format, converting the picture-format corpus product query requirement into a text-format corpus product query requirement using optical character recognition technology;
wherein the step of performing word segmentation and part-of-speech tagging on the corpus product query requirement to obtain M words comprises:
splitting the text-format corpus product query requirement into sentences according to its punctuation marks to obtain sentences to be segmented;
performing reverse maximum matching segmentation on the sentences to be segmented according to a custom dictionary to determine the M words;
performing part-of-speech tagging on the M words according to preset part-of-speech standard information.
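The sentence splitting and reverse maximum matching steps of claim 3 can be sketched in Python. The punctuation set, the maximum word length `max_len`, the fallback to single characters, and the sample dictionary are assumptions for illustration; the patent does not specify them.

```python
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at common Chinese and English punctuation marks."""
    parts = re.split(r"[。！？；，,.!?;]", text)
    return [p.strip() for p in parts if p.strip()]


def reverse_max_match(sentence: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Segment a sentence by scanning from its end and taking the longest dictionary word."""
    words: List[str] = []
    end = len(sentence)
    while end > 0:
        # Try the longest candidate first, shrinking until a dictionary hit.
        # A single character is always accepted so the loop makes progress.
        for size in range(min(max_len, end), 0, -1):
            candidate = sentence[end - size:end]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected from right to left
    return words


# Hypothetical custom dictionary.
dictionary = {"研究", "研究生", "生命", "起源"}
segmented = reverse_max_match("研究生命的起源", dictionary)
```

Scanning from the end yields 研究 / 生命 / 的 / 起源, whereas a forward scan with the same dictionary would first take 研究生 and split the sentence differently.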
4. The method according to claim 2, characterized in that before the step of searching the corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information, the method further comprises:
detecting whether the characteristic information includes a feature identifying the category of the corpus product, a feature identifying the language format of the corpus product, and a feature identifying the multimedia style of the corpus product;
if the characteristic information includes the feature identifying the category of the corpus product, the feature identifying the language format of the corpus product, and the feature identifying the multimedia style of the corpus product, performing the step of searching the corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information;
otherwise, performing the following steps:
obtaining the user's historical query records within a preset period;
analyzing the historical query records using big data analysis technology to determine the user's query requirement at the current time;
taking the query requirement at the current time as a first element and the N keywords as a second element;
determining, according to the first element and the second element, the feature identifying the category of the corpus product, the feature identifying the language format of the corpus product, and the feature identifying the multimedia style of the corpus product.
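The fallback of claim 4 can be illustrated with a deliberately simple stand-in. The claim refers to "big data analysis technology" without naming a method; the sketch below substitutes a most-frequent-value count over the history, which is an assumption and not the patent's technique. The key names `category`, `language_format`, and `multimedia_style` are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical names for the three feature kinds required by the claim.
REQUIRED_KINDS = ("category", "language_format", "multimedia_style")


def complete_features(features: Dict[str, str], history: List[Dict[str, str]]) -> Dict[str, str]:
    """Fill in any missing feature kind with its most frequent value in the history."""
    completed = dict(features)  # features from the current keywords take priority
    for kind in REQUIRED_KINDS:
        if kind in completed:
            continue
        counts = Counter(record[kind] for record in history if kind in record)
        if counts:
            completed[kind] = counts.most_common(1)[0][0]
    return completed


history = [
    {"language_format": "english", "multimedia_style": "video"},
    {"language_format": "english", "multimedia_style": "text"},
    {"language_format": "chinese", "multimedia_style": "video"},
]
completed = complete_features({"category": "medical"}, history)
```

Features derived from the current keywords are never overwritten; the history only supplies the kinds that are missing.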
5. The method according to any one of claims 1 to 4, characterized in that the step of processing each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information to obtain a corpus product having all the features in the characteristic information comprises:
selecting, according to the features corresponding to each piece of corpus information, the piece of corpus information having the most features, and taking that piece of corpus information as an initial corpus product;
determining features to be integrated according to the characteristic information and the features corresponding to the initial corpus product;
extracting, from the corpus information other than the initial corpus product, the corpus information corresponding to the features to be integrated;
combining the extracted corpus information with the initial corpus product to obtain a corpus product having all the features in the characteristic information.
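The combination procedure of claim 5 can be sketched as a small greedy routine. Two points are assumptions: "having the most features" is read here as "having the most required features", and features are modeled as sets of strings. The function name `assemble_product` is hypothetical.

```python
from typing import Dict, List, Set


def assemble_product(required: Set[str], corpus_infos: Dict[str, Set[str]]) -> List[str]:
    """Pick corpus entries whose combined features cover the required features."""
    if not corpus_infos:
        return []
    # Initial corpus product: the entry with the most required features.
    initial = max(corpus_infos, key=lambda name: len(corpus_infos[name] & required))
    selected = [initial]
    # Features to be integrated: required features the initial product lacks.
    missing = required - corpus_infos[initial]
    for name, features in corpus_infos.items():
        if name == initial or not missing:
            continue
        if features & missing:  # this entry supplies a missing feature
            selected.append(name)
            missing -= features
    return selected


infos = {
    "a": {"english"},
    "b": {"english", "video"},
    "c": {"medical"},
}
product = assemble_product({"english", "video", "medical"}, infos)
```

In this example `b` becomes the initial corpus product and `c` supplies the remaining feature, while `a` adds nothing new and is left out. If some required feature appears in no entry, the sketch returns a partial selection; how the patent handles that case is not stated in the claim.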
6. The method according to any one of claims 1 to 4, characterized in that before the step of pushing the corpus product to the user, the method further comprises:
determining whether the corpus product requires a fee;
if the corpus product does not require a fee, performing the step of pushing the corpus product to the user;
if the corpus product requires a fee, sending a charge notice to the user, and, after receiving an instruction from the user agreeing to the deduction, deducting the fee required for the corpus product from a payment account preset by the user and pushing the corpus product to the user.
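The charging branch of claim 6 reduces to a short decision flow, sketched here. The `Account` class, the `deliver` function, and the handling of an insufficient balance are assumptions for illustration; the claim itself only covers the free case and the case where the user agrees to the deduction.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical preset payment account."""
    balance: float


def deliver(price: float, account: Account, user_agrees: bool) -> bool:
    """Return True if the corpus product may be pushed to the user."""
    if price <= 0:               # free product: push directly
        return True
    if not user_agrees:          # user did not agree to the deduction
        return False
    if account.balance < price:  # assumption: insufficient funds block delivery
        return False
    account.balance -= price     # deduct the fee, then allow the push
    return True


account = Account(balance=10.0)
free_ok = deliver(0.0, account, user_agrees=False)
paid_ok = deliver(4.0, account, user_agrees=True)
```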
7. The method according to any one of claims 1 to 4, characterized in that after the step of pushing the corpus product to the user, the method further comprises:
receiving feedback information submitted by the user, and maintaining the corpus information in the corpus according to the feedback information.
8. An apparatus for recommending a corpus product, characterized in that the apparatus comprises:
an obtaining module, configured to receive a corpus product query request triggered by a user and obtain, according to the corpus product query request, a corpus product query requirement provided by the user;
an extraction module, configured to perform keyword extraction on the corpus product query requirement to obtain N keywords, N being an integer greater than or equal to 1;
a determining module, configured to determine, according to the N keywords, characteristic information corresponding to the corpus product that the user needs;
a searching module, configured to search a corpus, according to the characteristic information, for corpus information that matches any feature in the characteristic information;
a generation module, configured to process each piece of corpus information according to the characteristic information and the features corresponding to each piece of corpus information to obtain a corpus product having all the features in the characteristic information, and push the corpus product to the user.
9. A device for recommending a corpus product, characterized in that the device comprises: a memory, a processor, and a corpus product recommendation program stored in the memory and executable on the processor, the corpus product recommendation program being configured to implement the steps of the method for recommending a corpus product according to any one of claims 1 to 7.
10. A storage medium, characterized in that a corpus product recommendation program is stored on the storage medium, and the corpus product recommendation program, when executed by a processor, implements the steps of the method for recommending a corpus product according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910433178.8A CN110297880B (en) | 2019-05-21 | 2019-05-21 | Corpus product recommendation method, apparatus, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910433178.8A CN110297880B (en) | 2019-05-21 | 2019-05-21 | Corpus product recommendation method, apparatus, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110297880A true CN110297880A (en) | 2019-10-01 |
CN110297880B CN110297880B (en) | 2023-04-18 |
Family
ID=68027101
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910433178.8A Active CN110297880B (en) | 2019-05-21 | 2019-05-21 | Corpus product recommendation method, apparatus, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110297880B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110968800A (en) * | 2019-11-26 | 2020-04-07 | 北京明略软件系统有限公司 | Information recommendation method and device, electronic equipment and readable storage medium |
CN111209363A (en) * | 2019-12-25 | 2020-05-29 | 华为技术有限公司 | Corpus data processing method, apparatus, server and storage medium |
CN112183089A (en) * | 2020-09-25 | 2021-01-05 | 中国建设银行股份有限公司 | Corpus analysis method and device, electronic equipment and storage medium |
CN113111155A (en) * | 2020-01-10 | 2021-07-13 | 阿里巴巴集团控股有限公司 | Information display method, device, equipment and storage medium |
CN114385781A (en) * | 2021-11-30 | 2022-04-22 | 北京凯睿数加科技有限公司 | Interface file recommendation method, device, equipment and medium based on statement model |
CN117708308A (en) * | 2024-02-06 | 2024-03-15 | 四川蓉城蕾茗科技有限公司 | RAG natural language intelligent knowledge base management method and system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0756933A (en) * | 1993-06-24 | 1995-03-03 | Xerox Corp | Method for retrieval of document |
US20070255565A1 (en) * | 2006-04-10 | 2007-11-01 | Microsoft Corporation | Clickable snippets in audio/video search results |
US20100005094A1 (en) * | 2002-10-17 | 2010-01-07 | Poltorak Alexander I | Apparatus and method for analyzing patent claim validity |
CN103530385A (en) * | 2013-10-18 | 2014-01-22 | 北京奇虎科技有限公司 | Method and device for searching for information based on vertical searching channels |
CN107391690A (en) * | 2017-07-25 | 2017-11-24 | 李小明 | A kind of method for handling documentation & info |
CN109325182A (en) * | 2018-10-12 | 2019-02-12 | 平安科技(深圳)有限公司 | Dialogue-based information-pushing method, device, computer equipment and storage medium |
WO2019049089A1 (en) * | 2017-09-11 | 2019-03-14 | Indian Institute Of Technology, Delhi | Method, system and apparatus for multilingual and multimodal keyword search in a mixlingual speech corpus |
-
2019
- 2019-05-21 CN CN201910433178.8A patent/CN110297880B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0756933A (en) * | 1993-06-24 | 1995-03-03 | Xerox Corp | Method for retrieval of document |
US20100005094A1 (en) * | 2002-10-17 | 2010-01-07 | Poltorak Alexander I | Apparatus and method for analyzing patent claim validity |
US20070255565A1 (en) * | 2006-04-10 | 2007-11-01 | Microsoft Corporation | Clickable snippets in audio/video search results |
CN103530385A (en) * | 2013-10-18 | 2014-01-22 | 北京奇虎科技有限公司 | Method and device for searching for information based on vertical searching channels |
CN107391690A (en) * | 2017-07-25 | 2017-11-24 | 李小明 | A kind of method for handling documentation & info |
WO2019049089A1 (en) * | 2017-09-11 | 2019-03-14 | Indian Institute Of Technology, Delhi | Method, system and apparatus for multilingual and multimodal keyword search in a mixlingual speech corpus |
CN109325182A (en) * | 2018-10-12 | 2019-02-12 | 平安科技(深圳)有限公司 | Dialogue-based information-pushing method, device, computer equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
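Wang Guihua et al.: "A personalized query recommendation algorithm based on LDA modeling of client browsing history", Journal of Sichuan University (Natural Science Edition) * |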
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110968800A (en) * | 2019-11-26 | 2020-04-07 | 北京明略软件系统有限公司 | Information recommendation method and device, electronic equipment and readable storage medium |
CN110968800B (en) * | 2019-11-26 | 2023-05-02 | 北京明略软件系统有限公司 | Information recommendation method and device, electronic equipment and readable storage medium |
CN111209363A (en) * | 2019-12-25 | 2020-05-29 | 华为技术有限公司 | Corpus data processing method, apparatus, server and storage medium |
CN111209363B (en) * | 2019-12-25 | 2024-02-09 | 华为技术有限公司 | Corpus data processing method, corpus data processing device, server and storage medium |
CN113111155A (en) * | 2020-01-10 | 2021-07-13 | 阿里巴巴集团控股有限公司 | Information display method, device, equipment and storage medium |
CN113111155B (en) * | 2020-01-10 | 2024-04-19 | 阿里巴巴集团控股有限公司 | Information display method, device, equipment and storage medium |
CN112183089A (en) * | 2020-09-25 | 2021-01-05 | 中国建设银行股份有限公司 | Corpus analysis method and device, electronic equipment and storage medium |
CN114385781A (en) * | 2021-11-30 | 2022-04-22 | 北京凯睿数加科技有限公司 | Interface file recommendation method, device, equipment and medium based on statement model |
CN114385781B (en) * | 2021-11-30 | 2022-09-27 | 南京数睿数据科技有限公司 | Interface file recommendation method, device, equipment and medium based on statement model |
CN117708308A (en) * | 2024-02-06 | 2024-03-15 | 四川蓉城蕾茗科技有限公司 | RAG natural language intelligent knowledge base management method and system |
CN117708308B (en) * | 2024-02-06 | 2024-05-14 | 四川蓉城蕾茗科技有限公司 | RAG natural language intelligent knowledge base management method and system |
Also Published As
Publication number | Publication date |
---|---|
CN110297880B (en) | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110297880A (en) | Recommended method, device, equipment and the storage medium of corpus product | |
US11720572B2 (en) | Method and system for content recommendation | |
CN108304375B (en) | Information identification method and equipment, storage medium and terminal thereof | |
EP3534272A1 (en) | Natural language question answering systems | |
CN103914548B (en) | Information search method and device | |
US20150074112A1 (en) | Multimedia Question Answering System and Method | |
CN111324771B (en) | Video tag determination method and device, electronic equipment and storage medium | |
CN110297988A (en) | Hot topic detection method based on weighting LDA and improvement Single-Pass clustering algorithm | |
WO2020233386A1 (en) | Intelligent question-answering method and device employing aiml, computer apparatus, and storage medium | |
CN109960756A (en) | Media event information inductive method | |
CN109255012B (en) | Method and device for machine reading understanding and candidate data set size reduction | |
KR20070087398A (en) | Method and system for classfying music theme using title of music | |
CN111414763A (en) | Semantic disambiguation method, device, equipment and storage device for sign language calculation | |
Chien et al. | Topic-based hierarchical segmentation | |
Dinarelli et al. | Discriminative reranking for spoken language understanding | |
CN112861990A (en) | Topic clustering method and device based on keywords and entities and computer-readable storage medium | |
CN109508441A (en) | Data analysing method, device and electronic equipment | |
CN112036178A (en) | Distribution network entity related semantic search method | |
CN109147793A (en) | The processing method of voice data, apparatus and system | |
CN114443847A (en) | Text classification method, text processing method, text classification device, text processing device, computer equipment and storage medium | |
CN110188189A (en) | A kind of method that Knowledge based engineering adaptive event index cognitive model extracts documentation summary | |
CN113901173A (en) | Retrieval method, retrieval device, electronic equipment and computer storage medium | |
US20220365956A1 (en) | Method and apparatus for generating patent summary information, and electronic device and medium | |
CN112562684A (en) | Voice recognition method and device and electronic equipment | |
CN108345694B (en) | Document retrieval method and system based on theme database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||