Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of those embodiments. Evidently, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, shall fall within the protection scope of the present invention.
In the related art, when the information contained in the text of a question is limited, accurate and effective recommended content cannot be obtained. To address this, an embodiment of the present invention provides a data processing method, which may include: in a case where a question includes text content and non-text content and the text content satisfies a meaningless condition, determining a core word from the answer corresponding to the question; and determining, according to the core word, the recommended content corresponding to the question.
The embodiment of the present invention is directed at a class of questions that include both text content and non-text content and whose text content satisfies the meaningless condition (referred to as a first question for short). A core word is determined from the answer corresponding to the first question, and the core word serves as the basis for determining the recommended content corresponding to the first question.
In the embodiments of the present invention, the text content of the first question satisfying the meaningless condition may mean that the text content of the first question has no practical meaning.
On the one hand, when the text content of the first question satisfies the meaningless condition, that text content cannot point to accurate and effective recommended content. For example, text content satisfying the meaningless condition may include "What is this?", "Who is this?", and the like. Because words such as "what" and "who" carry limited meaning, such text content cannot point to accurate and effective recommended content.
On the other hand, because text content satisfying the meaningless condition has no practical meaning, an effective answer usually cannot be obtained from that text content alone. For example, an effective answer usually cannot be obtained from text content such as "What is this?" or "Who is this?" alone.
The first question of the embodiments of the present invention includes both text content and non-text content. Because the text content and the non-text content together can characterize richer question-intent information, they provide a better basis for answering the question. This increases the probability of obtaining an effective answer, that is, it improves the validity of the answer. The non-text content may include picture content and/or video content, and the like.
In Example 1 of the present invention, the text content of question A is "Who is this?" and the non-text content of question A is a "picture of a person". The question-intent information characterized by the text content and non-text content of question A may then be "who the person is", so that an effective answer to question A will include "the name of the person".
The first question handled by the embodiments of the present invention includes both text content and non-text content, which can improve the validity of the answer. On this basis, the embodiments of the present invention determine a core word from the answer corresponding to the first question and determine, according to the core word, the recommended content corresponding to the question. A core word is a word that expresses, in condensed form, the important information and core content of the answer, and it comes from an answer of higher validity. The degree of correlation between the recommended content and the question can therefore be improved, and with it the accuracy and effectiveness of the recommended content.
In Example 1, the text content of question A is "Who is this?" and the non-text content of question A is a "picture of a person". Assuming that the answer to question A includes "the name of the person", the embodiments of the present invention may take "the name of the person" included in the answer to question A as the core word and use it as the keyword of the recommended content, thereby improving the accuracy and effectiveness of the recommended content.
Scenarios to which the embodiments of the present invention are applicable may include question-and-answer scenarios and the like. In a question-and-answer scenario, a first user publishes a question, a second user can see the question and answer it, and the first user can select a satisfactory answer from all the answers.
In a question-and-answer scenario, a question page may refer to the page corresponding to a question and may be used to describe information related to the question. This related information may include the question, the answers, the recommended content, and so on. Optionally, the question page may be reached through a link to the question, that is, the question page may be opened in response to a trigger operation on the link to the question.
Referring to FIG. 1, a schematic diagram of a question page 100 according to an embodiment of the present invention is shown. The question page may include: a question area 101, an answer area 102, and a recommendation area 103.
The question area 101 may include the content of the question. The content of the question may include the text content and the non-text content of the question. The question area 101 may include the non-text content itself or a link to the non-text content. The non-text content may include picture content and/or video content, and the like. Of course, the non-text content may also include audio content and the like; the embodiments of the present invention place no restriction on the specific type of non-text content.
The answer area 102 may include N answers to the question, where N may be a natural number. The content of an answer may include the text content of the answer or the non-text content of the answer.
The recommendation area 103 may include the recommended content corresponding to the question. Types of recommended content may include related questions, related encyclopedia entries, related guides, and so on. Because the recommended content is related to the question, it can meet the needs of the user.
For example, if the question on the question page is "What should I do about a cold with a cough?", the related questions recommended to the user on the question page may include: "What should I do about a cold?", "What should I do about a cold with a cough and a runny nose?", "What should I do if my child has a cold with a cough?", and so on.
As another example, if the question on the question page is "What should I do about a cold with a cough?", the related encyclopedia entries recommended to the user on the question page may include a "cold encyclopedia entry" and the like; alternatively, the related guides recommended to the user on the question page may include "How to treat a cold", "One table teaches you to tell a common cold from influenza", and the like.
The recommended content can extend the user's knowledge beyond the answer to the question, thereby increasing user loyalty to the question-answering system. In addition, when the user clicks the recommended content, the traffic of the question-answering system can also increase.
It can be understood that the page layout shown in FIG. 1 is only an optional embodiment. In practice, those skilled in the art may determine the page layout of the question page according to actual application requirements; for example, the recommendation area 103 may be located in the lower area of the question page. The embodiments of the present invention place no restriction on the specific layout of the question page.
The data processing method provided by the embodiments of the present invention may be applied in an application environment comprising a client and a server. The client and the server are located in a wired or wireless network, through which the client and the server exchange data.
Optionally, the client 100 may run on a device. For example, the client 100 may be an APP running on the device, such as a question-and-answer APP, an input method APP, or an APP bundled with the operating system. The embodiments of the present invention place no restriction on the specific APP corresponding to the client.
Optionally, the device may include a screen, which may be used to display content. The content may include pages of the question-answering system, such as the question page. The device may specifically include, but is not limited to: a smartphone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, an in-vehicle computer, a desktop computer, a set-top box, a smart television, a wearable device, a smart speaker, and so on. It can be understood that the embodiments of the present invention place no restriction on the specific device.
Method Embodiment
Referring to FIG. 2, a flowchart of the steps of an embodiment of a data processing method of the present invention is shown. The method may specifically include the following steps:
Step 201: in a case where a question includes text content and non-text content and the text content satisfies a meaningless condition, determining a core word from the answer corresponding to the question;
Step 202: determining, according to the core word, the recommended content corresponding to the question.
At least one step of the embodiment shown in FIG. 2 may be executed by the server and/or the client. The embodiments of the present invention place no restriction on the specific executing entity of each step. The data processing method is described below mainly with the server as the executing entity; for the data processing method of other executing entities, the same description applies by cross-reference.
In step 201, the server may store a question set, which may include the questions of the question-answering system.
Optionally, the server may screen the questions in the question set to obtain the first question. The first question may specifically satisfy the following condition: it includes text content and non-text content, and the text content satisfies the meaningless condition.
In an optional embodiment of the present invention, the process of determining the first question may include:
Step S1: judging whether the question includes text content and non-text content, and if so, executing step S2;
Step S2: judging whether the text content of the question satisfies the meaningless condition, and if so, taking the question as the first question.
It can be understood that step S1 may be executed before step S2, or step S2 may be executed before step S1. The embodiments of the present invention place no restriction on the specific execution order of step S1 and step S2.
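Purely as an illustrative sketch, and not as a limitation of the embodiments, the screening of steps S1 and S2 might be written in Python as follows. The Question class, its fields, and the is_meaningless predicate are hypothetical names introduced here for illustration; the concrete form of the meaningless condition is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Question:
    """Hypothetical container for one question of the question set."""
    text: str = ""                                      # text content
    non_text: List[str] = field(default_factory=list)   # e.g. picture or video references


def is_first_question(q: Question, is_meaningless: Callable[[str], bool]) -> bool:
    # Step S1: the question must include both text content and non-text content.
    if not q.text.strip() or not q.non_text:
        return False
    # Step S2: the text content must satisfy the meaningless condition.
    return is_meaningless(q.text)


def screen(questions: List[Question], is_meaningless: Callable[[str], bool]) -> List[Question]:
    """Return the questions of the set that qualify as first questions."""
    return [q for q in questions if is_first_question(q, is_meaningless)]
```

Here S1 is checked before S2 only because it is the cheaper test; as stated above, the order is not restricted.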
In the embodiments of the present invention, the text content of the first question satisfying the meaningless condition may mean that the text content of the first question has no practical meaning.
Those skilled in the art may determine the meaningless condition according to actual application requirements.
In an optional embodiment of the present invention, the meaningless condition may specifically include a first meaningless condition. The first meaningless condition may specifically be: a first language structure feature corresponding to the text content matches a second language structure feature corresponding to a preset meaningless text.
A language structure feature may be used to characterize the language structure of a text. The language structure may include a syntactic structure, a semantic structure, and so on. The syntactic structure may specifically include a dependency syntactic structure. Dependency grammar reveals the syntactic structure of a linguistic unit by analyzing the dependency relations between its components. Intuitively, dependency parsing identifies grammatical components in a sentence, such as subject, predicate, and object, or attributive, adverbial, and complement, and analyzes the relations between these components. The semantic structure may include a dependency semantic structure. Semantic dependency analysis analyzes the semantic associations between the linguistic units of a sentence and presents those associations as a dependency structure.
In the embodiments of the present invention, a preset meaningless text may refer to a collected text that satisfies the meaningless condition, such as "Who is this?", "What is this?", "What dog is this?", or "Where is this?".
Texts that satisfy the meaningless condition follow regular patterns in their language structure features. The embodiments of the present invention therefore use the second language structure features corresponding to the preset meaningless texts to characterize the patterns that such texts follow, and match the first language structure feature corresponding to the text content against the second language structure features corresponding to the preset meaningless texts. If the matching succeeds, that is, if the preset meaningless texts have a second language structure feature that matches the first language structure feature, the text content satisfies the meaningless condition.
Taking a dependency syntactic structure feature as an example of the language structure feature, the first language structure feature may include first components and first relations between the first components, and the second language structure feature may include second components and second relations between the second components. The first components may then be matched against the second components, and the first relations against the second relations, to judge whether the first language structure feature and the second language structure feature match.
For example, if the second language structure feature includes the subject-predicate-object structure corresponding to "What dog is this?", and the first language structure feature includes the subject-predicate-object structure corresponding to "What animal is this?", then the first language structure feature and the second language structure feature match.
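The matching of components and relations described above can be sketched as follows. This is only a simplified illustration under stated assumptions: the StructureFeature type and the relation labels are hypothetical, the features are assumed to have already been produced by some dependency parser (not shown), and matching is taken to mean exact equality of the component set and the relation set, whereas a real system might use a looser criterion.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple


class StructureFeature(NamedTuple):
    """Hypothetical dependency-structure feature with lexical items abstracted away."""
    components: FrozenSet[str]                   # e.g. subject, predicate, object
    relations: FrozenSet[Tuple[str, str, str]]   # (head component, relation label, dependent component)


def features_match(first: StructureFeature, second: StructureFeature) -> bool:
    # Match components against components and relations against relations.
    return first.components == second.components and first.relations == second.relations


def satisfies_first_condition(first: StructureFeature,
                              preset_features: Iterable[StructureFeature]) -> bool:
    # The first meaningless condition holds if any preset meaningless text
    # has a second language structure feature matching the first one.
    return any(features_match(first, second) for second in preset_features)
```

Because the feature records grammatical components rather than the words themselves, "What animal is this?" and "What dog is this?" would yield the same feature under this assumption and would therefore match.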
In an optional embodiment of the present invention, the meaningless condition may specifically include a second meaningless condition. The second meaningless condition may specifically be: no entity word of a preset granularity exists in the text content.
In the embodiments of the present invention, an entity is a specific thing or concept. Entities are generally divided into types, such as person entities, film entities, animal entities, history entities, and so on. The same entity may correspond to multiple entity instances. An entity instance may be a descriptive page (content) about an entity on the network (or in other media); for example, an encyclopedia page contains the entity instance corresponding to an entity.
Optionally, entities may include named entities. Named entities may refer to person names, organization names, place names, and all other entities identified by a name. Named entities in a broader sense may also include titles, song names, film and television drama names, product names, brand names, numbers, dates, currencies, addresses, and so on.
In practical applications, the intent behind a user's question is often related to entities in fields such as film, animals, history, military affairs, entertainment, and fashion. The embodiments of the present invention may therefore take "whether an entity word exists in the text content" as one factor of the meaningless condition.
Entities cover a broad range, so the granularity of an entity word may include coarse granularity and fine granularity. Taking animal entities as an example, the granularity from coarse to fine may include: "animal" -> "canid" -> "dog" -> "fierce dog" -> "Caucasian Shepherd Dog", and so on.
Considering that a coarse-grained entity word points to content with lower accuracy, the embodiments of the present invention may characterize the meaningless condition by means of a preset granularity. For example, assuming that the granularity of "dog" is considered too coarse, the preset granularity may be determined as a granularity finer than "dog", such as "fierce dog", "Caucasian Shepherd Dog", "dog-subfamily dog", or "Teddy".
When the preset granularity is a granularity finer than "dog", "What dog is this?" contains no entity word finer than "dog", so "What dog is this?" can be considered to satisfy the meaningless condition.
It can be understood that those skilled in the art may determine the preset granularity for a field according to the characteristics of that field. For example, in the fashion field, entities from coarse to fine may include: "clothes" -> "skirt" -> "one-piece dress" and "half-length skirt", and so on, in which case "skirt" may be determined as the preset granularity. It can be understood that the embodiments of the present invention place no restriction on the specific manner of determining the preset granularity or on the specific preset granularity.
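As a minimal sketch of the second meaningless condition, assuming the entity hierarchy is available as a simple child-to-parent table, granularity can be approximated by depth in that hierarchy. The PARENT table, the depth-based comparison, and the function names are assumptions made for illustration only; the embodiments do not prescribe how granularity is represented.

```python
from typing import Iterable

# Hypothetical child -> parent table built from the animal example above.
PARENT = {
    "canid": "animal",
    "dog": "canid",
    "fierce dog": "dog",
    "Caucasian Shepherd Dog": "fierce dog",
}


def depth(entity: str) -> int:
    """Number of steps from the entity up to the root; unknown words count as depth 0."""
    steps = 0
    while entity in PARENT:
        entity = PARENT[entity]
        steps += 1
    return steps


def satisfies_second_condition(entity_words: Iterable[str], coarse_limit: str = "dog") -> bool:
    # The preset granularity is taken to be "finer than coarse_limit".
    # The condition holds when no entity word of that granularity is present.
    limit = depth(coarse_limit)
    return not any(depth(word) > limit for word in entity_words)
```

With this table, the entity words of "What dog is this?" (only "dog") satisfy the condition, while a text mentioning "Caucasian Shepherd Dog" does not. Comparing depths across unrelated branches of a hierarchy is a simplification.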
In an embodiment of the present invention, an NER (Named Entity Recognition) method may be used to determine the entities in the text content.
According to one implementation, the NER method may include a dictionary-based method. The dictionary-based method may build an entity library from high-frequency phrases according to the frequency with which phrases occur, and directly recognize as an entity any word that can be found in the entity library. Here, a phrase may refer to a combination of two or more words. In practical applications, entity-related data may be crawled from the Internet and analyzed to obtain the corresponding entity words, which are then stored in the entity library. The embodiments of the present invention place no restriction on the specific entity words or on the manner of acquiring them.
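The dictionary-based method might be sketched as follows. The function names, the min_count threshold, and the case-insensitive substring lookup are all assumptions chosen for brevity; a practical system would typically segment the text first and look up the resulting words.

```python
from collections import Counter
from typing import Iterable, List, Set


def build_entity_library(phrases: Iterable[str], min_count: int = 2) -> Set[str]:
    """Keep phrases whose frequency reaches a (hypothetical) threshold."""
    counts = Counter(phrases)
    return {phrase for phrase, count in counts.items() if count >= min_count}


def recognize_entities(text: str, library: Set[str]) -> List[str]:
    """Recognize as entities the library entries that can be found in the text."""
    lowered = text.lower()
    return sorted(entry for entry in library if entry.lower() in lowered)
```

For instance, a phrase that occurs twice in the collected data enters the library, and any later text containing it is tagged with that entity.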
According to another implementation, the NER method may include a rule-based method. The rule-based method may, according to the composition rules of phrases, label as entities the phrases in the request that conform to the corresponding rules.
According to yet another implementation, the NER method may include a method based on statistical learning. Such a method may treat named entity recognition as a classification problem and use classification methods such as SVM (Support Vector Machine) or Bayesian classifiers. Alternatively, it may treat named entity recognition as a sequence labeling problem and use sequence labeling models such as HMM (Hidden Markov Model), the maximum entropy model (Maximum Entropy Model), CRF (Conditional Random Field), or LSTM (Long Short-Term Memory) models.
In practical applications, those skilled in the art may, according to actual application requirements, use either or both of the first meaningless condition and the second meaningless condition. It can be understood that the embodiments of the present invention place no restriction on the specific meaningless condition.
In step 201, in a case where the question includes text content and non-text content and the text content satisfies the meaningless condition, the following can be considered to hold: the text content of the question cannot by itself point to accurate and effective recommended content, whereas the text content and the non-text content together can characterize richer question-intent information and can therefore improve the validity of the answer. The embodiments of the present invention may accordingly consider that the answer corresponding to the question contains content related to the question. Because a core word is a word that expresses, in condensed form, the important information and core content of the original text, the core word is determined from the answer corresponding to the question so that the content related to the question is characterized by the core word.
The embodiments of the present invention may provide the following technical solutions for determining a core word from the answer corresponding to the question:
Technical solution 1
In technical solution 1, step 201, namely determining a core word from the answer corresponding to the question in a case where the question includes text content and non-text content and the text content satisfies the meaningless condition, may specifically include: in that case, determining a target answer from the answers corresponding to the question; and determining the core word from the target answer.
The target answer may be used to represent an answer of higher quality or an answer more relevant to the question. Low-quality or invalid answers can thereby be excluded, which improves the quality of the core word.
In an optional embodiment of the present invention, the features of the target answer may include at least one of the following:
the target answer is adopted by the asking user of the question; and/or
the providing user of the target answer satisfies a first condition; and/or
a quality parameter of the target answer satisfies a second condition; and/or
the target answer is the intersection of multiple answers.
The asking user of a question may refer to the first user who published the question. If an answer is adopted by the asking user, the answer meets the user's needs. Such an answer can therefore be taken as the target answer, so that an answer meeting the user's needs becomes the source of the core word.
The providing user of an answer may refer to the user who provided the answer. The embodiments of the present invention may determine the first condition according to the identity information of the providing user, and thereby determine the target answer. In general, a providing user whose identity information satisfies the first condition is relatively authoritative and credible in the field and can therefore provide answers of higher quality, so the embodiments of the present invention may determine the target answer by way of the providing user. The identity information may include information such as account level and account credit.
The embodiments of the present invention may determine the target answer by way of the quality parameter of the answer. The quality parameter may be obtained from user evaluations. In practical applications, both the asking user and other users may evaluate an answer, for example by scoring it, and the scores of multiple users may then be combined to determine the quality parameter of the answer. The second condition may include, but is not limited to: the quality parameter exceeds a parameter threshold; or the quality parameter of the target answer is the best among the quality parameters of all answers to the question; and so on.
The target answer may be the intersection of multiple answers, so as to exclude redundant content outside the intersection.
In summary, the embodiments of the present invention use the target answer to represent an answer of higher quality or an answer more relevant to the question. Low-quality or invalid answers can thereby be excluded, which improves the quality of the core word.
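The first three features of the target answer listed above can be sketched as a simple filter. The Answer class, its field names, and the two thresholds standing in for the first and second conditions are hypothetical; the "and/or" combination is rendered here as a plain logical OR, which is only one of the combinations the text allows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Answer:
    """Hypothetical record for one answer to a question."""
    text: str
    adopted: bool = False      # adopted by the asking user
    provider_level: int = 0    # stand-in for the provider's identity information
    quality: float = 0.0       # stand-in for the aggregated user score


def select_target_answers(answers: List[Answer],
                          min_level: int = 5,
                          min_quality: float = 4.0) -> List[Answer]:
    """Keep answers that exhibit at least one target-answer feature."""
    return [
        a for a in answers
        if a.adopted                          # adopted by the asking user
        or a.provider_level >= min_level      # provider satisfies the first condition
        or a.quality >= min_quality           # quality parameter satisfies the second condition
    ]
```

Requiring all features at once, or choosing only the answer with the best quality parameter, would be equally consistent with the description.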
Technical solution 2
In technical solution 2, step 201, namely determining a core word from the answer corresponding to the question in a case where the question includes text content and non-text content and the text content satisfies the meaningless condition, may specifically include: in that case, determining the intersection of the multiple answers corresponding to the question; and determining the core word from the text corresponding to the intersection.
When a question has multiple answers, the embodiments of the present invention may determine the intersection of the multiple answers, where the intersection may refer to the content that the multiple answers have in common, that is, the content that belongs to every answer. Taking the content common to multiple answers as the source of the core word can improve the quality of the core word, and because redundant information is excluded, the efficiency of determining the core word can also be improved.
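A minimal sketch of the intersection follows, assuming each answer has already been segmented into a list of words and that "content in common" is approximated by the words shared by every answer. The function name and this word-level approximation are assumptions for illustration.

```python
from typing import List


def answer_intersection(tokenized_answers: List[List[str]]) -> List[str]:
    """Return the words that occur in every answer, in their order of first
    appearance in the first answer."""
    if not tokenized_answers:
        return []
    first, rest = tokenized_answers[0], tokenized_answers[1:]
    common = set(first).intersection(*(set(tokens) for tokens in rest))
    ordered, seen = [], set()
    for token in first:
        if token in common and token not in seen:
            ordered.append(token)
            seen.add(token)
    return ordered
```

Words that appear in only some of the answers are discarded, which corresponds to excluding the redundant content outside the intersection.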
Technical solution 3
In technical solution 3, step 201, namely determining a core word from the answer corresponding to the question in a case where the question includes text content and non-text content and the text content satisfies the meaningless condition, may specifically include: in that case, performing word segmentation on the answer text corresponding to the question to obtain a word segmentation result; and determining the core word from the multiple words in the word segmentation result according to the lexical features corresponding to each of those words;
wherein the lexical features may include at least one of the following: part of speech, term frequency, inverse document frequency, and word length.
In some cases the answer text is short, and the answer text can be used directly as the core word. For example, if the text content of the question is "Who is this?" and the answer to the question is "name A", then "name A" can be used directly as the core word.
In other cases the answer text is lengthy, and the core word cannot be obtained from it directly. In such cases, technical solution 3 may be used to extract the core word from the answer text. Specifically, the answer text may first be segmented, and the resulting word segmentation result may include multiple words; the core word is then determined from the multiple words according to the lexical features, where the core word may be one or more of the multiple words.
The lexical features may include at least one of: part of speech, term frequency, inverse document frequency, and word length.
Part of speech refers to the classification of words on the basis of their characteristics. Every language may have its own parts of speech. For example, the words of Modern Chinese can be divided into two classes and 14 parts of speech. One class is content words: nouns, verbs, adjectives, distinguishing words, pronouns, numerals, and measure words. The other class is function words: adverbs, prepositions, conjunctions, auxiliary words, modal particles, onomatopoeic words, and interjections.
In a given document, term frequency refers to the number of times a given word occurs in that document. In the embodiments of the present invention, the given document may refer to the document in which the answer text is located.
Inverse document frequency is a measure of the general importance of a word. The inverse document frequency of a word may be obtained by dividing the total number of documents Q1 by the number of documents Q2 that contain the word and then taking the logarithm of the resulting quotient, where Q1 and Q2 may be natural numbers.
Word length may refer to the length of a word.
The documents of the embodiments of the present invention may come from a document collection, and the document collection may come from the Internet. For example, the document collection may include an input corpus of an input method environment, a chat corpus of an instant messaging environment, a corpus of a microblog environment, a corpus of a question-and-answer environment, and so on. The embodiments of the present invention may treat a continuous piece of text in the document collection as one document. It can be understood that the embodiments of the present invention place no restriction on the specific corpus or on the specific manner of acquiring documents.
In an embodiment of the present invention, determining the core word from the multiple words in the word segmentation result according to their respective lexical features may specifically include: determining the core word from the multiple words according to the term frequency and inverse document frequency corresponding to each of the multiple words in the word segmentation result.
Optionally, the core word may be determined from the multiple words according to the product of term frequency and inverse document frequency. The meaning of this product may be: if a word occurs frequently in one document and rarely in other documents, the word is considered highly important in that document.
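The product of term frequency and inverse document frequency can be sketched as follows, assuming the answer text and the documents of the collection have already been segmented into word lists. The function name, the top_k parameter, the natural logarithm, and the tie-breaking rule are assumptions made for illustration; Q1 and Q2 follow the definitions given above.

```python
import math
from collections import Counter
from typing import List


def tf_idf_core_words(answer_tokens: List[str],
                      documents: List[List[str]],
                      top_k: int = 1) -> List[str]:
    """Rank the words of the answer by term frequency times inverse document frequency."""
    tf = Counter(answer_tokens)          # term frequency within the answer's document
    q1 = len(documents)                  # Q1: total number of documents
    scores = {}
    for word, freq in tf.items():
        q2 = sum(1 for doc in documents if word in doc)   # Q2: documents containing the word
        if q2 == 0:
            continue                     # word absent from the collection: no score
        scores[word] = freq * math.log(q1 / q2)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

A word that appears in every document receives a score of zero, since the logarithm of 1 is 0, which matches the intuition that such words carry little importance.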
In another embodiment of the invention, described according to the corresponding vocabulary of vocabulary multiple in the word segmentation result
Feature is determined core word from the multiple vocabulary, be can specifically include: vocabulary multiple in the word segmentation result are right respectively
The lexical feature input core word identification model answered, to obtain the core word of the core word identification model output;The core
The training data of word identification model can specifically include: corpus and the corresponding mark core word of the corpus.
Core word identification model can be machine learning model.In broad terms, machine learning is that one kind can assign machine
The ability of device study, the method for allowing it to complete the impossible function of Direct Programming with this.But it is said in the sense that practice, machine
Study is a kind of by training model using data, then uses a kind of method of model prediction.Machine learning method can be with
It include: traditional decision-tree, linear regression method, logistic regression method, neural network method, k near neighbor method etc., it will be understood that
The embodiment of the present invention is without restriction for specific machine learning method.
The core word identification model of the embodiment of the present invention can learn from training data in which the core words are labeled. By learning from the training data, the core word identification model can acquire the ability to identify core words. Optionally, the core word identification model may determine a first lexical feature corresponding to the core words in the corpus, or a second lexical feature corresponding to the non-core words in the corpus, and determine the core word in the word segmentation result according to the lexical features respectively corresponding to the multiple words in the word segmentation result and the first lexical feature or the second lexical feature.
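One way to picture the optional scheme above is a nearest-centroid classifier: the average feature vector of labeled core words stands in for the first lexical feature, and the average feature vector of non-core words stands in for the second. This is only a sketch under that assumption; the embodiment does not restrict the machine learning method, and the class name `CoreWordModel`, the three-element feature layout, and the sample values are all hypothetical.

```python
def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CoreWordModel:
    """Toy nearest-centroid stand-in for a core word identification model."""

    def fit(self, features, labels):
        core = [f for f, is_core in zip(features, labels) if is_core]
        non_core = [f for f, is_core in zip(features, labels) if not is_core]
        self.core_centroid = centroid(core)          # plays the "first lexical feature"
        self.non_core_centroid = centroid(non_core)  # plays the "second lexical feature"
        return self

    def predict(self, feature):
        # A word is a core word if it lies closer to the core-word centroid.
        return (squared_distance(feature, self.core_centroid)
                < squared_distance(feature, self.non_core_centroid))


# Hypothetical feature vector per word: [is_noun, tf_idf_score, word_length]
train_features = [[1, 0.20, 9], [1, 0.15, 7], [0, 0.00, 4], [0, 0.01, 2]]
train_labels = [True, True, False, False]
model = CoreWordModel().fit(train_features, train_labels)

candidates = {"orchid": [1, 0.18, 6], "is": [0, 0.00, 2]}
core_words = [word for word, feature in candidates.items() if model.predict(feature)]
```

In practice the features would need scaling so that word length does not dominate the distance; the sketch omits this for brevity.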
In step 202, the recommendation corresponding to the problem may be determined according to the core word obtained in step 201.
According to one embodiment, the core word may be matched against the keywords corresponding to content, and the successfully matched content is used as the recommendation.
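The matching in step 202 could be sketched as a set overlap between the core words and the keywords attached to each piece of content. The function name `recommend`, the catalog structure, and the ranking by overlap size are assumptions made for illustration; the embodiment only states that successfully matched content is used as the recommendation.

```python
def recommend(core_words, content_keywords):
    """Return content ids whose keywords match a core word, best match first."""
    core = {w.lower() for w in core_words}
    matches = []
    for content_id, keywords in content_keywords.items():
        overlap = core & {k.lower() for k in keywords}
        if overlap:
            matches.append((len(overlap), content_id))
    # More shared keywords first; ties broken by content id for determinism.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [content_id for _, content_id in matches]


# Hypothetical content catalog: content id -> keywords.
catalog = {
    "article-1": ["succulent", "watering", "plant"],
    "article-2": ["cat", "food"],
    "article-3": ["succulent", "soil"],
}
recommended = recommend(["succulent", "plant"], catalog)
```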
In one embodiment of the present invention, the recommendation corresponding to the problem may be displayed on the question page, so that any user browsing the question page can see the recommendation. For example, the recommendation corresponding to the problem may be displayed in the recommendation region shown in Fig. 1.
In another embodiment of the present invention, the recommendation corresponding to the problem may be pushed to the user who asked the problem. For example, the recommendation may be pushed to the asking user by means of a message. It can be understood that the embodiment of the present invention places no restriction on the specific way in which the recommendation is applied.
To sum up, the first problem handled by the data processing method of the embodiment of the present invention includes both text content and non-textual content. Because text content and non-textual content together can characterize richer information about the intent of the problem, they provide a more favorable basis for answering the problem, which increases the probability of obtaining an effective answer, that is, improves the validity of the answer.
On this basis, the embodiment of the present invention determines a core word from the answer corresponding to the problem and determines the recommendation corresponding to the problem according to the core word. Because a core word is a word that characterizes, in compressed form, the important information and core content of the answer, and because the core word comes from an answer of higher validity, the degree of correlation between the recommendation and the problem can be improved, and therefore the accuracy and validity of the recommendation can be improved.
It should be noted that, for simplicity of description, the method embodiments are each expressed as a series of action combinations. However, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Device embodiment
Referring to Fig. 3, a structural block diagram of a data processing device embodiment of the present invention is shown, which may specifically include: a core word determining module 301 and a recommendation determining module 302;
Wherein the core word determining module 301 is configured to determine a core word from the answer corresponding to a problem in the case where the problem includes text content and non-textual content and the text content meets a meaningless condition; and
The recommendation determining module 302 is configured to determine the recommendation corresponding to the problem according to the core word.
Optionally, the non-textual content may include: image content, and/or video content.
Optionally, the meaningless condition may include:
A first language structure feature corresponding to the text content matches a second language structure feature corresponding to preset meaningless text; and/or
The text content contains no entity word of a preset granularity.
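The two optional checks above might be sketched as follows, assuming that the language structure features are approximated by regular expressions and that the entity words of the preset granularity come from a small lexicon. The patterns, the lexicon, and the function names are hypothetical illustrations, not the features used by the embodiment.

```python
import re

# Hypothetical preset patterns standing in for meaningless-text structure features.
MEANINGLESS_PATTERNS = [
    re.compile(r"^(what|which|who)\s+is\s+(this|that|it)\W*$", re.IGNORECASE),
    re.compile(r"^how\s+(about|is)\s+(this|that|it)\W*$", re.IGNORECASE),
]

# Hypothetical lexicon of entity words at the preset granularity.
ENTITY_WORDS = {"succulent", "orchid", "coin", "laptop"}


def matches_meaningless_structure(text):
    return any(p.match(text.strip()) for p in MEANINGLESS_PATTERNS)


def lacks_entity_word(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    return not any(token in ENTITY_WORDS for token in tokens)


def is_first_problem(text, has_non_text_content):
    """True when the problem has text and non-text content and the text is meaningless."""
    if not text.strip() or not has_non_text_content:
        return False
    return matches_meaningless_structure(text) or lacks_entity_word(text)
```

For example, `is_first_problem("What is this?", True)` is true in this sketch, while a text that names an entity from the lexicon and matches no pattern is not treated as meaningless.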
Optionally, the core word determining module 301 may include:
A target answer determining module, configured to determine a target answer from the answers corresponding to the problem in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition;
An answer core word determining module, configured to determine a core word from the target answer.
Optionally, the features of the target answer may include at least one of the following features:
The target answer is adopted by the user who asked the problem; and/or
The user who provided the target answer meets a first condition; and/or
The quality parameter of the target answer meets a second condition; and/or
The target answer is the intersection of multiple answers.
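A possible selection routine based on the features listed above is sketched below. The priority order, the `provider_level` field used for the first condition, the thresholds, and the fallback that combines the text of all answers (one reading of the last feature) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    adopted: bool = False      # adopted by the user who asked the problem
    provider_level: int = 0    # hypothetical measure used for the "first condition"
    quality: float = 0.0       # hypothetical quality parameter


def select_target_answer(answers, min_level=5, min_quality=0.8):
    """Return the text of the target answer, falling back to all answers combined."""
    for answer in answers:
        if answer.adopted:
            return answer.text
    qualified = [a for a in answers
                 if a.provider_level >= min_level or a.quality >= min_quality]
    if qualified:
        return max(qualified, key=lambda a: a.quality).text
    # Assumed fallback: treat the combined text of all answers as the target answer.
    return " ".join(a.text for a in answers)
```

An adopted answer takes precedence here because adoption is an explicit signal from the asking user; a different ordering of the features would be equally consistent with the text.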
Optionally, the core word determining module 301 may include:
An answer intersection determining module, configured to determine the intersection corresponding to multiple answers to the problem in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition;
An intersection core word determining module, configured to determine a core word from the text corresponding to the intersection.
Optionally, the core word determining module 301 may include:
A word segmentation module, configured to segment the answer text corresponding to the problem, to obtain a word segmentation result, in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition;
A lexical choice module, configured to determine a core word from the multiple words according to the lexical features respectively corresponding to the multiple words in the word segmentation result;
Wherein the lexical features may include at least one of the following features:
Part of speech, word frequency, inverse document frequency, and word length.
Optionally, the lexical choice module may include:
A core word identification module, configured to input the lexical features respectively corresponding to the multiple words in the word segmentation result into a core word identification model, to obtain the core word output by the core word identification model; the training data of the core word identification model may include: a corpus and the labeled core words corresponding to the corpus.
Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
As for the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.
The embodiment of the present invention provides a device for data processing, including a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations: in the case where a problem includes text content and non-textual content and the text content meets a meaningless condition, determining a core word from the answer corresponding to the problem; and determining, according to the core word, the recommendation corresponding to the problem.
Fig. 4 is a block diagram of a device 800 for data processing according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 4, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions, so as to perform all or part of the steps of the method described above. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phone book data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 806 provides power to the various components of the device 800. The power supply component 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor component 814 may detect the open/closed state of the device 800 and the relative positioning of components, for example the display and keypad of the device 800. The sensor component 814 may also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 804 including instructions, where the instructions can be executed by the processor 820 of the device 800 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 5 is a structural schematic diagram of a server in some embodiments of the present invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient storage or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of a device (a server or a terminal), the device is enabled to perform the data processing method shown in Fig. 2.
A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of a device (a server or a terminal), the device is enabled to perform a data processing method, the method including: in the case where a problem includes text content and non-textual content and the text content meets a meaningless condition, determining a core word from the answer corresponding to the problem; and determining, according to the core word, the recommendation corresponding to the problem.
The embodiment of the present invention discloses A1, a data processing method, the method including:
In the case where a problem includes text content and non-textual content and the text content meets a meaningless condition, determining a core word from the answer corresponding to the problem;
Determining, according to the core word, the recommendation corresponding to the problem.
A2. The method according to A1, wherein the non-textual content includes: image content and/or video content.
A3. The method according to A1, wherein the meaningless condition includes:
A first language structure feature corresponding to the text content matches a second language structure feature corresponding to preset meaningless text; and/or
The text content contains no entity word of a preset granularity.
A4. The method according to A1, wherein the determining of a core word from the answer corresponding to the problem, in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, includes:
In the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, determining a target answer from the answers corresponding to the problem;
Determining a core word from the target answer.
A5. The method according to A4, wherein the features of the target answer include at least one of the following features:
The target answer is adopted by the user who asked the problem; and/or
The user who provided the target answer meets a first condition; and/or
The quality parameter of the target answer meets a second condition; and/or
The target answer is the intersection of multiple answers.
A6. The method according to A1, wherein the determining of a core word from the answer corresponding to the problem, in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, includes:
In the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, determining the intersection corresponding to multiple answers to the problem;
Determining a core word from the text corresponding to the intersection.
A7. The method according to A1, wherein the determining of a core word from the answer corresponding to the problem, in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, includes:
In the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, segmenting the answer text corresponding to the problem, to obtain a word segmentation result;
Determining a core word from the multiple words according to the lexical features respectively corresponding to the multiple words in the word segmentation result;
Wherein the lexical features include at least one of the following features:
Part of speech, word frequency, inverse document frequency, and word length.
A8. The method according to A7, wherein the determining of a core word from the multiple words according to the lexical features respectively corresponding to the multiple words in the word segmentation result includes:
Inputting the lexical features respectively corresponding to the multiple words in the word segmentation result into a core word identification model, to obtain the core word output by the core word identification model; the training data of the core word identification model includes: a corpus and the labeled core words corresponding to the corpus.
The embodiment of the present invention discloses B9, a data processing device, including:
A core word determining module, configured to determine a core word from the answer corresponding to a problem in the case where the problem includes text content and non-textual content and the text content meets a meaningless condition; and
A recommendation determining module, configured to determine the recommendation corresponding to the problem according to the core word.
B10. The device according to B9, wherein the non-textual content includes: image content and/or video content.
B11. The device according to B9, wherein the meaningless condition includes:
A first language structure feature corresponding to the text content matches a second language structure feature corresponding to preset meaningless text; and/or
The text content contains no entity word of a preset granularity.
B12. The device according to B9, wherein the core word determining module includes:
A target answer determining module, configured to determine a target answer from the answers corresponding to the problem in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition;
An answer core word determining module, configured to determine a core word from the target answer.
B13. The device according to B12, wherein the features of the target answer include at least one of the following features:
The target answer is adopted by the user who asked the problem; and/or
The user who provided the target answer meets a first condition; and/or
The quality parameter of the target answer meets a second condition; and/or
The target answer is the intersection of multiple answers.
B14. The device according to B9, wherein the core word determining module includes:
An answer intersection determining module, configured to determine the intersection corresponding to multiple answers to the problem in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition;
An intersection core word determining module, configured to determine a core word from the text corresponding to the intersection.
B15. The device according to B9, wherein the core word determining module includes:
A word segmentation module, configured to segment the answer text corresponding to the problem, to obtain a word segmentation result, in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition;
A lexical choice module, configured to determine a core word from the multiple words according to the lexical features respectively corresponding to the multiple words in the word segmentation result;
Wherein the lexical features include at least one of the following features:
Part of speech, word frequency, inverse document frequency, and word length.
B16. The device according to B15, wherein the lexical choice module includes:
A core word identification module, configured to input the lexical features respectively corresponding to the multiple words in the word segmentation result into a core word identification model, to obtain the core word output by the core word identification model; the training data of the core word identification model includes: a corpus and the labeled core words corresponding to the corpus.
The embodiment of the present invention discloses C17, a device for data processing, including a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
In the case where a problem includes text content and non-textual content and the text content meets a meaningless condition, determining a core word from the answer corresponding to the problem;
Determining, according to the core word, the recommendation corresponding to the problem.
C18. The device according to C17, wherein the non-textual content includes: image content and/or video content.
C19. The device according to C17, wherein the meaningless condition includes:
A first language structure feature corresponding to the text content matches a second language structure feature corresponding to preset meaningless text; and/or
The text content contains no entity word of a preset granularity.
C20. The device according to C17, wherein the determining of a core word from the answer corresponding to the problem, in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, includes:
In the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, determining a target answer from the answers corresponding to the problem;
Determining a core word from the target answer.
C21. The device according to C20, wherein the features of the target answer include at least one of the following features:
The target answer is adopted by the user who asked the problem; and/or
The user who provided the target answer meets a first condition; and/or
The quality parameter of the target answer meets a second condition; and/or
The target answer is the intersection of multiple answers.
C22. The device according to C17, wherein the determining of a core word from the answer corresponding to the problem, in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, includes:
In the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, determining the intersection corresponding to multiple answers to the problem;
Determining a core word from the text corresponding to the intersection.
C23. The device according to C17, wherein the determining of a core word from the answer corresponding to the problem, in the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, includes:
In the case where the problem includes text content and non-textual content and the text content meets the meaningless condition, segmenting the answer text corresponding to the problem, to obtain a word segmentation result;
Determining a core word from the multiple words according to the lexical features respectively corresponding to the multiple words in the word segmentation result;
Wherein the lexical features include at least one of the following features:
Part of speech, word frequency, inverse document frequency, and word length.
C24. The device according to C23, wherein the determining of a core word from the multiple words according to the lexical features respectively corresponding to the multiple words in the word segmentation result includes:
Inputting the lexical features respectively corresponding to the multiple words in the word segmentation result into a core word identification model, to obtain the core word output by the core word identification model; the training data of the core word identification model includes: a corpus and the labeled core words corresponding to the corpus.
The embodiment of the present invention discloses D25, a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause a device to perform the data processing method according to one or more of A1 to A8.
Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional techniques in the art not disclosed in this disclosure. The specification and examples are to be regarded as illustrative only, and the true scope and spirit of the present invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing is merely the preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
The data processing method, the data processing device, and the device for data processing provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the above description of the embodiments is only intended to help understand the method of the present invention and its core ideas. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application in accordance with the ideas of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.