CN109857847A - Data processing method, apparatus, and device for data processing

Data processing method, apparatus, and device for data processing

Info

Publication number
CN109857847A
Authority
CN
China
Prior art keywords
content
text
answer
word
core word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910037025.1A
Other languages
Chinese (zh)
Inventor
牛琳琳
刘玉璇
许嘉明
杨菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sogou Hangzhou Intelligent Technology Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201910037025.1A
Publication of CN109857847A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

An embodiment of the present invention provides a data processing method, a data processing apparatus, and a device for data processing. The method includes: when a question includes text content and non-textual content and the text content meets a meaningless condition, determining a core word from the answers corresponding to the question; and determining, according to the core word, the recommended content corresponding to the question. Embodiments of the present invention can improve the accuracy and effectiveness of recommended content.

Description

Data processing method, apparatus, and device for data processing
Technical field
The present invention relates to the field of computer technology, and in particular to a data processing method, a data processing apparatus, and a device for data processing.
Background
With the development of Internet technology, Internet data is growing explosively and users' demand for knowledge keeps increasing; more and more users turn to the Internet to acquire knowledge. A question answering system is a convenient interactive platform on the Internet. In a question answering system, a first user posts a question, second users can see the question and answer it, and the first user can select a satisfactory answer from all the answers.
At present, to help the first user obtain a comprehensive answer to the question from different angles, the question page provides not only the answers to the question but also recommended content corresponding to the question, for the first user's reference. The types of recommended content may include related questions, related encyclopedia entries, related guides, and so on. Building on the answers to the question, such recommended content can extend the user's knowledge and thereby increase the user's loyalty to the question answering system; moreover, when the user clicks the recommended content, the traffic of the question answering system also increases.
One method of determining recommended content may include: using the words contained in the question text as keywords to obtain the corresponding recommended content.
However, when the question text carries little information, accurate and effective recommended content cannot be obtained; that is, the recommended content obtained is wrong or invalid. If such content is shown on the question page, the user experience is likely to suffer. For example, when the question text is "What is this?" or "Who is this?", usually only content related to "what is" or "who is" can be retrieved. Because "what is" and "who is" have no clear referent, they usually cannot point to a specific target; for example, "who is" may refer to any person, or to a song whose title contains "who is". As a result, content related to "what is" or "who is" has low accuracy and effectiveness and can hardly meet the user's needs.
Summary of the invention
Embodiments of the present invention provide a data processing method, a data processing apparatus, and a device for data processing, which can improve the accuracy and effectiveness of recommended content.
To solve the above problem, an embodiment of the present invention discloses a data processing method, including:
when a question includes text content and non-textual content and the text content meets a meaningless condition, determining a core word from the answers corresponding to the question; and
determining, according to the core word, the recommended content corresponding to the question.
In another aspect, an embodiment of the present invention discloses a data processing apparatus, including:
a core word determining module, configured to determine a core word from the answers corresponding to a question when the question includes text content and non-textual content and the text content meets a meaningless condition; and
a recommended content determining module, configured to determine, according to the core word, the recommended content corresponding to the question.
In yet another aspect, an embodiment of the present invention discloses a device for data processing, including a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
when a question includes text content and non-textual content and the text content meets a meaningless condition, determining a core word from the answers corresponding to the question; and
determining, according to the core word, the recommended content corresponding to the question.
In still another aspect, an embodiment of the present invention discloses a machine-readable medium storing instructions that, when executed by one or more processors, cause a device to perform the data processing method described in one or more of the foregoing aspects.
Embodiments of the present invention have the following advantages:
The first question targeted by embodiments of the present invention includes both text content and non-textual content. Because text content and non-textual content together can convey richer information about the intent of the question, they provide a better basis for answering it; this increases the probability of obtaining an effective answer, that is, improves the effectiveness of the answers.
On this basis, embodiments of the present invention determine a core word from the answers corresponding to the question and determine the recommended content corresponding to the question according to that core word. A core word is a word that expresses, in condensed form, the important information and core content of an answer, and it comes from answers of relatively high effectiveness; this raises the relevance between the recommended content and the question and therefore improves the accuracy and effectiveness of the recommended content.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a question page 100 according to an embodiment of the present invention;
Fig. 2 is a flowchart of the steps of an embodiment of a data processing method of the present invention;
Fig. 3 is a structural block diagram of an embodiment of a data processing apparatus of the present invention;
Fig. 4 is a block diagram of a device 800 for data processing of the present invention; and
Fig. 5 is a schematic structural diagram of a server in some embodiments of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
To address the problem in the related art that accurate and effective recommended content cannot be obtained when the question text carries little information, an embodiment of the present invention provides a data processing method. The method may include: when a question includes text content and non-textual content and the text content meets a meaningless condition, determining a core word from the answers corresponding to the question; and determining, according to the core word, the recommended content corresponding to the question.
For a question that includes both text content and non-textual content and whose text content meets the meaningless condition (referred to as a first question for short), embodiments of the present invention determine a core word from the answers corresponding to the first question and use the core word as the basis for determining the recommended content corresponding to the first question.
In embodiments of the present invention, saying that the text content of the first question meets the meaningless condition may mean that this text content has no practical meaning.
On the one hand, because the text content of the first question meets the meaningless condition, it cannot point to accurate and effective recommended content. For example, text content that meets the meaningless condition may include "What is this?", "Who is this?", and so on. Words such as "what" and "who" in such text content carry limited meaning and therefore cannot point to accurate and effective recommended content.
On the other hand, because text content that meets the meaningless condition has no practical meaning, an effective answer usually cannot be obtained from that text content alone. For example, text content such as "What is this?" or "Who is this?" alone usually does not yield an effective answer.
The first question in embodiments of the present invention includes both text content and non-textual content. Because text content and non-textual content together can convey richer information about the intent of the question, they provide a better basis for answering it; this increases the probability of obtaining an effective answer, that is, improves the effectiveness of the answers. Non-textual content may include picture content and/or video content, and so on.
In Example 1 of the present invention, the text content of question A is "Who is this?" and the non-textual content of question A is a picture of a person. The question intent conveyed by the text content and non-textual content of question A may then be "Who is this person?", so an effective answer to question A will include the person's name.
The first question targeted by embodiments of the present invention includes both text content and non-textual content, which improves the effectiveness of the answers. On this basis, embodiments of the present invention determine a core word from the answers corresponding to the first question and determine the recommended content corresponding to the question according to the core word. A core word is a word that expresses, in condensed form, the important information and core content of an answer, and it comes from answers of relatively high effectiveness; this raises the relevance between the recommended content and the question and therefore improves the accuracy and effectiveness of the recommended content.
In Example 1, the text content of question A is "Who is this?" and the non-textual content of question A is a picture of a person. Assuming that the answer to question A includes the person's name, embodiments of the present invention can take the person's name contained in the answer as the core word and use it as the keyword for the recommended content, thereby improving the accuracy and effectiveness of the recommended content.
Scenarios to which embodiments of the present invention apply may include question-and-answer scenarios and the like. In a question-and-answer scenario, a first user posts a question, second users can see and answer the question, and the first user can select a satisfactory answer from all the answers.
In a question-and-answer scenario, a question page is the page corresponding to a question and can present information related to the question, which may include the question, its answers, recommended content, and so on. Optionally, the question page may be reached through a link to the question; that is, the question page may be opened in response to a trigger operation on the link to the question.
Referring to Fig. 1, a schematic diagram of a question page 100 according to an embodiment of the present invention is shown. The question page may include a question area 101, an answer area 102, and a recommendation area 103.
The question area 101 may include the content of the question, which may include the text content and the non-textual content of the question. The question area 101 may contain the non-textual content itself or a link to the non-textual content. Non-textual content may include picture content and/or video content; it may of course also include audio content and so on. Embodiments of the present invention do not limit the specific type of non-textual content.
The answer area 102 may include N answers to the question, where N may be a natural number. The content of an answer may include the text content of the answer or the non-textual content of the answer.
The recommendation area 103 may include the recommended content corresponding to the question. The types of recommended content may include related questions, related encyclopedia entries, related guides, and so on. Because the recommended content is related to the question, it can meet the user's needs.
For example, if the question on the question page is "What should I do about a cold and cough?", the related questions recommended to the user on the question page may include "What should I do about a cold?", "What should I do about a cold with a cough and a runny nose?", "What should I do about a child's cold and cough?", and so on.
As another example, if the question on the question page is "What should I do about a cold and cough?", the related encyclopedia entries recommended to the user on the question page may include "Cold encyclopedia" and the like, or the related guides recommended to the user on the question page may include "How to treat a cold", "A table that teaches you to tell a common cold from influenza", and so on.
Building on the answers to the question, the above recommended content can extend the user's knowledge and thereby increase the user's loyalty to the question answering system; moreover, when the user clicks the recommended content, the traffic of the question answering system also increases.
It can be understood that the page layout shown in Fig. 1 is only an optional embodiment. In practice, a person skilled in the art can determine the layout of the question page according to actual application requirements; for example, the recommendation area 103 may be located in the lower part of the question page. Embodiments of the present invention do not limit the specific layout of the question page.
The data processing method provided by embodiments of the present invention can be applied in an application environment consisting of a client and a server. The client and the server are located in a wired or wireless network, through which they exchange data.
Optionally, the client 100 may run on a device. For example, the client 100 may be an app running on the device, such as a question-and-answer app, an input method app, or an app bundled with the operating system. Embodiments of the present invention do not limit the specific app corresponding to the client.
Optionally, the device may include a screen for displaying content, and the content may include pages of the question answering system, such as the question page. The device may specifically include, but is not limited to: a smartphone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, an in-vehicle computer, a desktop computer, a set-top box, a smart television, a wearable device, a smart speaker, and so on. It can be understood that embodiments of the present invention do not limit the specific device.
Method embodiment
Referring to Fig. 2, a flowchart of the steps of an embodiment of a data processing method of the present invention is shown. The method may specifically include the following steps:
Step 201: when a question includes text content and non-textual content and the text content meets a meaningless condition, determine a core word from the answers corresponding to the question;
Step 202: determine, according to the core word, the recommended content corresponding to the question.
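The two steps above can be sketched as a small pipeline. This is only an illustrative sketch, not the implementation disclosed here: the `Question` class and the three callables (`is_meaningless`, `extract_core_word`, `search`) are hypothetical placeholders for whatever meaningless-condition check, core word extractor, and retrieval backend an implementer chooses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Question:
    text: str                      # text content of the question
    has_non_text: bool             # True if a picture, video, etc. is attached
    answers: List[str] = field(default_factory=list)


def recommend(
    question: Question,
    is_meaningless: Callable[[str], bool],
    extract_core_word: Callable[[List[str]], Optional[str]],
    search: Callable[[str], List[str]],
) -> List[str]:
    """Run step 201, then step 202; every callable is a hypothetical plug-in."""
    # Precondition of step 201: text plus non-textual content, meaningless text.
    is_first_question = (
        bool(question.text)
        and question.has_non_text
        and is_meaningless(question.text)
    )
    if not is_first_question:
        return []  # other kinds of question are outside this sketch
    # Step 201: determine a core word from the answers.
    core_word = extract_core_word(question.answers)
    if core_word is None:
        return []
    # Step 202: use the core word as the retrieval keyword.
    return search(core_word)
```

Passing the checks in as callables keeps the sketch independent of any particular meaningless condition or extraction technique described later in this section.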
At least one step of the embodiment shown in Fig. 2 may be performed by a server and/or a client; embodiments of the present invention do not limit the specific executing entity of each step. The embodiments mainly take the server as the executing entity to describe the data processing method; the data processing methods for other executing entities can be understood by cross-reference.
In step 201, the server may store a question set, and the question set may include the questions in the question answering system.
Optionally, the server may screen the questions in the question set to obtain first questions. A first question satisfies the following condition: it includes text content and non-textual content, and the text content meets the meaningless condition.
In an optional embodiment of the present invention, the process of determining a first question may include:
Step S1: determine whether the question includes text content and non-textual content; if so, perform step S2;
Step S2: determine whether the text content of the question meets the meaningless condition; if so, take the question as a first question.
It can be understood that step S1 may be performed before step S2, or step S2 may be performed before step S1; embodiments of the present invention do not limit the specific order in which steps S1 and S2 are performed.
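As an illustration of steps S1 and S2, the following sketch screens a question set for first questions. The dictionary keys `text` and `non_text` and the `is_meaningless` predicate are assumptions made for this example; the source does not prescribe a data format.

```python
from typing import Callable, Dict, Iterable, List


def select_first_questions(
    questions: Iterable[Dict],
    is_meaningless: Callable[[str], bool],
) -> List[Dict]:
    """Return the questions that pass both step S1 and step S2."""
    selected = []
    for q in questions:
        # Step S1: the question must carry text AND non-textual content.
        if not (q.get("text") and q.get("non_text")):
            continue
        # Step S2: the text content must meet the meaningless condition.
        if is_meaningless(q["text"]):
            selected.append(q)
    return selected
```

The sketch runs S1 first because it is the cheaper check; as noted above, the opposite order gives the same result.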
In embodiments of the present invention, saying that the text content of the first question meets the meaningless condition may mean that this text content has no practical meaning.
A person skilled in the art can determine the meaningless condition according to actual application requirements.
In an optional embodiment of the present invention, the meaningless condition may include a first meaningless condition, which may be: a first language structure feature corresponding to the text content matches a second language structure feature corresponding to a preset meaningless text.
A language structure feature characterizes the language structure of a text. The language structure may include a syntactic structure, a semantic structure, and so on. The syntactic structure may include a dependency syntactic structure. Dependency grammar reveals the syntactic structure of a sentence by analyzing the dependency relations between its linguistic units; intuitively, dependency parsing identifies grammatical components such as subject, predicate, and object, or attributive, adverbial, and complement, and analyzes the relations between these components. The semantic structure may include a dependency semantic structure. Semantic dependency analysis parses the semantic associations between the linguistic units of a sentence and presents them as a dependency structure.
In embodiments of the present invention, preset meaningless texts are collected texts that meet the meaningless condition, such as "Who is this?", "What is this?", "What dog is this?", and "Where is this?".
Texts that meet the meaningless condition exhibit regularities in their language structure features. Embodiments of the present invention therefore use the second language structure features of the preset meaningless texts to characterize these regularities, and match the first language structure feature of the text content against the second language structure features of the preset meaningless texts. If the match succeeds, that is, if some preset meaningless text has a second language structure feature that matches the first language structure feature, the text content meets the meaningless condition.
Taking dependency syntactic structure features as an example, the first language structure feature may include first components and first relations between the first components, and the second language structure feature may include second components and second relations between the second components. The first components can then be matched against the second components, and the first relations against the second relations, to determine whether the first and second language structure features match.
For example, if the second language structure feature includes the subject-predicate-object structure of "What dog is this?" and the first language structure feature includes the subject-predicate-object structure of "What animal is this?", the first language structure feature matches the second language structure feature.
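The matching described above can be sketched as a comparison of dependency arcs. This is a simplified illustration under several assumptions: the arcs are taken to come from some dependency parser that is not shown, the category and relation labels (`pronoun`, `subject`, and so on) are hypothetical, and "matching" is reduced to exact equality of arc sets.

```python
from typing import FrozenSet, Iterable, List, Tuple

# One arc: (category of dependent, relation, category of head).
Arc = Tuple[str, str, str]


def structure_feature(arcs: Iterable[Arc]) -> FrozenSet[Arc]:
    """Reduce a parse to an order-independent set of components and relations."""
    return frozenset(arcs)


# Hypothetical second language structure feature for "What dog is this?".
PRESET_FEATURES: List[FrozenSet[Arc]] = [
    structure_feature([
        ("pronoun", "subject", "verb"),
        ("noun", "object", "verb"),
        ("interrogative", "attribute", "noun"),
    ]),
]


def meets_first_condition(
    arcs: Iterable[Arc],
    preset_features: List[FrozenSet[Arc]] = PRESET_FEATURES,
) -> bool:
    """True if the first feature equals any preset second feature."""
    first_feature = structure_feature(arcs)
    return any(first_feature == second for second in preset_features)
```

Because the feature records categories rather than words, "What animal is this?" and "What dog is this?" produce the same arc set and therefore match, as in the example above. A looser matching rule, such as partial overlap of arcs, would be an equally plausible reading of the text.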
In an optional embodiment of the present invention, the meaningless condition may include a second meaningless condition, which may be: no entity word of a preset granularity exists in the text content.
In embodiments of the present invention, an entity is a specific thing or concept. Entities are generally divided into types, such as person entities, film entities, animal entities, and history entities. One entity may correspond to multiple entity instances; an entity instance may be a page (content) on the network or in another medium that describes the entity, for example an encyclopedia page that contains the entity instance corresponding to an entity.
Optionally, entities may include named entities. A named entity may be a person name, an organization name, a place name, or any other entity identified by a name. In a broader sense, named entities may also include titles, song names, film and television drama names, product names, brand names, numbers, dates, currencies, addresses, and so on.
In practical applications, the intent of a user's question is often related to entities in fields such as film, animals, history, military affairs, entertainment, and fashion. Embodiments of the present invention can therefore use whether an entity word exists in the text content as one factor of the meaningless condition.
Entities cover a wide range, so the granularity of an entity word may be coarse or fine. Taking animal entities as an example, from coarse to fine the granularity may run: "animal" -> "canid" -> "dog" -> "fierce dog" -> "Caucasian shepherd dog", and so on.
Because a coarse-grained entity word points to its target less accurately, embodiments of the present invention can express the meaningless condition through a preset granularity. For example, if the granularity of "dog" is considered too coarse, the preset granularity can be defined as any granularity finer than "dog", such as "fierce dog", "Caucasian shepherd dog", "Caninae dog", or "Teddy".
When the preset granularity is any granularity finer than "dog", "What dog is this?" contains no entity word finer than "dog", so "What dog is this?" can be considered to meet the meaningless condition.
It can be understood that a person skilled in the art can determine the preset granularity for a field according to the characteristics of that field. For example, in the fashion field, entities from coarse to fine may run "clothing" -> "skirt" -> "one-piece dress" and "half-length skirt", and so on, and "skirt" can then be set as the preset granularity. Embodiments of the present invention do not limit the way the preset granularity is determined or the specific preset granularity.
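One way to make the granularity check concrete is to model the coarse-to-fine chain as a parent map and measure granularity as depth in that chain. The sketch below is an assumption-laden illustration: the `PARENT` table only encodes the animal example from the text, the entity words are assumed to have been recognized already, and unknown words are treated as the coarsest level.

```python
from typing import Dict, Iterable

# Hypothetical coarse-to-fine hierarchy taken from the animal example.
PARENT: Dict[str, str] = {
    "canid": "animal",
    "dog": "canid",
    "fierce dog": "dog",
    "Caucasian shepherd dog": "fierce dog",
}


def depth(word: str) -> int:
    """Number of steps from the word up to the root; larger means finer."""
    level = 0
    while word in PARENT:
        word = PARENT[word]
        level += 1
    return level


def meets_second_condition(entity_words: Iterable[str], threshold: str = "dog") -> bool:
    """True if no entity word is finer than the threshold granularity."""
    limit = depth(threshold)
    return all(depth(word) <= limit for word in entity_words)
```

With the default threshold, the entity word "dog" in "What dog is this?" is not finer than "dog", so the text meets the condition, while a text mentioning "Caucasian shepherd dog" does not. Comparing depths across unrelated branches is a crude simplification; a per-field threshold, as suggested above, would avoid it.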
In an embodiment of the present invention, an NER (Named Entity Recognition) method may be used to determine the entities in the text content.
According to one embodiment, the NER method may include a dictionary-based method. A dictionary-based method can build an entity library from high-frequency phrases according to how often phrases occur, and directly recognize as an entity any word that can be found in the entity library. Here a phrase is a combination of two or more words. In practice, entity-related data can be crawled from the Internet and analyzed to obtain entity words, which are then stored in the entity library. Embodiments of the present invention do not limit the specific entity words or how they are acquired.
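A dictionary-based method of this kind can be sketched in a few lines. The frequency threshold `min_count` and the plain substring lookup are assumptions chosen for brevity; a production system would more likely use word segmentation and a trie or similar index.

```python
from collections import Counter
from typing import Iterable, Set


def build_entity_library(phrases: Iterable[str], min_count: int = 2) -> Set[str]:
    """Keep the phrases that occur at least min_count times in the crawled data."""
    counts = Counter(phrases)
    return {phrase for phrase, count in counts.items() if count >= min_count}


def find_entities(text: str, library: Set[str]) -> Set[str]:
    """Recognize as entities all library entries that occur in the text."""
    return {entry for entry in library if entry in text}
```

For example, a library built from phrases in which "Caucasian shepherd dog" appears twice and another phrase appears once keeps only the former, and `find_entities` then reports it for any text that contains it.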
According to another embodiment, the NER method may include a rule-based method. A rule-based method can label as entities the phrases in a request that conform to given phrase composition rules.
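As a hedged illustration of a composition rule, the sketch below labels organization-like phrases with a regular expression: one or more capitalized words followed by a suffix word. The rule, the suffix list, and the name `ORG_RULE` are invented for this example and apply to English text only; rules for Chinese text would be written over segmented words instead.

```python
import re
from typing import List

# Hypothetical composition rule: capitalized word(s) followed by a suffix word.
ORG_RULE = re.compile(r"\b(?:[A-Z][a-z]+ )+(?:University|Company|Institute)\b")


def label_entities(text: str) -> List[str]:
    """Return the phrases in the text that conform to the composition rule."""
    return [match.group(0) for match in ORG_RULE.finditer(text)]
```

Applied to "She studied at Example State University last year.", the rule labels "Example State University" and ignores the capitalized sentence-initial word, because that word is not followed by a suffix from the list.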
According to yet another embodiment, the NER method may include a method based on statistical learning. Such a method may treat named entity recognition as a classification problem and use classifiers such as SVM (Support Vector Machine) or Bayesian classifiers; alternatively, it may treat named entity recognition as a sequence labeling problem and use sequence labeling models such as HMM (Hidden Markov Model), the maximum entropy model, CRF (Conditional Random Field), or LSTM (Long Short-Term Memory) models.
In practice, a person skilled in the art can use either or both of the first meaningless condition and the second meaningless condition according to actual application requirements; it can be understood that embodiments of the present invention do not limit the specific meaningless condition.
In step 201, when a question includes text content and non-textual content and the text content meets the meaningless condition, it can be considered that the text content of the question cannot point to accurate and effective recommended content, while the text content and non-textual content together can convey richer question intent and thus improve the effectiveness of the answers. Embodiments of the present invention therefore assume that the answers corresponding to the question contain content related to the question. Since a core word is a word that expresses, in condensed form, the important information and core content of the original text, a core word is determined from the answers corresponding to the question, so that the core word represents the content related to the question.
Embodiments of the present invention can provide the following technical solutions for determining a core word from the answers corresponding to the question:
Technical solution 1
In technical solution 1, step 201 (when a question includes text content and non-textual content and the text content meets the meaningless condition, determining a core word from the answers corresponding to the question) may specifically include: when a question includes text content and non-textual content and the text content meets the meaningless condition, determining a target answer from the answers corresponding to the question; and determining a core word from the target answer.
A target answer is an answer of relatively high quality or of relatively high relevance to the question. Using target answers excludes low-quality or invalid answers and thereby improves the quality of the core word.
In an optional embodiment of the present invention, the features of the target answer may include at least one of the following:
the target answer is adopted by the user who asked the question; and/or
the user who provided the target answer meets a first condition; and/or
a quality parameter of the target answer meets a second condition; and/or
the target answer is the intersection of multiple answers.
The asking user of a question is the user who posted the question. If an answer has been adopted by the asking user, the answer meets that user's needs; such an answer can therefore be taken as a target answer, so that an answer meeting the user's needs serves as the source of the core words.
The answering user of an answer is the user who provided the answer. The embodiment of the present invention may define the first condition in terms of the identity information of the answering user and determine the target answer accordingly. In general, an answering user whose identity information meets the first condition is comparatively authoritative and credible in the relevant field and can therefore provide answers of higher quality. The identity information may include information such as account level and account credit.
The embodiment of the present invention may also determine the target answer from the quality parameter of an answer. The quality parameter may be obtained from user evaluations. In practical applications, both the asking user and other users may evaluate an answer, for example by scoring it, and the scores of multiple users may be combined to determine the quality parameter of the answer. The second condition may include, but is not limited to: the quality parameter exceeds a parameter threshold, or the quality parameter of the target answer is the best among the quality parameters of all answers to the question.
The target answer may be the intersection of multiple answers, which excludes redundant content outside the intersection.
In summary, the embodiment of the present invention uses target answers to represent answers of higher quality or of higher relevance to the question, which excludes low-quality or invalid answers and thereby improves the quality of the core words.
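The selection of target answers described in technical solution 1 can be illustrated with a minimal Python sketch. The `Answer` fields, the thresholds `MIN_LEVEL` and `MIN_QUALITY`, and the use of an average score as the quality parameter are all assumptions made for illustration; the specification does not fix concrete values or data structures. The intersection feature is left out here because it is the subject of technical solution 2.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    adopted: bool = False    # adopted by the asking user
    provider_level: int = 0  # hypothetical account level of the answering user
    scores: tuple = ()       # hypothetical ratings given by users


MIN_LEVEL = 5      # assumed threshold for the "first condition"
MIN_QUALITY = 4.0  # assumed threshold for the "second condition"


def quality(answer):
    """Average of the user ratings; 0.0 when nobody has rated the answer."""
    return sum(answer.scores) / len(answer.scores) if answer.scores else 0.0


def select_target_answers(answers):
    """Keep the answers that show at least one target-answer feature."""
    return [
        a for a in answers
        if a.adopted
        or a.provider_level >= MIN_LEVEL
        or quality(a) >= MIN_QUALITY
    ]
```

An answer that is neither adopted, nor from a high-level account, nor well rated is dropped before any core word is extracted, which is the filtering effect the paragraph above describes.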
Technical solution 2
In technical solution 2, step 201 may specifically include: when the question includes text content and non-text content and the text content meets the meaningless condition, determining the intersection of multiple answers to the question; and determining core words from the text corresponding to the intersection.
When a question has multiple answers, the embodiment of the present invention may determine the intersection of those answers, where the intersection is the content that the answers have in common, that is, the content that belongs to every answer. Using the content common to multiple answers as the source of the core words can improve the quality of the core words and, because redundant information is excluded, can also improve the efficiency of determining them.
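One simple way to read "the content that belongs to every answer" is as the set of words shared by all answers. The sketch below assumes whitespace tokenization and case-insensitive comparison, neither of which is prescribed by the specification (Chinese text would need a word segmenter instead); the function name is hypothetical.

```python
def answer_intersection(answers):
    """Return the words that occur in every answer, in the order of the first answer."""
    if not answers:
        return []
    token_lists = [a.lower().split() for a in answers]
    # Words common to all answers.
    common = set(token_lists[0]).intersection(*map(set, token_lists[1:]))
    seen = set()
    result = []
    for token in token_lists[0]:
        if token in common and token not in seen:
            seen.add(token)
            result.append(token)
    return result
```

Words such as "a" or "looks" that appear in only some of the answers fall outside the intersection and are never considered as core word candidates.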
Technical solution 3
In technical solution 3, step 201 may specifically include: when the question includes text content and non-text content and the text content meets the meaningless condition, segmenting the answer text corresponding to the question into words to obtain a word segmentation result; and determining core words from the multiple words in the word segmentation result according to the lexical features of each word;
Wherein the lexical features may include at least one of the following: part of speech, word frequency, inverse document frequency, and word length.
In some cases, the answer text is short and can be used directly as the core word. For example, if the text content of the question is "who is this" and the answer to the question is "name A", then "name A" can be used directly as the core word.
In other cases, the answer text is lengthy and a core word cannot be obtained from it directly. In such cases, technical solution 3 can be used to extract core words from the answer text. Specifically, the answer text is first segmented into words, and the resulting word segmentation result may include multiple words; core words are then determined from the multiple words according to their lexical features, where the core words may be one or more of the multiple words.
The lexical features may include at least one of part of speech, word frequency, inverse document frequency, and word length.
Part of speech is the classification of words according to their grammatical characteristics. Each language may have its own parts of speech. For example, the words of Modern Chinese can be divided into two classes comprising 14 parts of speech. One class is content words: nouns, verbs, adjectives, distinguishing words, pronouns, numerals, and measure words; the other class is function words: adverbs, prepositions, conjunctions, auxiliary words, modal particles, onomatopoeia, and interjections.
Within a given document, word frequency is the number of times a given word occurs in that document. In the embodiment of the present invention, the given document may be the document containing the answer text.
Inverse document frequency is a measure of the general importance of a word. The inverse document frequency of a word can be obtained by dividing the total number of documents Q1 by the number Q2 of documents containing the word and then taking the logarithm of the quotient, where Q1 and Q2 may be natural numbers.
Word length is the length of a word.
The documents of the embodiment of the present invention may come from a document collection, which may be drawn from the Internet. For example, the document collection may include: input corpora from an input method environment, chat corpora from an instant messaging environment, corpora from a microblog environment, corpora from a question-and-answer environment, and so on. The embodiment may treat a continuous piece of text in the document collection as one document. It should be understood that the embodiment places no restriction on the specific corpora or on how the documents are obtained.
In one embodiment of the present invention, determining core words from the multiple words according to their lexical features may specifically include: determining core words from the multiple words according to the word frequency and inverse document frequency of each word in the word segmentation result.
Optionally, core words may be determined from the multiple words according to the product of word frequency and inverse document frequency. The product can be interpreted as follows: if a word occurs frequently in one document but rarely in other documents, the word is considered highly important in that document.
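The product of word frequency and inverse document frequency can be sketched in Python as follows. The function names are hypothetical, the answer text is assumed to be already segmented into a list of words, and the answer's own document is assumed to be part of the corpus so that Q2 is never zero; none of these choices comes from the specification.

```python
import math
from collections import Counter


def tf_idf_scores(answer_tokens, corpus):
    """Score each word of the answer by word frequency times log(Q1 / Q2).

    corpus is a list of token lists (one per document). The answer itself is
    assumed to be one of them, so every word has Q2 >= 1.
    """
    q1 = len(corpus)  # Q1: total number of documents
    scores = {}
    for word, freq in Counter(answer_tokens).items():
        q2 = sum(1 for doc in corpus if word in doc)  # Q2: documents containing the word
        scores[word] = freq * math.log(q1 / q2)
    return scores


def top_core_words(answer_tokens, corpus, k=1):
    """Return the k words with the highest scores as core word candidates."""
    scores = tf_idf_scores(answer_tokens, corpus)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A word that appears in every document gets a score of zero regardless of how often it occurs in the answer, while a word that is frequent in the answer and rare elsewhere rises to the top.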
In another embodiment of the present invention, determining core words from the multiple words according to their lexical features may specifically include: inputting the lexical features of each word in the word segmentation result into a core word identification model to obtain the core words output by the model; the training data of the core word identification model may specifically include: a corpus and the annotated core words of the corpus.
The core word identification model may be a machine learning model. Broadly speaking, machine learning is a way of giving a machine the ability to learn so that it can perform functions that cannot be achieved by direct programming. In practical terms, machine learning is a method of training a model on data and then using the model to make predictions. Machine learning methods may include decision trees, linear regression, logistic regression, neural networks, k-nearest neighbors, and so on. It should be understood that the embodiment of the present invention places no restriction on the specific machine learning method.
The core word identification model learns from training data in which the core words are annotated, and by doing so acquires the ability to identify core words. Optionally, the model may determine a first lexical feature corresponding to the core words in the corpus, or a second lexical feature corresponding to the non-core words in the corpus, and then determine the core words in the word segmentation result according to the lexical features of the words in that result together with the first or second lexical feature.
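The specification leaves the model type open. As a purely illustrative stand-in, the sketch below uses a nearest-centroid classifier: training computes the mean feature vector of the annotated core words (playing the role of the first lexical feature) and of the non-core words (the second lexical feature), and a candidate word is kept when its features lie closer to the former. The feature layout (word frequency, inverse document frequency, word length) and all function names are assumptions.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(samples):
    """samples: (feature_vector, is_core) pairs taken from an annotated corpus.

    Returns the mean features of core words and of non-core words.
    Both classes are assumed to be present in the training data.
    """
    core = [f for f, is_core in samples if is_core]
    other = [f for f, is_core in samples if not is_core]
    return centroid(core), centroid(other)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify_core_words(candidates, model):
    """candidates: dict mapping word -> feature vector.

    Keep the words whose features are closer to the core-word centroid.
    """
    first, second = model
    return [
        word for word, features in candidates.items()
        if squared_distance(features, first) < squared_distance(features, second)
    ]
```

In practice the features would usually be normalized first, since an unscaled word length can dominate the distance; that step is omitted to keep the sketch short.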
In step 202, the recommended content for the question may be determined according to the core words obtained in step 201.
According to one embodiment, the core words may be matched against the keywords associated with each piece of content, and the content that matches successfully is used as the recommended content.
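The keyword matching of step 202 might look like the following sketch. The catalog structure (a mapping from a content identifier to its keywords), the case-insensitive exact match, and the ranking by number of matched keywords are assumptions added for illustration; the specification only states that successfully matched content is recommended.

```python
def recommend(core_words, catalog):
    """Return the ids of content whose keywords match at least one core word.

    catalog maps a content id to a list of keywords. Results are ordered by
    the number of matched keywords (descending), then by content id.
    """
    wanted = {word.lower() for word in core_words}
    hits = []
    for content_id, keywords in catalog.items():
        overlap = wanted & {keyword.lower() for keyword in keywords}
        if overlap:
            hits.append((len(overlap), content_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [content_id for _, content_id in hits]
```

Content with no keyword in common with the core words is never returned, so an empty list means that no recommended content could be determined for the question.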
In one embodiment of the present invention, the recommended content for a question may be placed on the question page, so that any user browsing the question page can see it. For example, the recommended content may be placed in the recommendation region shown in Fig. 1.
In another embodiment of the present invention, the recommended content for a question may be pushed to the asking user of the question, for example in the form of a message. It should be understood that the embodiment of the present invention places no restriction on how the recommended content is applied.
In summary, the data processing method of the embodiment of the present invention first requires that the question being processed include both text content and non-text content. Because text content and non-text content together can convey richer information about the intent of the question, they provide a better basis for answering it, which increases the probability of obtaining effective answers, that is, improves the validity of the answers.
On this basis, the embodiment of the present invention determines core words from the answers to the question and determines the recommended content for the question according to the core words. Because a core word expresses, in compressed form, the important information and core content of an answer, and because the core words come from answers of higher validity, the relevance between the recommended content and the question can be improved, and therefore the accuracy and validity of the recommended content can be improved.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiment of the present invention is not limited by the described order of actions, because according to the embodiment some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments and that the actions involved are not necessarily required by the embodiment of the present invention.
Device embodiment
Referring to Fig. 3, a structural block diagram of an embodiment of a data processing device of the present invention is shown, which may specifically include: a core word determining module 301 and a recommended content determining module 302;
Wherein the core word determining module 301 is configured to determine core words from the answers to a question when the question includes text content and non-text content and the text content meets the meaningless condition; and
The recommended content determining module 302 is configured to determine the recommended content for the question according to the core words.
Optionally, the non-text content may include: picture content and/or video content.
Optionally, the meaningless condition may include:
A first language structure feature of the text content matches a second language structure feature of a preset meaningless text; and/or
No entity word of a preset granularity is present in the text content.
Optionally, the core word determining module 301 may include:
A target answer determining module, configured to determine a target answer from the answers to the question when the question includes text content and non-text content and the text content meets the meaningless condition; and
An answer core word determining module, configured to determine core words from the target answer.
Optionally, the target answer may have at least one of the following features:
It is adopted by the asking user of the question; and/or
The answering user of the target answer meets a first condition; and/or
The quality parameter of the target answer meets a second condition; and/or
The target answer is the intersection of multiple answers.
Optionally, the core word determining module 301 may include:
An answer intersection determining module, configured to determine the intersection of multiple answers to the question when the question includes text content and non-text content and the text content meets the meaningless condition; and
An intersection core word determining module, configured to determine core words from the text corresponding to the intersection.
Optionally, the core word determining module 301 may include:
A word segmentation module, configured to segment the answer text corresponding to the question into words to obtain a word segmentation result when the question includes text content and non-text content and the text content meets the meaningless condition; and
A word selection module, configured to determine core words from the multiple words in the word segmentation result according to the lexical features of each word;
Wherein the lexical features may include at least one of the following:
Part of speech, word frequency, inverse document frequency, and word length.
Optionally, the word selection module may include:
A core word identification module, configured to input the lexical features of each word in the word segmentation result into a core word identification model to obtain the core words output by the model; the training data of the core word identification model may include: a corpus and the annotated core words of the corpus.
Because the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
As for the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
The embodiment of the invention provides a device for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations: when a question includes text content and non-text content and the text content meets the meaningless condition, determining core words from the answers to the question; and determining the recommended content for the question according to the core words.
Fig. 4 is a block diagram of a device 800 for data processing according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Fig. 4, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to perform all or part of the steps of the method described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phone book data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 806 provides power to the various components of the device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when the device 800 is in an operating mode, such as a call mode, a recording mode, or a voice data processing mode. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a loudspeaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and so on. The buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor component 814 can also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 804 including instructions, where the instructions can be executed by the processor 820 of the device 800 to carry out the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 5 is a structural schematic diagram of a server in some embodiments of the present invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. A program stored in the storage medium 1930 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
A non-transitory computer-readable storage medium, wherein the instructions in the storage medium, when executed by a processor of a device (a server or a terminal), enable the device to execute the data processing method shown in Fig. 2.
A non-transitory computer-readable storage medium, wherein the instructions in the storage medium, when executed by a processor of a device (a server or a terminal), enable the device to execute a data processing method, the method comprising: when a question includes text content and non-text content and the text content meets the meaningless condition, determining core words from the answers to the question; and determining the recommended content for the question according to the core words.
The embodiment of the invention discloses A1. A data processing method, the method comprising:
When a question includes text content and non-text content and the text content meets a meaningless condition, determining core words from the answers to the question; and
Determining recommended content for the question according to the core words.
A2. The method according to A1, wherein the non-text content includes: picture content and/or video content.
A3. The method according to A1, wherein the meaningless condition includes:
A first language structure feature of the text content matches a second language structure feature of a preset meaningless text; and/or
No entity word of a preset granularity is present in the text content.
A4. The method according to A1, wherein determining core words from the answers to the question when the question includes text content and non-text content and the text content meets the meaningless condition comprises:
When the question includes text content and non-text content and the text content meets the meaningless condition, determining a target answer from the answers to the question; and
Determining core words from the target answer.
A5. The method according to A4, wherein the target answer has at least one of the following features:
It is adopted by the asking user of the question; and/or
The answering user of the target answer meets a first condition; and/or
The quality parameter of the target answer meets a second condition; and/or
The target answer is the intersection of multiple answers.
A6. The method according to A1, wherein determining core words from the answers to the question when the question includes text content and non-text content and the text content meets the meaningless condition comprises:
When the question includes text content and non-text content and the text content meets the meaningless condition, determining the intersection of multiple answers to the question; and
Determining core words from the text corresponding to the intersection.
A7. The method according to A1, wherein determining core words from the answers to the question when the question includes text content and non-text content and the text content meets the meaningless condition comprises:
When the question includes text content and non-text content and the text content meets the meaningless condition, segmenting the answer text corresponding to the question into words to obtain a word segmentation result; and
Determining core words from the multiple words in the word segmentation result according to the lexical features of each word;
Wherein the lexical features include at least one of the following:
Part of speech, word frequency, inverse document frequency, and word length.
A8. The method according to A7, wherein determining core words from the multiple words in the word segmentation result according to the lexical features of each word comprises:
Inputting the lexical features of each word in the word segmentation result into a core word identification model to obtain the core words output by the model; the training data of the core word identification model includes: a corpus and the annotated core words of the corpus.
The embodiment of the invention discloses B9. A data processing device, comprising:
A core word determining module, configured to determine core words from the answers to a question when the question includes text content and non-text content and the text content meets a meaningless condition; and
A recommended content determining module, configured to determine recommended content for the question according to the core words.
B10. The device according to B9, wherein the non-text content includes: picture content and/or video content.
B11. The device according to B9, wherein the meaningless condition includes:
A first language structure feature of the text content matches a second language structure feature of a preset meaningless text; and/or
No entity word of a preset granularity is present in the text content.
B12. The device according to B9, wherein the core word determining module includes:
A target answer determining module, configured to determine a target answer from the answers to the question when the question includes text content and non-text content and the text content meets the meaningless condition; and
An answer core word determining module, configured to determine core words from the target answer.
B13. The device according to B12, wherein the target answer has at least one of the following features:
It is adopted by the asking user of the question; and/or
The answering user of the target answer meets a first condition; and/or
The quality parameter of the target answer meets a second condition; and/or
The target answer is the intersection of multiple answers.
B14. The device according to B9, wherein the core word determining module includes:
An answer intersection determining module, configured to determine the intersection of multiple answers to the question when the question includes text content and non-text content and the text content meets the meaningless condition; and
An intersection core word determining module, configured to determine core words from the text corresponding to the intersection.
B15. The device according to B9, wherein the core word determining module includes:
A word segmentation module, configured to segment the answer text corresponding to the question into words to obtain a word segmentation result when the question includes text content and non-text content and the text content meets the meaningless condition; and
A word selection module, configured to determine core words from the multiple words in the word segmentation result according to the lexical features of each word;
Wherein the lexical features include at least one of the following:
Part of speech, word frequency, inverse document frequency, and word length.
B16. The device according to B15, wherein the word selection module includes:
A core word identification module, configured to input the lexical features of each word in the word segmentation result into a core word identification model to obtain the core words output by the model; the training data of the core word identification model includes: a corpus and the annotated core words of the corpus.
The embodiment of the invention discloses C17, a device for data processing, including a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
In the case where a problem includes content of text and non-textual content and the content of text meets a meaningless condition, determining a core word from the answer corresponding to the problem;
According to the core word, determining the recommendation corresponding to the problem.
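The two operations above amount to a short control flow, sketched below. The function name `recommend_for_question` and the three callables passed in (`is_meaningless`, `extract_core_words`, `search`) are hypothetical placeholders, and the fallback branch that searches on the problem text itself is an assumption added so the sketch is complete; the text above only describes the first branch.

```python
from typing import Callable, List, Sequence


def recommend_for_question(
    text: str,
    has_non_text: bool,
    answers: Sequence[str],
    is_meaningless: Callable[[str], bool],
    extract_core_words: Callable[[Sequence[str]], List[str]],
    search: Callable[[List[str]], List[str]],
) -> List[str]:
    """Return recommendations for a problem made of text plus image or video."""
    if text and has_non_text and is_meaningless(text):
        # The problem text carries no usable signal, so take core words
        # from the answers instead.
        terms = extract_core_words(answers)
    else:
        # Assumed fallback: use the words of the problem text directly.
        terms = text.split()
    return search(terms)
```

For a problem such as "what is this" posted with a photo, the recommendation is driven by a core word found in the answers (for example a plant name) rather than by the uninformative problem text.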
C18, the device according to C17, wherein the non-textual content includes: image content, and/or video content.
C19, the device according to C17, wherein the meaningless condition includes:
The first language structure feature corresponding to the content of text matches the second language structure feature corresponding to a preset meaningless text; and/or
No entity word of a preset granularity is present in the content of text.
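The two branches of the meaningless condition can be sketched as follows. Regular expressions stand in for the "language structure features" of preset meaningless texts, and a small word set stands in for the lexicon of entity words at the preset granularity; both, along with the function names, are illustrative assumptions rather than the patent's own representation.

```python
import re

# Hypothetical structure patterns for preset meaningless texts.
MEANINGLESS_PATTERNS = [
    re.compile(r"^(what|which)\s+(is|are)\s+(this|that|it|these|those)\??$", re.IGNORECASE),
    re.compile(r"^(what|which)\s+\w+\s+is\s+(this|that|it)\??$", re.IGNORECASE),
]

# Hypothetical lexicon of entity words at the required (fine) granularity.
ENTITY_WORDS = {"hydrangea", "rose", "husky"}


def matches_meaningless_structure(text):
    """First branch: the text has the same structure as a preset meaningless text."""
    stripped = text.strip()
    return any(pattern.match(stripped) for pattern in MEANINGLESS_PATTERNS)


def has_entity_word(text):
    """True if the text contains an entity word of the preset granularity."""
    tokens = re.findall(r"\w+", text.lower())
    return any(token in ENTITY_WORDS for token in tokens)


def is_meaningless(text):
    """The condition is an and/or, so either branch alone is sufficient."""
    return matches_meaningless_structure(text) or not has_entity_word(text)
```

Under these assumptions, "What flower is this?" is treated as meaningless because "flower" is too coarse to be in the entity lexicon, while "How do I prune a hydrangea" is not.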
C20, the device according to C17, wherein determining the core word from the answer corresponding to the problem, in the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, includes:
In the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, determining a target answer from the answers corresponding to the problem;
Determining the core word from the target answer.
C21, the device according to C20, wherein the target answer has at least one of the following features:
The target answer is adopted by the user who asked the problem; and/or
The user who provided the target answer meets a first condition; and/or
The quality parameter of the target answer meets a second condition; and/or
The target answer is the intersection of multiple answers.
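A minimal sketch of target answer selection under the first three features is given below. The text does not define the first and second conditions, so the sketch assumes a minimum provider level and a minimum quality score; the `Answer` fields, the thresholds, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    adopted: bool = False      # adopted by the user who asked the problem
    provider_level: int = 0    # assumed proxy for the "first condition"
    quality: float = 0.0       # assumed "quality parameter" in [0, 1]


def select_target_answers(answers, min_level=5, min_quality=0.8):
    """Keep answers that satisfy at least one of the target answer features."""
    return [
        answer
        for answer in answers
        if answer.adopted
        or answer.provider_level >= min_level
        or answer.quality >= min_quality
    ]
```

Because the features are joined by and/or, an answer qualifies as soon as any one of them holds.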
C22, the device according to C17, wherein determining the core word from the answer corresponding to the problem, in the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, includes:
In the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, determining the intersection corresponding to multiple answers of the problem;
Determining the core word from the text corresponding to the intersection.
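One possible reading of "the intersection of multiple answers" is the set of words shared by every answer. The sketch below follows that reading, assumes the answers are already segmented into tokens, and uses the hypothetical name `answers_intersection`.

```python
def answers_intersection(tokenized_answers):
    """Return the words common to all answers, in first-answer order."""
    if not tokenized_answers:
        return []
    common = set(tokenized_answers[0])
    for tokens in tokenized_answers[1:]:
        common &= set(tokens)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(t for t in tokenized_answers[0] if t in common))
```

Words that independent answerers all use are good core word candidates, since they tend to name the thing shown in the image or video.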
C23, the device according to C17, wherein determining the core word from the answer corresponding to the problem, in the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, includes:
In the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, segmenting the answer text corresponding to the problem, to obtain a word segmentation result;
According to the lexical features corresponding to multiple words in the word segmentation result, determining the core word from the multiple words;
Wherein the lexical features include at least one of the following features:
Part of speech, word frequency, inverse document frequency, and word length.
C24, the device according to C23, wherein determining the core word from the multiple words according to the lexical features corresponding to multiple words in the word segmentation result includes:
Inputting the lexical features corresponding to multiple words in the word segmentation result into a core word identification model, to obtain the core word output by the core word identification model; the training data of the core word identification model includes: a corpus and the annotated core words corresponding to the corpus.
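The text does not state what kind of model the core word identification model is. As one hedged illustration, the sketch below trains a small logistic regression by stochastic gradient descent on labelled feature vectors, where the label 1 marks an annotated core word. The feature layout (is-noun flag, normalised word frequency, normalised inverse document frequency), the hyperparameters, and the function names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Probability that the word with feature vector x is a core word."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_core_word_model(samples, epochs=500, lr=0.5):
    """Fit logistic regression with plain SGD.

    samples: list of (feature_vector, label) pairs built from a corpus and
    its annotated core words; label is 1 for a core word, else 0.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = score(weights, bias, x) - y  # gradient of the log loss
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_core_words(model, candidates, threshold=0.5):
    """candidates: list of (word, feature_vector) pairs from the segmentation result."""
    weights, bias = model
    return [word for word, x in candidates if score(weights, bias, x) >= threshold]
```

Any classifier that maps lexical features to a core-word decision would fit the description equally well; logistic regression is chosen here only because it is short enough to show in full.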
The embodiment of the invention discloses D25, a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause a device to perform the data processing method described in one or more of A1 to A8.
Those skilled in the art, after considering the specification and practicing the invention disclosed here, will readily conceive of other embodiments of the invention. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or conventional technical means in the art not disclosed herein. The specification and examples are to be considered illustrative only, with the true scope and spirit of the invention indicated by the following claims.
It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing is merely preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall be included in the protection scope of the invention.
The data processing method, the data processing device, and the device for data processing provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the invention, and the above description of the embodiments is only intended to help in understanding the method of the invention and its core ideas. Meanwhile, for those of ordinary skill in the art, the specific implementation and the application scope may change according to the ideas of the invention. In summary, the contents of this specification should not be construed as limiting the invention.

Claims (10)

1. A data processing method, characterized in that the method includes:
In the case where a problem includes content of text and non-textual content and the content of text meets a meaningless condition, determining a core word from the answer corresponding to the problem;
According to the core word, determining the recommendation corresponding to the problem.
2. The method according to claim 1, characterized in that the non-textual content includes: image content, and/or video content.
3. The method according to claim 1, characterized in that the meaningless condition includes:
The first language structure feature corresponding to the content of text matches the second language structure feature corresponding to a preset meaningless text; and/or
No entity word of a preset granularity is present in the content of text.
4. The method according to claim 1, characterized in that determining the core word from the answer corresponding to the problem, in the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, includes:
In the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, determining a target answer from the answers corresponding to the problem;
Determining the core word from the target answer.
5. The method according to claim 4, characterized in that the target answer has at least one of the following features:
The target answer is adopted by the user who asked the problem; and/or
The user who provided the target answer meets a first condition; and/or
The quality parameter of the target answer meets a second condition; and/or
The target answer is the intersection of multiple answers.
6. The method according to claim 1, characterized in that determining the core word from the answer corresponding to the problem, in the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, includes:
In the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, determining the intersection corresponding to multiple answers of the problem;
Determining the core word from the text corresponding to the intersection.
7. The method according to claim 1, characterized in that determining the core word from the answer corresponding to the problem, in the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, includes:
In the case where the problem includes content of text and non-textual content and the content of text meets the meaningless condition, segmenting the answer text corresponding to the problem, to obtain a word segmentation result;
According to the lexical features corresponding to multiple words in the word segmentation result, determining the core word from the multiple words;
Wherein the lexical features include at least one of the following features:
Part of speech, word frequency, inverse document frequency, and word length.
8. A data processing device, characterized by including:
A core word determining module, for determining a core word from the answer corresponding to a problem in the case where the problem includes content of text and non-textual content and the content of text meets a meaningless condition; and
A recommendation determining module, for determining the recommendation corresponding to the problem according to the core word.
9. A device for data processing, characterized by including a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
In the case where a problem includes content of text and non-textual content and the content of text meets a meaningless condition, determining a core word from the answer corresponding to the problem;
According to the core word, determining the recommendation corresponding to the problem.
10. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause a device to perform the data processing method according to one or more of claims 1 to 7.
CN201910037025.1A 2019-01-15 2019-01-15 A kind of data processing method, device and the device for data processing Pending CN109857847A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910037025.1A CN109857847A (en) 2019-01-15 2019-01-15 A kind of data processing method, device and the device for data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910037025.1A CN109857847A (en) 2019-01-15 2019-01-15 A kind of data processing method, device and the device for data processing

Publications (1)

Publication Number Publication Date
CN109857847A true CN109857847A (en) 2019-06-07

Family

ID=66894790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910037025.1A Pending CN109857847A (en) 2019-01-15 2019-01-15 A kind of data processing method, device and the device for data processing

Country Status (1)

Country Link
CN (1) CN109857847A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101290631A (en) * 2008-05-28 2008-10-22 北京百问百答网络技术有限公司 Network advertisement automatic delivery method and its system
CN102663129A (en) * 2012-04-25 2012-09-12 中国科学院计算技术研究所 Medical field deep question and answer method and medical retrieval system
US20150099257A1 (en) * 2013-10-09 2015-04-09 International Business Machines Corporation Empathy injection for question-answering systems
CN106815194A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and keyword recognition method and device
CN106874467A (en) * 2017-02-15 2017-06-20 百度在线网络技术(北京)有限公司 Method and apparatus for providing Search Results
CN108334489A (en) * 2017-01-19 2018-07-27 百度在线网络技术(北京)有限公司 Text core word recognition method and device

Similar Documents

Publication Publication Date Title
CN106575293B (en) Isolated language detection system and method
CN110770694B (en) Obtaining response information from multiple corpora
CN114631091A (en) Semantic representation using structural ontologies for assistant systems
WO2020056621A1 (en) Learning method and apparatus for intention recognition model, and device
KR20160138982A (en) Hybrid client/server architecture for parallel processing
US11392213B2 (en) Selective detection of visual cues for automated assistants
CN108345612B (en) Problem processing method and device for problem processing
CN112292724A (en) Dynamic and/or context-specific hotwords for invoking automated assistants
CN109471919A (en) Empty anaphora resolution method and device
CN110222256A (en) A kind of information recommendation method, device and the device for information recommendation
EP3835993A2 (en) Keyword extraction method, apparatus and medium
CN107424612B (en) Processing method, apparatus and machine-readable medium
CN108345625A (en) A kind of information mining method and device, a kind of device for information excavating
CN108717403B (en) Processing method and device for processing
TW202301080A (en) Multi-device mediation for assistant systems
CN112328793A (en) Comment text data processing method and device and storage medium
CN110399468A (en) A kind of data processing method, device and the device for data processing
CN107301188B (en) Method for acquiring user interest and electronic equipment
CN109857847A (en) A kind of data processing method, device and the device for data processing
CN110119461A (en) A kind of processing method and processing device of query information
CN114417827A (en) Text context processing method and device, electronic equipment and storage medium
CN110929122B (en) Data processing method and device for data processing
CN111708444A (en) Input method, input device and input device
CN113010768B (en) Data processing method and device for data processing
CN116069936B (en) Method and device for generating digital media article

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190830

Address after: 310018 Room 1501, Building 57, Baiyang Street Science Park Road, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Applicant after: Sogou (Hangzhou) Intelligent Technology Co., Ltd.

Applicant after: Beijing Sogou Technology Development Co., Ltd.

Address before: 100084 Beijing, Zhongguancun East Road, building 1, No. 9, Sohu cyber building, room 9, room, room 01

Applicant before: Beijing Sogou Technology Development Co., Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190607