CN106709072A - Method of obtaining intelligent conversation reply content based on shared corpora - Google Patents

Info

Publication number
CN106709072A
Authority
CN
China
Prior art keywords
sentence
session
type
reply
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710076115.2A
Other languages
Chinese (zh)
Inventor
陈包容
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Dove Software Co Ltd
Original Assignee
Changsha Dove Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Dove Software Co Ltd filed Critical Changsha Dove Software Co Ltd
Priority to CN201710076115.2A priority Critical patent/CN106709072A/en
Publication of CN106709072A publication Critical patent/CN106709072A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method of obtaining intelligent conversation reply content based on a shared corpus. The method comprises the following steps: establishing personal corpora corresponding to communication parties; merging the personal corpora of the multiple communication parties to obtain a shared corpus; and matching, in the shared corpus, the reply content that corresponds to the current conversation content. The method solves the technical problem that conversation reply content obtained by matching against existing shared corpora is inaccurate. Its beneficial effects are that the workload of manually establishing a shared corpus is reduced, and that the established shared corpus is rich in content, varied in form, and relatively practical and intelligent, so that relatively accurate conversation reply content is obtained by matching against it.

Description

A method of obtaining intelligent conversation reply content based on a shared corpus
Technical field
The present invention relates to the field of communication technology, and in particular to a method of obtaining intelligent conversation reply content based on a shared corpus.
Background
In intelligent conversation, reply content can often be shared. For example, in a scenario where enterprise employees hold business conversations with customers, the reply that sales manager Zhang San gives to a prospective customer's price inquiry can be shared with sales manager Li Si or other colleagues. A shared corpus can therefore be created from the personal conversation corpora of one or more communication parties, and intelligent conversation reply content can then be obtained by matching against the created shared corpus.
Because existing shared corpora are created manually, their quality is generally low, so the conversation reply content obtained by matching against them is inaccurate. To address this problem, the present embodiment proposes a method of obtaining intelligent conversation reply content based on a shared corpus.
Summary of the invention
The invention provides a method of obtaining intelligent conversation reply content based on a shared corpus, to solve the technical problem that conversation reply content obtained by matching against existing shared corpora is inaccurate.
The method of obtaining intelligent conversation reply content based on a shared corpus provided by the invention comprises:
Establishing personal corpora corresponding to communication parties, wherein the number of communication parties is more than one;
Merging the personal corpora of the multiple communication parties to obtain a shared corpus;
Matching, in the shared corpus, reply content that matches the current conversation content, and using that reply content as the conversation reply content corresponding to the current conversation content.
Further, establishing a personal corpus corresponding to a communication party comprises:
Collecting the conversation content of the communication party;
Obtaining the conversation pairs in the conversation content;
Collecting, according to preset scene tags, the scene tag value of each conversation pair for each scene tag;
Matching and combining the conversation pairs, the scene tags, and the scene tag values corresponding to the scene tags, thereby generating the personal corpus corresponding to the communication party.
Further, obtaining the conversation pairs in the conversation content comprises:
Determining the initiating sentences and reply sentences in the conversation content according to the semantics of the conversation sentences in the conversation content;
Determining the types of the initiating sentences and reply sentences according to preset type judgment rules;
Extracting a basic conversation pair from an initiating sentence and the reply sentences located between that initiating sentence and the next initiating sentence;
Extracting at least one conversation pair according to the basic conversation pair and the types of the initiating sentence and reply sentences in the basic conversation pair.
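The basic-pair extraction step above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes each sentence has already been labelled "initiating" or "reply", and the function name extract_basic_pairs, the tuple layout, and the choice to skip replies that precede any initiating sentence are all hypothetical.

```python
def extract_basic_pairs(labeled):
    """Group each initiating sentence with the replies that follow it.

    `labeled` is a chronological list of (role, text) tuples, where role is
    "initiating" or "reply" (labels assumed to come from an earlier step).
    """
    pairs = []
    current = None  # (initiating_text, [reply_texts]) being assembled
    for role, text in labeled:
        if role == "initiating":
            if current is not None:
                pairs.append(current)   # close the previous basic pair
            current = (text, [])
        elif current is not None:
            current[1].append(text)     # reply before the next initiating sentence
        # Assumption: replies seen before any initiating sentence are skipped.
    if current is not None:
        pairs.append(current)
    return pairs
```

A basic pair built this way may hold several replies for one initiating sentence, which is the situation the one-question-one-answer form does not cover.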
Further, determining the initiating sentences and reply sentences in the conversation content according to the semantics of the conversation sentences comprises:
Determining whether, within a preset time interval, a conversation sentence in the conversation content is preceded by text sent by the communication counterpart; if not, determining the conversation sentence to be an initiating sentence;
If so, determining whether the conversation sentence has no semantic association with the preceding text sent by the communication counterpart; if it has none, determining the conversation sentence to be an initiating sentence, and otherwise determining it to be a reply sentence.
Further, determining the type of an initiating sentence according to the preset type judgment rules comprises:
Determining whether the initiating sentence is a sentence with complete, independent semantics. If so, determining whether the initiating sentence consists of multiple simple sentences that each have complete, independent semantics; if so, the type of the initiating sentence is determined to be the complex initiating sentence type, and otherwise the simple initiating sentence type. If not, determining whether the initiating sentence contains a simple sentence with complete, independent semantics; if it does, the type of the initiating sentence is determined to be the non-standard complex initiating sentence type, and if it does not, the non-standard simple initiating sentence type;
Searching whether an initiating sentence of the non-standard simple initiating sentence type has conversation sentences of its own sender contiguous before and after it. If not, no derivative extension is performed. If so, further determining whether the initiating sentence can be merged with those contiguous conversation sentences into a sentence with complete, independent semantics; if it can, the type of the initiating sentence is derivatively extended to the non-standard sentence-group initiating sentence type, and if it cannot, no derivative extension is performed;
Searching whether an initiating sentence of the non-standard complex initiating sentence type has conversation sentences of its own sender contiguous before and after it. If not, no derivative extension is performed. If so, further determining whether the initiating sentence can be merged with those contiguous conversation sentences into a sentence with complete, independent semantics; if it can, the type of the initiating sentence is derivatively extended to the non-standard sentence-group initiating sentence type, and if it cannot, no derivative extension is performed;
Determining whether an initiating sentence of the simple, complex, non-standard simple, non-standard complex, or non-standard sentence-group type has conversation sentences of its own sender contiguous before and after it. If so, further determining whether the initiating sentence can be merged with those contiguous conversation sentences into a semantically associated sentence group; if so, the type of the initiating sentence is derivatively extended to the sentence-group initiating sentence type, and otherwise no derivative extension is performed.
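The type judgment rules above form a small decision tree, which can be sketched in Python as follows. The sketch assumes that the semantic judgments (whether a sentence is complete, how many complete simple sentences it contains, whether it merges with its sender's contiguous sentences) are supplied by some upstream analyzer that the source does not specify. The names SentenceType, base_type, and derived_types are hypothetical.

```python
from enum import Enum

class SentenceType(Enum):
    SIMPLE = "simple"
    COMPLEX = "complex"
    NONSTANDARD_SIMPLE = "non-standard simple"
    NONSTANDARD_COMPLEX = "non-standard complex"

def base_type(is_complete, complete_simple_count):
    """First rule: classify by completeness and by contained simple sentences."""
    if is_complete:
        # Several complete simple sentences make a complex sentence.
        return SentenceType.COMPLEX if complete_simple_count > 1 else SentenceType.SIMPLE
    # Incomplete sentence: non-standard complex if it still contains a complete simple sentence.
    if complete_simple_count >= 1:
        return SentenceType.NONSTANDARD_COMPLEX
    return SentenceType.NONSTANDARD_SIMPLE

def derived_types(base, has_adjacent_own, merges_to_complete, merges_to_group):
    """Remaining rules: list the base type plus any derivatively extended types."""
    types = [base.value]
    if not has_adjacent_own:
        return types  # no contiguous sentences from the same sender: no extension
    nonstandard = (SentenceType.NONSTANDARD_SIMPLE, SentenceType.NONSTANDARD_COMPLEX)
    if base in nonstandard and merges_to_complete:
        types.append("non-standard sentence group")
    if merges_to_group:
        types.append("sentence group")
    return types
```

Under these assumptions, the same two functions would serve for reply sentences as well, since the reply-sentence rules mirror the initiating-sentence rules.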
Further, determining the type of a reply sentence according to the preset type judgment rules comprises:
Determining whether the reply sentence is a sentence with complete, independent semantics. If so, determining whether the reply sentence consists of multiple simple sentences that each have complete, independent semantics; if so, the type of the reply sentence is determined to be the complex reply sentence type, and otherwise the simple reply sentence type. If not, determining whether the reply sentence contains a simple sentence with complete, independent semantics; if it does, the type of the reply sentence is determined to be the non-standard complex reply sentence type, and if it does not, the non-standard simple reply sentence type;
Searching whether a reply sentence of the non-standard simple reply sentence type has conversation sentences of its own sender contiguous before and after it. If not, no derivative extension is performed. If so, further determining whether the reply sentence can be merged with those contiguous conversation sentences into a sentence with complete, independent semantics; if it can, the type of the reply sentence is derivatively extended to the non-standard sentence-group reply sentence type, and if it cannot, no derivative extension is performed;
Searching whether a reply sentence of the non-standard complex reply sentence type has conversation sentences of its own sender contiguous before and after it. If not, no derivative extension is performed. If so, further determining whether the reply sentence can be merged with those contiguous conversation sentences into a sentence with complete, independent semantics; if it can, the type of the reply sentence is derivatively extended to the non-standard sentence-group reply sentence type, and if it cannot, no derivative extension is performed;
Determining whether a reply sentence of the simple, complex, non-standard simple, non-standard complex, or non-standard sentence-group type has conversation sentences of its own sender contiguous before and after it. If so, further determining whether the reply sentence can be merged with those contiguous conversation sentences into a semantically associated sentence group; if so, the type of the reply sentence is derivatively extended to the sentence-group reply sentence type, and otherwise no derivative extension is performed.
Further, extracting at least one conversation pair according to the basic conversation pair, the type of the initiating sentence in the basic conversation pair, and the type of the reply sentences in the basic conversation pair comprises:
Performing derivative extension on the type of the initiating sentence in the basic conversation pair to obtain initiating sentences of multiple types;
Performing derivative extension on the type of the reply sentences in the basic conversation pair to obtain reply sentences of multiple types;
Combining and extracting at least one semantically associated conversation pair from the initiating sentences of multiple types and the reply sentences of multiple types.
Further, merging the personal corpora of the multiple communication parties to obtain the shared corpus comprises:
Combining the personal corpora of the multiple communication parties to obtain a combined corpus;
Merging, as like terms, the conversation pairs in the combined corpus that contain the same initiating sentence, to obtain the shared corpus.
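The merging step above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each personal corpus is reduced to a list of (initiating sentence, reply list) tuples with scene tags omitted, "the same initiating sentence" is taken to mean exact string equality, and duplicate replies are dropped. The function name merge_corpora is hypothetical.

```python
def merge_corpora(personal_corpora):
    """Combine personal corpora, then merge pairs sharing an initiating sentence.

    Assumption: two initiating sentences are "the same" when their text is
    identical; the source does not define the comparison.
    """
    shared = {}
    for corpus in personal_corpora:
        for initiating, replies in corpus:
            bucket = shared.setdefault(initiating, [])
            for reply in replies:
                if reply not in bucket:   # keep each distinct reply once
                    bucket.append(reply)
    return shared

# Usage with the two sales managers from the background example:
zhang_san = [("What is the price?", ["100 yuan."])]
li_si = [("What is the price?", ["95 yuan for bulk orders."]),
         ("When can you deliver?", ["Within 3 days."])]
shared_corpus = merge_corpora([zhang_san, li_si])
```

After merging, the price inquiry carries both managers' replies, which is what makes one party's reply available to the other.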
Further, after the shared corpus is obtained, the method also comprises:
Determining whether a conversation pair in the shared corpus contains multiple reply sentences and, if so, intelligently ranking the multiple reply sentences according to a preset rule.
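The source does not say what the preset ranking rule is. As one possible illustration only, the Python sketch below ranks replies by how often they occur across the merged personal corpora, with ties kept in first-seen order. Both the frequency rule and the function name rank_replies are assumptions.

```python
from collections import Counter

def rank_replies(replies):
    """Rank replies by frequency, most frequent first (assumed rule).

    `replies` is the raw list of replies collected for one initiating
    sentence, duplicates included. Counter.most_common() keeps replies
    with equal counts in the order they were first encountered.
    """
    return [reply for reply, _count in Counter(replies).most_common()]
```

Any other preset rule, for example recency or a scene-tag weighting, could replace the frequency count without changing the surrounding steps.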
Further, matching, in the shared corpus, reply content that matches the current conversation content comprises:
Collecting the conversation scene tag values that correspond to the current conversation content for the preset conversation scene tags;
Matching, in the shared corpus, the reply sentence that corresponds to the current conversation content, the conversation scene tags, and the conversation scene tag values, and using it as the reply content.
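The matching step above can be sketched in Python as follows. This is a minimal sketch under several assumptions the source does not confirm: the current conversation content is matched to an initiating sentence by exact text, the best entry is the one with the most agreeing scene tag values, and the first stored reply is returned. The entry layout and the name match_reply are hypothetical.

```python
def match_reply(shared_corpus, current_sentence, current_tags):
    """Return a reply for the current sentence, preferring matching scene tags.

    `shared_corpus` is a list of dicts with keys "initiating", "scene_tags"
    (tag -> value), and "replies" (already ranked). All hypothetical.
    """
    best, best_score = None, -1
    for entry in shared_corpus:
        if entry["initiating"] != current_sentence:
            continue  # assumption: exact-text match on the initiating sentence
        # Count the scene tags whose stored value equals the current value.
        score = sum(1 for tag, value in current_tags.items()
                    if entry["scene_tags"].get(tag) == value)
        if score > best_score:
            best, best_score = entry, score
    if best is None or not best["replies"]:
        return None
    return best["replies"][0]
```

For instance, two entries for the same price inquiry that differ only in a "relationship" tag would yield different replies for a new customer and a long-term customer.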
The invention has the following beneficial effects:
In the method of obtaining intelligent conversation reply content based on a shared corpus provided by the invention, personal corpora corresponding to the communication parties are established, the personal corpora of the multiple communication parties are merged to obtain a shared corpus, and reply content matching the current conversation content is matched in the shared corpus. This solves the technical problem that conversation reply content obtained by matching against existing shared corpora is inaccurate. The method not only reduces the workload of manually creating a shared corpus; the created shared corpus is also rich in content and varied in form, and is more practical and intelligent, so that more accurate conversation reply content is obtained by matching against the created shared corpus.
In addition to the objects, features, and advantages described above, the invention has other objects, features, and advantages. The invention is described in further detail below with reference to the drawings.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the invention. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not constitute an undue limitation of it. In the drawings:
Fig. 1 is a flowchart of the method of obtaining intelligent conversation reply content based on a shared corpus according to a preferred embodiment of the invention;
Fig. 2 is a flowchart of the method of obtaining intelligent conversation reply content based on a shared corpus according to a simplified example of the preferred embodiment of the invention.
Detailed description of embodiments
Embodiments of the invention are described in detail below with reference to the drawings, but the invention can be implemented in the many different ways defined and covered by the claims.
Referring to Fig. 1, a preferred embodiment of the invention provides a method of obtaining intelligent conversation reply content based on a shared corpus, comprising:
Step S101: establishing personal corpora corresponding to communication parties, wherein the number of communication parties is more than one;
Step S102: merging the personal corpora of the multiple communication parties to obtain a shared corpus;
Step S103: matching, in the shared corpus, reply content that matches the current conversation content, and using that reply content as the conversation reply content corresponding to the current conversation content.
In the method provided by this embodiment of the invention, personal corpora corresponding to the communication parties are established, the personal corpora of the multiple communication parties are merged to obtain a shared corpus, and reply content matching the current conversation content is matched in the shared corpus. This solves the technical problem that conversation reply content obtained by matching against existing shared corpora is inaccurate. The method not only reduces the workload of manually creating a shared corpus; the created shared corpus is also rich in content and varied in form, and is more practical and intelligent, so that more accurate conversation reply content is obtained by matching against the created shared corpus.
It should be noted that, because this embodiment obtains the shared corpus by merging the personal corpora of multiple communication parties, the number of communication parties must be more than one when the personal corpora are established; that is, personal corpora must be created for at least two communication parties.
Optionally, establishing a personal corpus corresponding to a communication party comprises:
Collecting the conversation content of the communication party;
Obtaining the conversation pairs in the conversation content;
Collecting, according to preset scene tags, the scene tag value of each conversation pair for each scene tag;
Matching and combining the conversation pairs, the scene tags, and the scene tag values corresponding to the scene tags, thereby generating the personal corpus corresponding to the communication party.
In this embodiment of the invention, the conversation content of the communication party is collected, the conversation pairs in the conversation content are obtained, the scene tag value of each conversation pair is collected according to the preset scene tags, and the conversation pairs, scene tags, and scene tag values are matched and combined to generate the personal corpus corresponding to the communication party. This greatly reduces the workload of manually establishing a personal corpus. Moreover, because the personal corpus is generated from conversation pairs, scene tags, and scene tag values, it can better simulate real conversation scenes. The shared corpus created from it can in turn better simulate real conversation scenes, so that more accurate conversation reply content is obtained from the created shared corpus.
It should be noted that this embodiment of the invention generates the personal corpus by matching and combining conversation pairs, scene tags, and the scene tag values corresponding to the scene tags; that is, each entry of the personal corpus takes the form "conversation pair + scene tag + scene tag value". In addition, different conversation content has different scene characteristics, such as the conversation topic, conversation intent, conversation time, conversation place, and the relationship between the two parties. Therefore, after obtaining the conversation pairs in the conversation content, this embodiment further collects, according to the preset scene tags, the scene tag value of each conversation pair for each scene tag, and matches and combines the conversation pairs, scene tags, and scene tag values to generate the personal corpus. The scene tags in this embodiment are user-defined or obtained automatically. They may be, for example, one or a combination of: the conversation topic; the time, place, and date of the two communicating parties; conversation intent; weather; season; gender; occupation; position; mood; hobbies; body-sensing data; health status; real-time behavior state; zodiac sign; blood type; the relationship between the two parties; age gap; generational seniority gap; the interval, frequency, and duration of the two parties' conversations; the sentence pattern, sentence class, and sentence structure type of the conversation content; and total-amount tags.
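The "conversation pair + scene tag + scene tag value" format described above can be sketched as a small Python data structure. This is an illustrative layout only: the class names ConversationPair and CorpusEntry, the function build_personal_corpus, and the collect_tag_value callback (standing in for whatever collection method is used) are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ConversationPair:
    initiating: str        # the initiating sentence
    replies: List[str]     # one or more reply sentences

@dataclass
class CorpusEntry:
    pair: ConversationPair
    scene_tags: Dict[str, str] = field(default_factory=dict)  # scene tag -> scene tag value

def build_personal_corpus(
    pairs: List[ConversationPair],
    tags: List[str],
    collect_tag_value: Callable[[ConversationPair, str], str],
) -> List[CorpusEntry]:
    """Attach a value for every preset scene tag to every conversation pair."""
    corpus = []
    for pair in pairs:
        values = {tag: collect_tag_value(pair, tag) for tag in tags}
        corpus.append(CorpusEntry(pair=pair, scene_tags=values))
    return corpus
```

Keeping the tag values in a dictionary lets the set of preset scene tags vary per deployment, which fits the statement that tags may be user-defined.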
When collecting the scene tag value of a conversation pair for a scene tag, this embodiment may use different methods, including: direct collection, for example a place scene tag value can be collected automatically through the GPS of a mobile terminal; inference, for example the scene tag value for the relationship between the two parties can be inferred from other scene tag values already obtained; computing word vectors associated with the conversation, for example the conversation intent tag value can be obtained by computing word vectors associated with the conversation; and neural network learning, for example a mood scene tag value can be obtained by feeding the conversation content or other already obtained scene tag values into a trained classifier. In addition, this embodiment may combine one or more of the methods described above to obtain scene tag values automatically.
Optionally, obtaining the conversation pairs in the conversation content comprises:
Determining the initiating sentences and reply sentences in the conversation content according to the semantics of the conversation sentences in the conversation content;
Determining the types of the initiating sentences and reply sentences according to preset type judgment rules;
Extracting a basic conversation pair from an initiating sentence and the reply sentences located between that initiating sentence and the next initiating sentence;
Extracting at least one conversation pair according to the basic conversation pair and the types of the initiating sentence and reply sentences in the basic conversation pair.
Existing conversation pairs or question-answer pairs extracted from conversation content usually take a one-question-one-answer form. In actual conversations, however, the two parties do not always follow a one-question-one-answer pattern. For example, a communication party may reply with several conversation sentences to a single sentence sent by the counterpart, or may reply with only one sentence to several sentences sent by the counterpart.
Therefore, if conversation pairs are extracted only in the one-question-one-answer form, the following problems may arise:
(1) For conversation content that is not in one-question-one-answer form, extracting conversation pairs is difficult and precision is low. For example, for conversation content in the form of multiple initiating sentences plus multiple reply sentences, extracting conversation pairs requires analyzing which reply sentences match each initiating sentence; the process is complex and difficult, and precision is low.
(2) The question-answer pairs or conversation pairs that existing approaches extract from conversation content are generally fairly standard or structurally simple conversation sentences. As a result, conversation pairs with good integrity and high practicality cannot be extracted accurately from conversation sentences with complex or non-standard structure.
(3) In addition, the integrity of conversation pairs extracted in the one-question-one-answer form is easily broken, so the extracted conversation pairs cannot accurately simulate real conversations. To address the above problems, the invention proposes a method of extracting conversation pairs from conversation content according to the types of the initiating sentences and reply sentences.
To address these problems, this embodiment determines the initiating sentences and reply sentences in the conversation content according to the semantics of the conversation sentences, determines the types of the initiating sentences and reply sentences according to preset type judgment rules, extracts a basic conversation pair from an initiating sentence and the reply sentences between that initiating sentence and the next initiating sentence, and extracts at least one conversation pair according to the basic conversation pair and the types of the initiating sentence and reply sentences in it. This solves the technical problem that extracting conversation pairs in the prior art is difficult and imprecise, and breaks the limitation of the traditional one-question-one-answer form. Extracting according to the types of initiating sentences and reply sentences not only allows conversation pairs to be extracted quickly and effectively, but also greatly improves the precision and accuracy of the extracted pairs. In addition, for conversation sentences with complex or non-standard structure, this embodiment of the invention can accurately extract conversation pairs with good integrity and high practicality, so that the extracted pairs can accurately simulate real conversations and the degree of intelligence is higher. Furthermore, the conversation pairs extracted by this embodiment are varied in form, which helps to match intelligent reply content precisely on the basis of the conversation pairs and to obtain intelligent reply content in varied forms, making the method more practical.
It should be noted that, before determining the types of the initiating sentences and reply sentences, this embodiment first presets the types of initiating sentences and reply sentences and the type judgment rules corresponding to those types, so that the types of the initiating sentences and reply sentences can be determined quickly according to the preset type judgment rules.
This embodiment may obtain the conversation content by collecting conversations from the communication party's instant-messaging account, e-mail account, microblog account, or mobile phone number. The conversation content may be in text, picture, voice, video, or animation form. When the conversation content is in voice, picture, video, or animation form, the method also includes converting it into conversation content in text form.
Optionally, determining the initiating sentences and reply sentences in the conversation content according to the semantics of the conversation sentences comprises:
Determining whether, within a preset time interval, a conversation sentence in the conversation content is preceded by text sent by the communication counterpart; if not, determining the conversation sentence to be an initiating sentence;
If so, determining whether the conversation sentence has no semantic association with the preceding text sent by the communication counterpart; if it has none, determining the conversation sentence to be an initiating sentence, and otherwise determining it to be a reply sentence.
To extract the conversation pairs in the conversation content precisely, this embodiment first determines the initiating sentences and reply sentences according to the semantics of the conversation sentences, then determines the types of the initiating sentences and reply sentences, and finally extracts conversation pairs precisely according to those types. The specific process of determining the initiating sentences and reply sentences is as follows: determine whether, within the preset time interval, a conversation sentence is preceded by text sent by the communication counterpart; if not, the conversation sentence is determined to be an initiating sentence; if so, determine whether the conversation sentence has no semantic association with the preceding text sent by the counterpart; if it has none, the conversation sentence is determined to be an initiating sentence, and otherwise a reply sentence.
In actual conversations, if the current conversation sentence has no preceding text sent by the counterpart within the preset time interval, it can generally be regarded as the sentence that starts a conversation, that is, an initiating sentence. For example, suppose the current conversation sentence was sent on December 3, the previous conversation sentence was sent by the counterpart on December 1, and the preset time interval is 1 day. The current sentence then has no preceding text from the counterpart within the preset time interval, so it is regarded as the sentence that starts a conversation and is determined to be an initiating sentence. The preset time interval in this embodiment is user-defined and may be, for example, 1 hour, half a day, one day, or one month; that is, when the current conversation sentence has no preceding text from the counterpart within 1 hour, half a day, one day, or one month, it is determined to be an initiating sentence.
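The time-interval check in the example above can be sketched in Python with the standard datetime module. The function name has_recent_counterpart_text and the year used in the usage lines are hypothetical; the source gives only the month and day.

```python
from datetime import datetime, timedelta

def has_recent_counterpart_text(current_time, counterpart_times, interval):
    """True if the counterpart sent something within `interval` before `current_time`."""
    return any(
        timedelta(0) <= current_time - sent <= interval
        for sent in counterpart_times
    )

# The December example: counterpart wrote on Dec 1, current sentence on Dec 3,
# preset interval of 1 day (the year 2016 is an arbitrary assumption).
current = datetime(2016, 12, 3, 9, 0)
counterpart = [datetime(2016, 12, 1, 9, 0)]
is_initiating = not has_recent_counterpart_text(current, counterpart, timedelta(days=1))
```

Because the two-day gap exceeds the one-day interval, the check finds no preceding text and the current sentence is treated as an initiating sentence.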
In addition, when a conversation sentence is preceded by text sent by the counterpart, the actual conversation content shows that the sentence may be a reply to that preceding text; it may instead be a new initiating sentence that does not reply to the preceding text; or it may be both a reply to the preceding text and a new initiating sentence at the same time. For these cases, this embodiment determines the type of the conversation sentence by judging whether it has no semantic association with the preceding text sent by the counterpart. It should be noted that, in this embodiment, whether a conversation sentence has no semantic association with the preceding text specifically means whether the conversation sentence contains a sentence that has no semantic association with the preceding text sent by the counterpart.
For example, suppose a conversation sentence is preceded by text sent by counterpart A: "How have you been recently?" In the first case (communication party B: "Pretty good"), the conversation sentence contains no sentence that lacks semantic association with the preceding text, so it is determined to be a reply sentence. In the second case (communication party B: "Can you help me pay my phone bill?"), the conversation sentence contains a sentence that lacks semantic association with the preceding text, so it is determined to be an initiating sentence. In the third case (communication party B: "Pretty good, can you help me pay my phone bill?"), the conversation sentence likewise contains a sentence that lacks semantic association with the preceding text ("can you help me pay my phone bill?"), so it is determined to be an initiating sentence.
The present embodiment is by judging whether the sentence of the session in session content has communication other side to send in Preset Time interval Above and there is communication other side to send above when judge session sentence whether with communication other side send above without semantic pass Connection, can precisely determine the initiation sentence and reply sentence in session content, be follow-up accurate according to the initiation for determining sentence and reply sentence Extract session pair and laid the foundation to setting up personal corpus according to the session extracted.
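The two-stage judgment described above can be sketched roughly as follows. This is only an illustrative sketch: the `Message` record, the function name `classify_sentences` and the `has_unrelated_part` callback are hypothetical names introduced here, and the semantic-association test itself is left to the caller because the description does not specify how it is computed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Message:
    sender: str
    text: str
    sent_at: datetime


def classify_sentences(
    messages: List[Message],
    preset_interval: timedelta,
    has_unrelated_part: Callable[[str, str], bool],
) -> List[str]:
    """Label each message 'initiating' or 'reply'; messages must be in time order."""
    labels = []
    for i, msg in enumerate(messages):
        # Latest earlier message sent by the other party, if there is one.
        preceding = next(
            (m for m in reversed(messages[:i]) if m.sender != msg.sender), None
        )
        if preceding is None or msg.sent_at - preceding.sent_at > preset_interval:
            # No preceding text from the counterpart within the preset interval.
            labels.append("initiating")
        elif has_unrelated_part(msg.text, preceding.text):
            # Contains a sentence with no semantic association to the preceding text.
            labels.append("initiating")
        else:
            labels.append("reply")
    return labels
```

The callback receives the current sentence and the counterpart's preceding text, so any semantic-association method (assumed, not prescribed by the description) can be plugged in without changing the time-interval logic.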
Optionally, determining the type of the initiating sentence according to the preset type judgment rules includes:
Judging whether the initiating sentence is a sentence with complete, independent semantics. If so, judging whether it consists of multiple simple sentences that each have complete, independent semantics; if so, determining its type to be the complex initiating sentence type, and otherwise the simple initiating sentence type. If not, judging whether the initiating sentence contains a simple sentence with complete, independent semantics; if it does, determining its type to be the non-standard complex initiating sentence type, and if it does not, the non-standard simple initiating sentence type;
Searching for whether an initiating sentence of the non-standard simple initiating sentence type has session sentences of its own that are contiguous with it before or after. If not, performing no derivative extension. If so, further judging whether the initiating sentence can be merged with those contiguous session sentences into a sentence with complete, independent semantics; if it can, derivatively extending its type to the non-standard sentence-group initiating sentence type, and if it cannot, performing no derivative extension;
Searching for whether an initiating sentence of the non-standard complex initiating sentence type has session sentences of its own that are contiguous with it before or after. If not, performing no derivative extension. If so, further judging whether the initiating sentence can be merged with those contiguous session sentences into a sentence with complete, independent semantics; if it can, derivatively extending its type to the non-standard sentence-group initiating sentence type, and if it cannot, performing no derivative extension;
Judging whether an initiating sentence of the simple, complex, non-standard simple, non-standard complex or non-standard sentence-group type has session sentences of its own that are contiguous with it before or after. If so, further judging whether the initiating sentence can be merged with those contiguous session sentences into a semantically associated sentence group; if so, derivatively extending its type to the sentence-group initiating sentence type, and otherwise performing no derivative extension.
In actual implementation, initiating sentences may appear in many types, such as simple sentences, complex sentences and non-standard sentences, and different types of initiating sentence may affect which session pairs are extracted. To address this, this embodiment determines the type of the initiating sentence according to preset type judgment rules. Specifically, first, when the initiating sentence has complete, independent semantics, it is determined to be of the simple or complex initiating sentence type by judging whether it consists of one or of multiple simple sentences with complete, independent semantics; when the initiating sentence lacks complete, independent semantics, it is determined to be of the non-standard complex or non-standard simple initiating sentence type by judging whether it contains a simple sentence with complete, independent semantics. Then, by searching for whether initiating sentences of the non-standard simple and non-standard complex types have contiguous session sentences of their own before or after them, and whether they can be merged with those sentences into a sentence with complete, independent semantics, it is determined whether to derivatively extend their type to the non-standard sentence-group initiating sentence type. Finally, by judging whether initiating sentences of the simple, complex, non-standard simple, non-standard complex and non-standard sentence-group types have contiguous session sentences of their own before or after them, it is determined whether their type can be derivatively extended to the sentence-group initiating sentence type.
Specifically, determining the initiating sentence type in this embodiment is essentially divided into three discrimination processes. The first discriminates each initiating sentence, one by one, against the four initiating sentence types (simple, complex, non-standard simple and non-standard complex). The second, performed after the first, discriminates whether initiating sentences of the non-standard simple and non-standard complex types can be further derivatively extended to the non-standard sentence-group initiating sentence type. The third, performed after the second, discriminates whether initiating sentences of the simple, complex, non-standard simple, non-standard complex and non-standard sentence-group types can be further derivatively extended to the sentence-group initiating sentence type.
By determining the type of the initiating sentence, this embodiment on the one hand supports in-depth analysis of the sentence's structure and composition. On the other hand, the type judgment and structural analysis help extract highly practical and varied session pairs more accurately, which improves the quality of the shared corpus built from the extracted session pairs and allows more accurate session reply content to be matched from that shared corpus. It should be noted that, in this embodiment, whether an initiating sentence has contiguous session sentences of its own before or after it specifically means whether the sender of the initiating sentence sent session sentences immediately before or after it.
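The three discrimination processes can be read as a small decision procedure. The sketch below is one possible reading, not the claimed implementation: the function names and the boolean inputs (which stand in for semantic judgments the description leaves unspecified) are assumptions, and the sketch assumes that derivative extension adds a type to the sentence rather than replacing the base type.

```python
from typing import List


def base_type(is_complete: bool, complete_simple_count: int) -> str:
    """First discrimination: choose one of the four base types.

    is_complete: whether the sentence as a whole has complete, independent semantics.
    complete_simple_count: number of simple sentences with complete, independent
    semantics found inside it (both inputs are assumed to come from elsewhere).
    """
    if is_complete:
        return "complex" if complete_simple_count > 1 else "simple"
    return "non-standard complex" if complete_simple_count >= 1 else "non-standard simple"


def derive_types(
    base: str,
    has_adjacent_own: bool,
    merges_to_complete: bool,
    merges_to_group: bool,
) -> List[str]:
    """Second and third discriminations: collect derivative extensions."""
    types = [base]
    # Second discrimination applies only to the two non-standard base types.
    if base.startswith("non-standard") and has_adjacent_own and merges_to_complete:
        types.append("non-standard sentence group")
    # Third discrimination applies to every type determined so far.
    if has_adjacent_own and merges_to_group:
        types.append("sentence group")
    return types
```

The same two functions would apply unchanged to reply sentences, since the description states that both follow the same rules.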
Optionally, determining the type of the reply sentence according to the preset type judgment rules includes:
Judging whether the reply sentence is a sentence with complete, independent semantics. If so, judging whether it consists of multiple simple sentences that each have complete, independent semantics; if so, determining its type to be the complex reply sentence type, and otherwise the simple reply sentence type. If not, judging whether the reply sentence contains a simple sentence with complete, independent semantics; if it does, determining its type to be the non-standard complex reply sentence type, and if it does not, the non-standard simple reply sentence type;
Searching for whether a reply sentence of the non-standard simple reply sentence type has session sentences of its own that are contiguous with it before or after. If not, performing no derivative extension. If so, further judging whether the reply sentence can be merged with those contiguous session sentences into a sentence with complete, independent semantics; if it can, derivatively extending its type to the non-standard sentence-group reply sentence type, and if it cannot, performing no derivative extension;
Searching for whether a reply sentence of the non-standard complex reply sentence type has session sentences of its own that are contiguous with it before or after. If not, performing no derivative extension. If so, further judging whether the reply sentence can be merged with those contiguous session sentences into a sentence with complete, independent semantics; if it can, derivatively extending its type to the non-standard sentence-group reply sentence type, and if it cannot, performing no derivative extension;
Judging whether a reply sentence of the simple, complex, non-standard simple, non-standard complex or non-standard sentence-group type has session sentences of its own that are contiguous with it before or after. If so, further judging whether the reply sentence can be merged with those contiguous session sentences into a semantically associated sentence group; if so, derivatively extending its type to the sentence-group reply sentence type, and otherwise performing no derivative extension.
The principle and process by which this embodiment judges the type of a reply sentence are essentially the same as those for judging the type of an initiating sentence and are therefore not described again in detail. By determining the type of the reply sentence, this embodiment on the one hand supports in-depth analysis of the sentence's structure and composition. On the other hand, the type judgment and structural analysis help extract highly practical and varied session pairs more accurately, which improves the quality of the shared corpus built from the extracted session pairs and allows more accurate session reply content to be matched from that shared corpus. It should be noted that, in this embodiment, whether a reply sentence has contiguous session sentences of its own before or after it specifically means whether the sender of the reply sentence sent session sentences immediately before or after it.
Optionally, extracting at least one session pair according to the basic session pair, the type of the initiating sentence in the basic session pair and the type of the reply sentence in the basic session pair includes:
Derivatively extending the type of the initiating sentence in the basic session pair to obtain initiating sentences of multiple types;
Derivatively extending the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types;
Combining the initiating sentences of multiple types with the reply sentences of multiple types and extracting at least one semantically associated session pair.
In this embodiment the initiating sentence and the reply sentence each have several types: the simple, complex, non-standard simple, non-standard complex, non-standard sentence-group and sentence-group initiating sentence types, and the simple, complex, non-standard simple, non-standard complex, non-standard sentence-group and sentence-group reply sentence types. Therefore, after a basic session pair has been extracted, and in order to extract highly practical and varied session pairs more accurately, this embodiment first derivatively extends the type of the initiating sentence in the basic session pair to obtain initiating sentences of multiple types, then derivatively extends the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types, and finally combines the initiating sentences and reply sentences of multiple types and extracts at least one semantically associated session pair, so that multiple session pairs can be obtained by combination.
For example, suppose the initiating sentence is of the complex initiating sentence type and the reply sentence is of the complex reply sentence type. After type derivative extension, session pairs of several forms can then be extracted, such as simple initiating sentence + simple reply sentence, complex initiating sentence + simple reply sentence, simple initiating sentence + complex reply sentence, and complex initiating sentence + complex reply sentence.
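The combination step amounts to a filtered Cartesian product of the derived initiating variants and the derived reply variants. The following sketch assumes each variant is a hypothetical `(type, text)` tuple and that semantic association is decided by a caller-supplied predicate; neither detail is fixed by the description.

```python
from itertools import product
from typing import Callable, List, Tuple

Variant = Tuple[str, str]  # (sentence type, sentence text); an assumed representation


def combine_session_pairs(
    initiating_variants: List[Variant],
    reply_variants: List[Variant],
    is_associated: Callable[[str, str], bool] = lambda a, b: True,
) -> List[Tuple[Variant, Variant]]:
    """Pair every initiating variant with every reply variant that is associated with it."""
    return [
        (ini, rep)
        for ini, rep in product(initiating_variants, reply_variants)
        if is_associated(ini[1], rep[1])
    ]
```

With two initiating variants (simple and complex) and two reply variants (simple and complex), this yields the four forms listed in the example above, provided all combinations are judged semantically associated.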
Optionally, merging the personal corpora of multiple communication parties to obtain the shared corpus includes:
Combining the personal corpora of the multiple communication parties to obtain a combined corpus;
Merging like items among the session pairs in the combined corpus that contain the same initiating sentence, to obtain the shared corpus.
The personal corpus that this embodiment creates for a communication party consists entirely of session pairs, that is, of session initiating sentences and the corresponding session reply sentences. Therefore, when this embodiment merges the personal corpora of multiple communication parties to obtain the shared corpus, it first combines the personal corpora of the multiple communication parties to obtain a combined corpus, and then merges like items among the session pairs in the combined corpus that contain the same initiating sentence, to obtain the shared corpus.
It should be noted that, when this embodiment merges like items among session pairs in the combined corpus that contain the same initiating sentence, it merges the reply sentences of those session pairs. For example, suppose the personal corpus of communication party A includes the session pair {initiating sentence: How have you been recently? / reply sentence: Pretty good} and the personal corpus of communication party B includes the session pair {initiating sentence: How have you been recently? / reply sentence: Same as usual}. After the two personal corpora are combined, merging like items among the session pairs that contain the same initiating sentence ("How have you been recently?") combines the two session pairs into {initiating sentence: How have you been recently? / reply sentence 1: Pretty good; reply sentence 2: Same as usual}.
By merging like items among the session pairs in the combined corpus that contain the same initiating sentence, this embodiment obtains a compact shared corpus, which helps intelligent session reply content to be matched quickly from the shared corpus later on. In addition, this embodiment can also merge like items among the session pairs in the combined corpus that contain the same reply sentence, which likewise yields a compact shared corpus and helps fast matching. For example, the three initiating sentences "Where is your company?", "How do I get to your company?" and "May I ask the interview address?" all have the same reply sentence: Changsha Yuelu District Tongzi slope collection openings for the able and the worthy Changsha foreign student undertaking area opposite.
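Merging like items by initiating sentence can be pictured as grouping reply sentences under a shared key. The sketch below assumes that "the same initiating sentence" means exact string equality and that a reply appearing more than once is stored only once; both are simplifying assumptions, and `merge_corpora` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple

SessionPair = Tuple[str, str]  # (initiating sentence, reply sentence)


def merge_corpora(personal_corpora: Iterable[Iterable[SessionPair]]) -> Dict[str, List[str]]:
    """Combine personal corpora and merge pairs that share an initiating sentence."""
    shared: Dict[str, List[str]] = {}
    for corpus in personal_corpora:
        for initiating, reply in corpus:
            replies = shared.setdefault(initiating, [])
            if reply not in replies:  # keep each distinct reply once (assumption)
                replies.append(reply)
    return shared


corpus_a = [("How have you been recently?", "Pretty good")]
corpus_b = [("How have you been recently?", "Same as usual")]
shared = merge_corpora([corpus_a, corpus_b])
```

Merging by identical reply sentence would follow the same pattern with the roles of key and value swapped.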
Optionally, after the shared corpus is obtained, the method further includes:
Judging whether a session pair in the shared corpus contains multiple reply sentences and, if so, intelligently ranking the multiple reply sentences according to preset rules.
After this embodiment merges like items among the session pairs in the combined corpus that contain the same initiating sentence, a session pair may include multiple reply sentences for the same initiating sentence. To address this, after obtaining the shared corpus this embodiment also judges whether a session pair in the shared corpus contains multiple reply sentences and, if so, intelligently ranks the multiple reply sentences according to preset rules, so that a better-matching reply sentence can later be obtained quickly from the shared corpus.
It should be noted that this embodiment can intelligently rank the multiple reply sentences according to preset rules based on, for example, the frequency of use of each reply sentence, usage habits, usage preferences or the chronological order of use.
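As an illustration of one of the ranking rules mentioned, the sketch below orders reply sentences by frequency of use. The `usage_log` argument (a list of reply sentences that were actually sent) is a hypothetical input, since the description does not say how usage is recorded.

```python
from collections import Counter
from typing import List


def rank_replies(replies: List[str], usage_log: List[str]) -> List[str]:
    """Order replies from most to least frequently used; ties keep their original order."""
    counts = Counter(usage_log)
    # sorted() is stable, so replies with equal counts stay in their input order.
    return sorted(replies, key=lambda reply: -counts[reply])
```

Other rules named in the text, such as chronological order of use, could be expressed by swapping in a different sort key.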
Optionally, matching, in the shared corpus, the reply content that matches the current session content includes:
Collecting the session scene tag values that correspond to the current session content and to the preset session scene tags;
Matching, in the shared corpus, the reply sentence that corresponds to the current session content, the session scene tags and the session scene tag values, and using it as the reply content.
Because the shared corpus that this embodiment builds automatically consists of session pair + scene tag + scene tag value, when matching reply content for the current session content from the shared corpus, this embodiment first collects the session scene tag values that correspond to the current session content and to the preset session scene tags, and then matches, in the shared corpus, the reply sentence that corresponds to the current session content, the session scene tags and the session scene tag values, and uses it as the reply content.
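A minimal lookup over entries of the form "session pair + scene tag + scene tag value" might look like the sketch below. The dictionary layout, the exact-equality comparison of sentences and tag values, and the example tag values are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumed entry layout: {"initiating": str, "reply": str, "scene_tags": {tag: value}}
Entry = Dict[str, object]


def match_reply(
    shared_corpus: List[Entry],
    current_text: str,
    current_tags: Dict[str, str],
) -> Optional[str]:
    """Return the first reply whose initiating sentence and scene tag values all match."""
    for entry in shared_corpus:
        if entry["initiating"] != current_text:
            continue
        tags = entry["scene_tags"]
        if all(current_tags.get(tag) == value for tag, value in tags.items()):
            return entry["reply"]
    return None


corpus = [
    {"initiating": "How have you been recently?", "reply": "Pretty good",
     "scene_tags": {"relationship": "friend"}},        # hypothetical tag value
    {"initiating": "How have you been recently?", "reply": "Same as usual",
     "scene_tags": {"relationship": "colleague"}},     # hypothetical tag value
]
```

A practical matcher would presumably use approximate sentence matching rather than string equality; that choice is outside what the description specifies.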
Optionally, the preset scene tags include a first scene tag and a second scene tag, wherein:
The first scene tag includes one or a combination of the following scene tags: the time, place, date, weather, season and somatosensory data of the session communication parties, the session interval time, and the frequency and duration of session communication between the two parties;
The second scene tag includes one or a combination of the following scene tags: the session content topic, the session intention, the sex, occupation, post, mood, hobbies, health status and real-time behavior state of the session communication parties, the sentence pattern, sentence class and sentence structure type of the session content, and total amount.
The method of the present invention for obtaining intelligent session reply content based on a shared corpus is further explained below using a simplified embodiment.
Referring to Fig. 2, the method for obtaining intelligent session reply content based on a shared corpus provided by the simplified embodiment of the present invention includes:
Step S201: establish a personal corpus for each communication party, where there is more than one communication party.
Specifically, suppose the communication parties in this embodiment include communication party A1 and communication party A2. Because the method and process for establishing a personal corpus are the same for every communication party, this embodiment describes in detail only how a personal corpus is established for one of them, for example communication party A1. The method by which this embodiment establishes a personal corpus for communication party A1 includes:
Step S2001: collect the session content of the communication party.
Specifically, suppose the session content collected in this embodiment is the content of sessions that communication party A1 conducts with communication counterpart B through an instant messaging account, e-mail account, microblog account or mobile phone number. The session content may be in text, picture, voice, video or animation form; when it is in voice, picture, video or animation form, the method further includes converting it into session content in text format. To describe in detail how this embodiment extracts session pairs from session content, a simple session between communication party A1 and communication counterpart B is used as an illustration, as follows:
A1: Have you eaten?
B: I have.
B: And you?
A1: Help me pay
A1: the fee
B: 100 yuan paid in total.
B: There are so many people in the queue.
Step S2002: judge whether a session sentence in the session content has preceding text sent by the communication counterpart within the preset time interval; if not, determine the session sentence to be an initiating sentence;
If so, judge whether the session sentence is semantically unrelated to the preceding text sent by the communication counterpart; if so, determine the session sentence to be an initiating sentence, and otherwise determine it to be a reply sentence.
Specifically, the initiating sentences and reply sentences in the session content can be determined according to the above judgment rules. Suppose the initiating sentences and reply sentences that this embodiment obtains by this judgment are as shown in Table 1.
Table 1
Initiating sentence | Reply sentence
Have you eaten? | I have.
And you? | 100 yuan paid in total.
Help me pay | There are so many people in the queue.
the fee |
Step S2003: judge whether the initiating sentence is a sentence with complete, independent semantics. If so, judge whether it consists of multiple simple sentences that each have complete, independent semantics; if so, determine its type to be the complex initiating sentence type, and otherwise the simple initiating sentence type. If not, judge whether the initiating sentence contains a simple sentence with complete, independent semantics; if it does, determine its type to be the non-standard complex initiating sentence type, and if it does not, the non-standard simple initiating sentence type;
Search for whether an initiating sentence of the non-standard simple initiating sentence type has session sentences of its own that are contiguous with it before or after. If not, perform no derivative extension. If so, further judge whether the initiating sentence can be merged with those contiguous session sentences into a sentence with complete, independent semantics; if it can, derivatively extend its type to the non-standard sentence-group initiating sentence type, and if it cannot, perform no derivative extension;
Search for whether an initiating sentence of the non-standard complex initiating sentence type has session sentences of its own that are contiguous with it before or after. If not, perform no derivative extension. If so, further judge whether the initiating sentence can be merged with those contiguous session sentences into a sentence with complete, independent semantics; if it can, derivatively extend its type to the non-standard sentence-group initiating sentence type, and if it cannot, perform no derivative extension;
Judge whether an initiating sentence of the simple, complex, non-standard simple, non-standard complex or non-standard sentence-group type has session sentences of its own that are contiguous with it before or after. If so, further judge whether the initiating sentence can be merged with those contiguous session sentences into a semantically associated sentence group; if so, derivatively extend the already determined type of the initiating sentence to the sentence-group initiating sentence type, and otherwise perform no derivative extension.
Specifically, suppose that in the first discrimination process of step S2003 this embodiment judges the types of the initiating sentences to be as shown in Table 2.
Table 2
No. | Initiating sentence | Type
First initiating sentence | Have you eaten? | Simple sentence
Second initiating sentence | And you? | Simple sentence
Third initiating sentence | Help me pay | Non-standard simple sentence
Fourth initiating sentence | the fee | Non-standard simple sentence
Then, in the second discrimination process of step S2003, it is determined whether to derivatively extend the type of the non-standard simple and non-standard complex initiating sentences to the non-standard sentence-group initiating sentence type, by judging whether those initiating sentences have contiguous session sentences of their own before or after them and whether they can be merged with those sentences into a sentence with complete, independent semantics. This judgment finds that the third and fourth initiating sentences of this embodiment can be merged into a sentence with complete, independent semantics, so the type of the third and fourth initiating sentences can be derivatively extended to the non-standard sentence-group initiating sentence type; see Table 3.
Table 3
Finally, in the third discrimination process of step S2003, it is judged whether initiating sentences of the simple, complex, non-standard simple, non-standard complex and non-standard sentence-group types can be further derivatively extended to the sentence-group initiating sentence type.
Specifically, as Table 3 shows, the initiating sentences of this embodiment cannot be further merged into a semantically associated sentence group, so no further derivative extension is performed on them in this last process. The final types of the initiating sentences are therefore as shown in Table 3.
Step S2004: determine the type of the reply sentence according to the preset type judgment rules.
The principle and process by which this embodiment determines the type of a reply sentence are essentially the same as those for determining the type of an initiating sentence and are therefore not described again in detail. Suppose this embodiment judges the types of the reply sentences to be as shown in Table 4.
Table 4
Step S2005: extract a basic session pair from an initiating sentence and the reply sentences that lie between that initiating sentence and the next initiating sentence.
Specifically, when extracting a session pair for the first initiating sentence, this embodiment first judges whether there is a reply sentence between the first initiating sentence and the next initiating sentence; if so, a basic session pair is extracted from that initiating sentence and that reply sentence. Because there is a reply sentence between the first and second initiating sentences, the basic session pair is extracted from the first initiating sentence and that reply sentence. It should be noted that, after determining that a reply sentence lies between an initiating sentence and the next initiating sentence, this embodiment also needs to compute whether the initiating sentence and the reply sentence are semantically associated; the basic session pair is extracted only if they are, and not otherwise. This embodiment assumes that the first initiating sentence and the first reply sentence are semantically associated, so a basic session pair can be extracted; call it basic session pair 1, whose content is shown in Table 5.
Similarly, when extracting a basic session pair for the second initiating sentence, this embodiment first judges whether there is a reply sentence between the second and third initiating sentences. The judgment finds no reply sentence between them, so the second initiating sentence is discarded as an initiating sentence. Similarly, suppose a semantically associated basic session pair can be extracted from the third and fourth initiating sentences; call it basic session pair 2, whose content is shown in Table 5.
Table 5
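The extraction rule of step S2005 can be sketched as a single pass over the labelled sentences. The sketch assumes the sentences are already labelled and that contiguous initiating sentences forming a sentence group have been merged beforehand; `extract_basic_pairs` and the `is_associated` predicate are hypothetical names, and the conversation text is an English rendering used purely for illustration.

```python
from typing import Callable, List, Tuple

Labelled = Tuple[str, str]  # ("initiating" or "reply", sentence text)


def extract_basic_pairs(
    labelled: List[Labelled],
    is_associated: Callable[[str, str], bool],
) -> List[Tuple[str, List[str]]]:
    """Pair each initiating sentence with the associated replies that precede the next one."""
    pairs = []
    i, n = 0, len(labelled)
    while i < n:
        kind, text = labelled[i]
        if kind != "initiating":
            i += 1
            continue
        # Collect the run of reply sentences up to the next initiating sentence.
        j = i + 1
        replies = []
        while j < n and labelled[j][0] == "reply":
            replies.append(labelled[j][1])
            j += 1
        related = [r for r in replies if is_associated(text, r)]
        if related:  # an initiating sentence with no associated reply is discarded
            pairs.append((text, related))
        i = j
    return pairs


conversation = [
    ("initiating", "Have you eaten?"),
    ("reply", "I have."),
    ("initiating", "And you?"),
    ("initiating", "Help me pay the fee"),  # third and fourth sentences already merged
    ("reply", "100 yuan paid in total."),
    ("reply", "There are so many people in the queue."),
]
basic_pairs = extract_basic_pairs(conversation, lambda ini, rep: True)
```

Here the second initiating sentence has no reply before the next initiating sentence and is dropped, which mirrors the treatment described above.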
Step S2006: derivatively extend the type of the initiating sentence in the basic session pair to obtain initiating sentences of multiple types.
Specifically, the initiating sentence in this embodiment has six types: the simple, complex, non-standard simple, non-standard complex, non-standard sentence-group and sentence-group initiating sentence types. This embodiment therefore first performs derivative extension according to the type of the initiating sentence in the basic session pair. Because the initiating sentence in basic session pair 1 is of the simple initiating sentence type, it cannot be further derivatively extended to the other five initiating sentence types, so only one type of initiating sentence is included, namely an initiating sentence of the simple initiating sentence type, as shown in Table 6. The initiating sentence in basic session pair 2, according to its type, can be further derivatively extended to other types of initiating sentence, such as the simple initiating sentence type, as shown in Table 6.
Table 6
Step S2007: derivatively extend the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types.
Specifically, the reply sentence in this embodiment has six types: the simple, complex, non-standard simple, non-standard complex, non-standard sentence-group and sentence-group reply sentence types. This embodiment therefore first performs derivative extension according to the type of the reply sentence in the basic session pair. Because the reply sentence in basic session pair 1 is of the simple reply sentence type, it cannot be further derivatively extended to the other five reply sentence types, so only one type of reply sentence is included, namely a reply sentence of the simple reply sentence type, as shown in Table 7. The reply sentence in basic session pair 2, according to its type, can be further derivatively extended to other types of reply sentence, such as the complex reply sentence type, as shown in Table 7.
Table 7
Step S2008, combine the multiple types of initiation sentences with the multiple types of reply sentences, and extract at least one semantically associated session pair.
Specifically, for basic session pair 1 there is only one type of initiation sentence and one type of reply sentence, so only one session pair can be extracted. For basic session pair 2, both the initiation sentence and the reply sentence have multiple types, so multiple session pairs can be obtained by combination; see Table 8, which lists the 6 session pairs extracted from basic session pair 2.
Table 8
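The combination in step S2008 can be sketched as a Cartesian product of the typed initiation-sentence variants and the typed reply-sentence variants, optionally filtered by a semantic-association check. The function name combine_session_pairs, the is_associated hook and the example sentences are all hypothetical; the source states only that six session pairs result for basic session pair 2, so the 2 x 3 split used below is an assumption chosen to give six.

```python
from itertools import product


def combine_session_pairs(initiation_variants, reply_variants,
                          is_associated=lambda initiation, reply: True):
    """Pair every initiation variant with every reply variant.

    is_associated is a hypothetical predicate standing in for the
    semantic-association check; by default every combination is kept.
    """
    return [
        (initiation, reply)
        for initiation, reply in product(initiation_variants, reply_variants)
        if is_associated(initiation, reply)
    ]


# Hypothetical variants derived from one basic session pair.
initiations = [
    "Are you free tonight?",
    "Are you free tonight? Let's have dinner.",
]
replies = [
    "Yes.",
    "Yes. Where shall we meet?",
    "Yes. Where shall we meet? I get off work at six.",
]
pairs = combine_session_pairs(initiations, replies)
```

A basic session pair with a single initiation variant and a single reply variant yields exactly one session pair, which matches the description of basic session pair 1.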
Step S2009, according to the preset scene tags, collect the scene tag values that correspond to each session pair and to each scene tag.
Specifically, to collect the scene tag values that correspond to each session pair and to the preset scene tags, this embodiment first presets the scene tags and then collects, for each session pair, the scene tag value corresponding to each preset scene tag. Assuming that the preset scene tags of this embodiment include a combination of session topic, session intention, place, weather, the relationship between the two communicating parties, and the age and occupation of the communication partner, a scene tag value can be collected for each session pair, as shown in Table 9. Note that, because session pairs 1 to 6 are derived and extended from basic session pair 2, their scene tag values are the same as those of basic session pair 2. In addition, this embodiment may set different scene tags for different session pairs, and the number of scene tags set may also differ.
Table 9
Step S2010, match and combine the session pairs, the scene tags and the scene tag values corresponding to the scene tags, thereby generating the personal exclusive corpus.
Specifically, this embodiment matches and combines the session pairs, the scene tags and the corresponding scene tag values to generate the personal exclusive corpus; that is, the personal exclusive corpus of communication party A1 is generated according to the content combination rule "session pair + scene tag + scene tag value".
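One possible in-memory form of the "session pair + scene tag + scene tag value" rule is sketched below. The dictionary layout, the function name build_personal_corpus and the sample tags and values are assumptions made for illustration; the patent does not prescribe a storage format.

```python
def build_personal_corpus(session_pairs, scene_tags, scene_values):
    """Attach scene tags and their values to every session pair.

    session_pairs: list of (initiation, reply) tuples.
    scene_tags: the preset scene tags to record.
    scene_values: mapping from scene tag to the collected value; session
        pairs derived from the same basic session pair share these values.
    """
    corpus = []
    for initiation, reply in session_pairs:
        corpus.append({
            "initiation": initiation,
            "reply": reply,
            "scene": {tag: scene_values[tag] for tag in scene_tags},
        })
    return corpus


# Hypothetical data for communication party A1.
a1_corpus = build_personal_corpus(
    session_pairs=[("Are you free tonight?", "Yes."),
                   ("Are you free tonight?", "Yes. Where shall we meet?")],
    scene_tags=["topic", "place", "relationship"],
    scene_values={"topic": "dinner", "place": "office",
                  "relationship": "friend", "weather": "sunny"},
)
```

Only the tags listed in scene_tags are stored, which reflects the statement that different session pairs may carry different tags and different numbers of tags.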
Step S202, merge the personal corpora of the multiple communication parties to obtain a shared corpus.
Specifically, this embodiment builds the personal corpus of communication party A2 by the same method and process as for communication party A1. The personal corpora of A1 and A2 are then merged as follows: the two personal corpora are first combined to obtain a combined corpus, and the session pairs in the combined corpus that contain the same initiation sentence are then merged as like terms to obtain the shared corpus.
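The two-stage merge described above (combine, then merge like terms on identical initiation sentences) can be sketched as follows. The tuple layout (initiation, reply, scene), the function name merge_personal_corpora and the sample sentences are assumptions; grouping by exact string equality of the initiation sentence is also an assumption, since the source does not define when two initiation sentences count as identical.

```python
def merge_personal_corpora(*personal_corpora):
    """Combine personal corpora and group entries by initiation sentence.

    Each personal corpus is a list of (initiation, reply, scene) tuples,
    where scene is a dict of scene tag -> scene tag value.
    Returns a dict: initiation -> list of (reply, scene) candidates.
    """
    # Stage 1: concatenate all personal corpora into a combined corpus.
    combined = [entry for corpus in personal_corpora for entry in corpus]

    # Stage 2: merge like terms, i.e. entries with the same initiation sentence.
    shared = {}
    for initiation, reply, scene in combined:
        shared.setdefault(initiation, []).append((reply, scene))
    return shared


# Hypothetical personal corpora of communication parties A1 and A2.
a1 = [("Have you eaten?", "Yes, just now.", {"place": "home"})]
a2 = [("Have you eaten?", "Not yet.", {"place": "office"}),
      ("Where are you?", "At work.", {"place": "office"})]
shared = merge_personal_corpora(a1, a2)
```

Repeated replies are deliberately kept rather than deduplicated, so that a later ranking step can still count how often each reply was used.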
Step S203, judge whether a session pair in the shared corpus contains multiple reply sentences; if so, intelligently rank the multiple reply sentences according to a preset rule.
After the session pairs containing the same initiation sentence in the combined corpus are merged as like terms, a session pair may contain multiple reply sentences for the same initiation sentence. Therefore, after obtaining the shared corpus, this embodiment further judges whether a session pair in the shared corpus contains multiple reply sentences and, if so, intelligently ranks them according to a preset rule. Specifically, the reply sentences may be ranked by rules such as frequency of use, usage habit, usage preference and chronological order of use.
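As one example of a preset rule, the sketch below ranks reply sentences by frequency of use and breaks ties by most recent use. The function name rank_replies, the (reply, timestamp) input format and the particular tie-breaking order are assumptions; the source lists frequency, habit, preference and time order as possible rules without fixing how they are combined.

```python
from collections import Counter


def rank_replies(usages):
    """Rank reply sentences for one initiation sentence.

    usages: list of (reply, timestamp) records, one per recorded use;
        timestamps are any comparable numbers (larger means more recent).
    Returns the distinct replies, most frequently used first; replies with
    equal frequency are ordered by most recent use.
    """
    counts = Counter(reply for reply, _ in usages)
    last_used = {}
    for reply, timestamp in usages:
        last_used[reply] = max(timestamp, last_used.get(reply, timestamp))
    return sorted(counts, key=lambda reply: (-counts[reply], -last_used[reply]))


# Hypothetical usage log for the initiation sentence "Have you eaten?".
ranked = rank_replies([
    ("Yes, just now.", 1),
    ("Not yet.", 2),
    ("Yes, just now.", 3),
    ("About to.", 5),
])
```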
Step S204, match, in the shared corpus, reply content that matches the current session content, and use that reply content as the session reply content corresponding to the current session content.
Specifically, this embodiment first collects the session scene tag values that correspond to the current session content and to the preset session scene tags, and then matches, in the session corpus, the reply sentences corresponding to the current session content, the session scene tags and the session scene tag values, and uses them as the reply content. For example, assuming the current session content is "Help me pay the fees", matching session reply content can be obtained quickly from the established session corpus. Because the session reply content obtained from the session corpus may include multiple options, in actual implementation the user may select the most appropriate session reply content as needed, or the system may automatically select the session reply content most closely associated with the current session content and reply automatically.
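The lookup in step S204 can be sketched as a filter over the candidates stored for the current session content. The corpus layout (initiation sentence mapped to a list of (reply, scene) candidates), the function name match_replies, exact string matching on the session content and exact equality on scene tag values are all simplifying assumptions; the source does not specify the matching criterion.

```python
def match_replies(shared_corpus, current_content, current_scene):
    """Return the reply sentences matching the current content and scene.

    shared_corpus: dict mapping initiation sentence -> list of
        (reply, scene) candidates, scene being a dict of tag -> value.
    current_content: the current session content (matched exactly here).
    current_scene: scene tag values collected for the current session.
    A candidate matches when it agrees with every collected tag value.
    """
    candidates = shared_corpus.get(current_content, [])
    return [
        reply
        for reply, scene in candidates
        if all(scene.get(tag) == value for tag, value in current_scene.items())
    ]


# Hypothetical shared corpus with two candidates for one initiation sentence.
corpus = {
    "Help me pay the fees": [
        ("OK, which bill is it?", {"relationship": "friend"}),
        ("Please submit a payment request.", {"relationship": "colleague"}),
    ],
}
options = match_replies(corpus, "Help me pay the fees", {"relationship": "friend"})
```

When several options remain, they could be shown to the user for selection or the first one could be sent automatically, in line with the two alternatives described above.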
The method for obtaining intelligent session reply content based on a shared corpus provided by the embodiment of the present invention establishes a personal corpus corresponding to each communication party, merges the personal corpora of multiple communication parties to obtain a shared corpus, and matches, in the shared corpus, reply content that matches the current session content. This solves the technical problem that session reply content matched from existing shared corpora is not accurate enough. The method not only reduces the workload of manually creating a shared corpus; the shared corpus it creates is also rich in content and diverse in form, and is more practical and intelligent, so that more accurate session reply content can be matched from it. It is also easy to see that, compared with creating a shared corpus directly from the session content of multiple communication parties, obtaining the shared corpus by merging the personal corpora of multiple communication parties, as in this embodiment, is simpler and faster.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for obtaining intelligent session reply content based on a shared corpus, characterized by comprising:
establishing a personal corpus corresponding to each communication party, wherein the number of communication parties is more than one;
merging the personal corpora of the multiple communication parties to obtain a shared corpus;
matching, in the shared corpus, reply content that matches the current session content, and using the reply content as the session reply content corresponding to the current session content.
2. The method for obtaining intelligent session reply content based on a shared corpus according to claim 1, characterized in that establishing a personal corpus corresponding to a communication party comprises:
collecting the session content of the communication party;
obtaining the session pairs in the session content;
collecting, according to preset scene tags, the scene tag values that correspond to the session pairs and to the scene tags;
matching and combining the session pairs, the scene tags and the scene tag values corresponding to the scene tags, thereby generating the personal corpus corresponding to the communication party.
3. The method for obtaining intelligent session reply content based on a shared corpus according to claim 2, characterized in that obtaining the session pairs in the session content comprises:
determining the initiation sentences and the reply sentences in the session content according to the semantics of the session sentences in the session content;
determining the types of the initiation sentences and the reply sentences according to a preset type-judgment rule;
extracting a basic session pair from an initiation sentence and the reply sentences located between that initiation sentence and the next initiation sentence;
extracting at least one session pair according to the basic session pair, the type of the initiation sentence in the basic session pair and the type of the reply sentence in the basic session pair.
4. The method for obtaining intelligent session reply content based on a shared corpus according to claim 3, characterized in that determining the initiation sentences and the reply sentences in the session content according to the semantics of the session sentences in the session content comprises:
judging whether, within a preset time interval, a session sentence in the session content is preceded by text sent by the other communication party; if not, determining the session sentence to be an initiation sentence;
if so, judging whether the session sentence has no semantic association with the preceding text sent by the other communication party; if it has none, determining the session sentence to be an initiation sentence, and otherwise determining the session sentence to be a reply sentence.
5. The method for obtaining intelligent session reply content based on a shared corpus according to claim 4, characterized in that determining the type of the initiation sentence according to the preset type-judgment rule comprises:
judging whether the initiation sentence is a sentence with complete, independent semantics; if so, judging whether the initiation sentence is composed of multiple simple sentences with complete, independent semantics, and if so, determining the type of the initiation sentence to be the complex-sentence initiation type, otherwise the simple-sentence initiation type; if not, judging whether the initiation sentence contains a simple sentence with complete, independent semantics, and if it does, determining the type of the initiation sentence to be the non-standard complex-sentence initiation type, otherwise the non-standard simple-sentence initiation type;
searching whether an initiation sentence of the non-standard simple-sentence initiation type has consecutive preceding and following session sentences of its own; if not, performing no derivative extension; if so, further judging whether the initiation sentence of the non-standard simple-sentence initiation type can be merged with its own consecutive preceding and following session sentences into a sentence with complete, independent semantics; if it can, deriving and extending the type of the initiation sentence to the non-standard sentence-group initiation type; if it cannot, performing no derivative extension;
searching whether an initiation sentence of the non-standard complex-sentence initiation type has consecutive preceding and following session sentences of its own; if not, performing no derivative extension; if so, further judging whether the initiation sentence of the non-standard complex-sentence initiation type can be merged with its own consecutive preceding and following session sentences into a sentence with complete, independent semantics; if it can, deriving and extending the type of the initiation sentence to the non-standard sentence-group initiation type; if it cannot, performing no derivative extension;
judging whether an initiation sentence of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence or non-standard sentence-group type has consecutive preceding and following session sentences of its own; if so, further judging whether the initiation sentence can be merged with its own consecutive preceding and following session sentences into a semantically associated sentence group; if so, deriving and extending the type of the initiation sentence to the sentence-group initiation type, and otherwise performing no derivative extension.
6. The method for obtaining intelligent session reply content based on a shared corpus according to claim 4, characterized in that determining the type of the reply sentence according to the preset type-judgment rule comprises:
judging whether the reply sentence is a sentence with complete, independent semantics; if so, judging whether the reply sentence is composed of multiple simple sentences with complete, independent semantics, and if so, determining the type of the reply sentence to be the complex-sentence reply type, otherwise the simple-sentence reply type; if not, judging whether the reply sentence contains a simple sentence with complete, independent semantics, and if it does, determining the type of the reply sentence to be the non-standard complex-sentence reply type, otherwise the non-standard simple-sentence reply type;
searching whether a reply sentence of the non-standard simple-sentence reply type has consecutive preceding and following session sentences of its own; if not, performing no derivative extension; if so, further judging whether the reply sentence of the non-standard simple-sentence reply type can be merged with its own consecutive preceding and following session sentences into a sentence with complete, independent semantics; if it can, deriving and extending the type of the reply sentence to the non-standard sentence-group reply type; if it cannot, performing no derivative extension;
searching whether a reply sentence of the non-standard complex-sentence reply type has consecutive preceding and following session sentences of its own; if not, performing no derivative extension; if so, further judging whether the reply sentence of the non-standard complex-sentence reply type can be merged with its own consecutive preceding and following session sentences into a sentence with complete, independent semantics; if it can, deriving and extending the type of the reply sentence to the non-standard sentence-group reply type; if it cannot, performing no derivative extension;
judging whether a reply sentence of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence or non-standard sentence-group type has consecutive preceding and following session sentences of its own; if so, further judging whether the reply sentence can be merged with its own consecutive preceding and following session sentences into a semantically associated sentence group; if so, deriving and extending the type of the reply sentence to the sentence-group reply type, and otherwise performing no derivative extension.
7. The method for obtaining intelligent session reply content based on a shared corpus according to claim 6, characterized in that extracting at least one session pair according to the basic session pair, the type of the initiation sentence in the basic session pair and the type of the reply sentence in the basic session pair comprises:
deriving and extending the type of the initiation sentence in the basic session pair to obtain initiation sentences of multiple types;
deriving and extending the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types;
combining the initiation sentences of multiple types with the reply sentences of multiple types, and extracting at least one semantically associated session pair.
8. The method for obtaining intelligent session reply content based on a shared corpus according to claim 7, characterized in that merging the personal corpora of the multiple communication parties to obtain a shared corpus comprises:
combining the personal corpora of the multiple communication parties to obtain a combined corpus;
merging, as like terms, the session pairs in the combined corpus that contain the same initiation sentence, to obtain the shared corpus.
9. The method for obtaining intelligent session reply content based on a shared corpus according to claim 8, characterized by further comprising, after obtaining the shared corpus:
judging whether a session pair in the shared corpus contains multiple reply sentences; if so, intelligently ranking the multiple reply sentences according to a preset rule.
10. The method for obtaining intelligent session reply content based on a shared corpus according to claim 9, characterized in that matching, in the shared corpus, reply content that matches the current session content comprises:
collecting the session scene tag values that correspond to the current session content and to the preset session scene tags;
matching, in the shared corpus, the reply sentence corresponding to the current session content, the session scene tags and the session scene tag values, and using it as the reply content.
CN201710076115.2A 2017-02-13 2017-02-13 Method of obtaining intelligent conversation reply content based on shared corpora Pending CN106709072A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710076115.2A CN106709072A (en) 2017-02-13 2017-02-13 Method of obtaining intelligent conversation reply content based on shared corpora

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710076115.2A CN106709072A (en) 2017-02-13 2017-02-13 Method of obtaining intelligent conversation reply content based on shared corpora

Publications (1)

Publication Number Publication Date
CN106709072A true CN106709072A (en) 2017-05-24

Family

ID=58911307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710076115.2A Pending CN106709072A (en) 2017-02-13 2017-02-13 Method of obtaining intelligent conversation reply content based on shared corpora

Country Status (1)

Country Link
CN (1) CN106709072A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170524
