CN106874451A - Method for automatically establishing a personal exclusive corpus - Google Patents
A method for automatically establishing a personal exclusive corpus
- Publication number
- CN106874451A (application number CN201710076038.0A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- session
- type
- reply
- standard
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
The method for automatically establishing a personal exclusive corpus provided by the present invention collects the session content of a communicating party, obtains the session pairs in the session content, collects, according to preset scene tags, the scene tag value of each session pair corresponding to each scene tag, and matches and combines the session pairs, the scene tags, and the corresponding scene tag values to generate the personal exclusive corpus. This solves the technical problem that existing, manually built session corpora require a heavy workload and lack personal exclusivity. The method greatly reduces the manual workload of building a session corpus. In addition, because the personal exclusive corpus is generated from the session pairs extracted from the communicating party's own session content and the corresponding scene tag values, it is personally exclusive, more targeted, and more highly personalized.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a method for automatically establishing a personal exclusive corpus.
Background
At present, the reply content used for automatic replies in intelligent conversation systems is usually obtained by matching against a session corpus. Such session corpora are mainly created by hand. Building a corpus manually involves a heavy workload, and the quality of the resulting corpus is generally low. In addition, session corpora in the prior art are almost always shared by all users, so they are neither personally exclusive nor targeted. To address this problem, the present embodiment proposes a method for automatically establishing a personal exclusive corpus based on session content.
Summary of the invention
The present invention provides a method for automatically establishing a personal exclusive corpus, to solve the technical problem that existing, manually built session corpora require a heavy workload and lack personal exclusivity.
The method for automatically establishing a personal exclusive corpus provided by the present invention includes:
Collecting the session content of the communicating party;
Obtaining the session pairs in the session content;
Collecting, according to preset scene tags, the scene tag value of each session pair corresponding to each scene tag;
Matching and combining the session pairs, the scene tags, and the scene tag values corresponding to the scene tags, thereby generating the personal exclusive corpus.
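The four steps above end in a corpus whose entries each combine a session pair, a scene tag, and a scene tag value. A minimal sketch of that final combination step follows. The class names, field names, and the `build_corpus` helper are illustrative assumptions and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionPair:
    initiating: str  # initiating sentence
    reply: str       # reply sentence


@dataclass(frozen=True)
class CorpusEntry:
    pair: SessionPair
    scene_tag: str        # e.g. "place"
    scene_tag_value: str  # e.g. "office"


def build_corpus(pairs, tag_values):
    """Combine each session pair with every (scene tag, value) collected for it.

    `tag_values` maps a SessionPair to a dict {scene_tag: scene_tag_value}.
    """
    corpus = []
    for pair in pairs:
        for tag, value in tag_values.get(pair, {}).items():
            corpus.append(CorpusEntry(pair, tag, value))
    return corpus


# Hypothetical data for illustration only.
pair = SessionPair("How have you been lately?", "Pretty good")
corpus = build_corpus([pair], {pair: {"place": "office", "mood": "calm"}})
```

Storing one entry per tag keeps the "session pair + scene tag + scene tag value" triple explicit. A single record per pair holding a dictionary of tags would work equally well.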
Further, obtaining the session pairs in the session content includes:
Determining the initiating sentences and reply sentences in the session content according to the semantics of the session sentences in the session content;
Determining the types of the initiating sentences and reply sentences according to preset type judgment rules;
Extracting a basic session pair from an initiating sentence and the reply sentences located between that initiating sentence and the next initiating sentence;
Extracting at least one session pair according to the basic session pair and the types of the initiating sentence and reply sentences in the basic session pair.
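The third step above, grouping an initiating sentence with the reply sentences that precede the next initiating sentence, can be sketched as follows. It assumes the sentences have already been labeled as initiating or reply. The function name and the `(role, text)` input format are assumptions made for illustration.

```python
def extract_basic_pairs(labeled):
    """Group each initiating sentence with the replies that follow it.

    `labeled` is a list of (role, text) tuples in chat order, where role is
    "initiating" or "reply". Returns a list of (initiating_text, [reply_texts]).
    """
    pairs = []
    current = None
    for role, text in labeled:
        if role == "initiating":
            if current is not None:
                pairs.append(current)
            current = (text, [])
        elif current is not None:
            current[1].append(text)
        # Replies seen before any initiating sentence are skipped in this sketch.
    if current is not None:
        pairs.append(current)
    return pairs
```

Because a basic pair holds a list of replies, it is not limited to the one-question-one-answer form.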
Further, determining the initiating sentences and reply sentences in the session content according to the semantics of the session sentences in the session content includes:
Judging whether a session sentence in the session content has preceding text sent by the other party within a preset time interval; if not, determining the session sentence to be an initiating sentence;
If so, judging whether the session sentence has no semantic association with the preceding text sent by the other party; if it has none, determining the session sentence to be an initiating sentence; otherwise, determining the session sentence to be a reply sentence.
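The two-stage judgment above can be sketched as a small function. The patent does not specify how semantic association is computed, so the sketch takes it as a caller-supplied predicate. `contains_unrelated_clause` and the other names here are hypothetical.

```python
from datetime import datetime, timedelta


def classify_sentence(text, sent_at, history, window, contains_unrelated_clause):
    """Return "initiating" or "reply" for one session sentence.

    history: list of (timestamp, text) sent earlier by the other party.
    window: preset time interval (a timedelta).
    contains_unrelated_clause: hypothetical callable(text, preceding_texts) -> bool
        that reports whether `text` has a part with no semantic association
        with the preceding texts.
    """
    recent = [t for ts, t in history if timedelta(0) <= sent_at - ts <= window]
    if not recent:
        # No preceding text from the other party inside the window.
        return "initiating"
    if contains_unrelated_clause(text, recent):
        return "initiating"
    return "reply"


# Toy predicate for illustration only; a real system would need a semantic model.
toy_predicate = lambda text, preceding: "phone bill" in text
history = [(datetime(2016, 12, 3, 9, 0), "How have you been lately?")]
label = classify_sentence("Pretty good", datetime(2016, 12, 3, 9, 5),
                          history, timedelta(days=1), toy_predicate)
```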
Further, determining the type of an initiating sentence according to the preset type judgment rules includes:
Judging whether the initiating sentence is a sentence with complete, independent semantics. If so, judging whether the initiating sentence is composed of multiple simple sentences that each have complete, independent semantics; if so, determining the type of the initiating sentence to be the complex-sentence initiating-sentence type, and otherwise the simple-sentence initiating-sentence type. If not, judging whether the initiating sentence contains a simple sentence with complete, independent semantics; if it does, determining the type of the initiating sentence to be the non-standard complex-sentence initiating-sentence type, and if it does not, the non-standard simple-sentence initiating-sentence type;
Searching whether an initiating sentence of the non-standard simple-sentence initiating-sentence type has continuous preceding and following session sentences of its own. If not, performing no derivative extension. If so, further judging whether the initiating sentence can be merged with its own continuous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the initiating sentence to the non-standard sentence-group initiating-sentence type, and if it cannot, performing no derivative extension;
Searching whether an initiating sentence of the non-standard complex-sentence initiating-sentence type has continuous preceding and following session sentences of its own. If not, performing no derivative extension. If so, further judging whether the initiating sentence can be merged with its own continuous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the initiating sentence to the non-standard sentence-group initiating-sentence type, and if it cannot, performing no derivative extension;
Judging whether an initiating sentence of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence, or non-standard sentence-group type has continuous preceding and following session sentences of its own. If so, further judging whether the initiating sentence can be merged with its own continuous preceding and following session sentences into a semantically associated sentence group; if so, derivatively extending the type of the initiating sentence to the sentence-group initiating-sentence type, and otherwise performing no derivative extension.
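The type judgment rules above form a small decision tree followed by two optional derivations. The sketch below encodes that logic with the semantic judgments passed in as plain values, since the patent does not say how they are computed. The function names, parameter names, and the reading of "derivative extension" as adding a further type alongside the base type are assumptions.

```python
def base_type(is_complete, complete_clause_count):
    """Base type of a sentence.

    is_complete: the sentence as a whole has complete, independent semantics.
    complete_clause_count: number of simple sentences with complete,
        independent semantics that it is composed of or contains.
    """
    if is_complete:
        return "complex" if complete_clause_count > 1 else "simple"
    if complete_clause_count >= 1:
        return "non-standard complex"
    return "non-standard simple"


def derived_types(base, has_adjacent_own, merges_into_complete, merges_into_group):
    """Base type plus any types obtained by derivative extension.

    has_adjacent_own: the sender has continuous preceding/following sentences.
    merges_into_complete: merging them yields complete, independent semantics.
    merges_into_group: merging them yields a semantically associated group.
    """
    types = [base]
    if base.startswith("non-standard") and has_adjacent_own and merges_into_complete:
        types.append("non-standard sentence-group")
    if has_adjacent_own and merges_into_group:
        types.append("sentence-group")
    return types
```

The same two functions apply unchanged to reply sentences, because the rules for replies mirror those for initiating sentences.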
Further, determining the type of a reply sentence according to the preset type judgment rules includes:
Judging whether the reply sentence is a sentence with complete, independent semantics. If so, judging whether the reply sentence is composed of multiple simple sentences that each have complete, independent semantics; if so, determining the type of the reply sentence to be the complex-sentence reply-sentence type, and otherwise the simple-sentence reply-sentence type. If not, judging whether the reply sentence contains a simple sentence with complete, independent semantics; if it does, determining the type of the reply sentence to be the non-standard complex-sentence reply-sentence type, and if it does not, the non-standard simple-sentence reply-sentence type;
Searching whether a reply sentence of the non-standard simple-sentence reply-sentence type has continuous preceding and following session sentences of its own. If not, performing no derivative extension. If so, further judging whether the reply sentence can be merged with its own continuous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the reply sentence to the non-standard sentence-group reply-sentence type, and if it cannot, performing no derivative extension;
Searching whether a reply sentence of the non-standard complex-sentence reply-sentence type has continuous preceding and following session sentences of its own. If not, performing no derivative extension. If so, further judging whether the reply sentence can be merged with its own continuous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the reply sentence to the non-standard sentence-group reply-sentence type, and if it cannot, performing no derivative extension;
Judging whether a reply sentence of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence, or non-standard sentence-group type has continuous preceding and following session sentences of its own. If so, further judging whether the reply sentence can be merged with its own continuous preceding and following session sentences into a semantically associated sentence group; if so, derivatively extending the type of the reply sentence to the sentence-group reply-sentence type, and otherwise performing no derivative extension.
Further, extracting at least one session pair according to the basic session pair, the type of the initiating sentence in the basic session pair, and the type of the reply sentences in the basic session pair includes:
Derivatively extending the type of the initiating sentence in the basic session pair to obtain initiating sentences of multiple types;
Derivatively extending the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types;
Combining the initiating sentences of multiple types with the reply sentences of multiple types, and extracting at least one semantically associated session pair.
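One way to read the combination step above is as a filtered Cartesian product of the derived initiating-sentence variants and reply-sentence variants. The following sketch takes that reading. The `is_associated` predicate stands in for the unspecified semantic association check, and all names are hypothetical.

```python
from itertools import product


def combine_pairs(initiating_variants, reply_variants, is_associated):
    """Pair every initiating variant with every reply variant that is associated.

    Each variant is a (type_name, text) tuple produced by derivative extension.
    is_associated: hypothetical callable(initiating_text, reply_text) -> bool.
    """
    pairs = []
    for (i_type, i_text), (r_type, r_text) in product(initiating_variants, reply_variants):
        if is_associated(i_text, r_text):
            pairs.append({
                "initiating": i_text, "initiating_type": i_type,
                "reply": r_text, "reply_type": r_type,
            })
    return pairs
```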
Further, collecting, according to preset scene tags, the scene tag value of a session pair corresponding to each scene tag includes:
Presetting a scene tag library that includes at least one scene tag;
Selecting, from the scene tag library, the scene tags associated with the session pair;
Collecting the scene tag value of the session pair corresponding to each selected scene tag.
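A minimal sketch of the library, selection, and collection steps above is shown below. The patent does not define how a tag is judged to be "associated" with a session pair, so this sketch assumes a tag is associated when the pair's context holds data for it. The library contents and function names are illustrative assumptions.

```python
# Hypothetical subset of a preset scene tag library.
SCENE_TAG_LIBRARY = ["topic", "place", "time", "mood"]


def select_tags(pair_context, library):
    """Keep the library tags for which the pair's context has data (an assumption)."""
    return [tag for tag in library if tag in pair_context]


def collect_values(pair_context, tags):
    """Read the scene tag value for each selected tag."""
    return {tag: pair_context[tag] for tag in tags}


context = {"place": "office", "time": "09:05", "speaker": "B"}
tags = select_tags(context, SCENE_TAG_LIBRARY)
values = collect_values(context, tags)
```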
Further, the scene tags include one or a combination of the following:
The session content topic; the time, place, date, session intention, weather, season, gender, occupation, position, mood, hobbies, somatosensory data, health status, real-time behavior state, zodiac sign, and blood type of the two parties to the session; the relationship, age gap, and generational seniority gap between the two parties; the interval, frequency, and duration of session communication between the two parties; the sentence pattern, sentence class, and sentence structure type of the session content; and a total amount tag.
The present invention has the following beneficial effects:
The method for automatically establishing a personal exclusive corpus provided by the present invention collects the session content of the communicating party, obtains the session pairs in the session content, collects, according to preset scene tags, the scene tag value of each session pair corresponding to each scene tag, and matches and combines the session pairs, the scene tags, and the corresponding scene tag values to generate the personal exclusive corpus. This solves the technical problem that existing, manually built session corpora require a heavy workload and lack personal exclusivity. The method greatly reduces the manual workload of building a session corpus. In addition, because the personal exclusive corpus is generated from the session pairs extracted from the communicating party's own session content and the corresponding scene tag values, it is personally exclusive, more targeted, and more highly personalized.
In addition to the objects, features, and advantages described above, the present invention has other objects, features, and advantages. The present invention is described in further detail below with reference to the drawings.
Brief description of the drawings
The accompanying drawings, which form a part of this application, provide a further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the method for automatically establishing a personal exclusive corpus according to a preferred embodiment of the present invention;
Fig. 2 is a flow chart of the method for automatically establishing a personal exclusive corpus according to simplified embodiment one of the preferred embodiment of the present invention;
Fig. 3 is a flow chart of the method for automatically establishing a personal exclusive corpus according to simplified embodiment two of the preferred embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the present invention can be implemented in many different ways as defined and covered by the claims.
Referring to Fig. 1, a preferred embodiment of the present invention provides a method for automatically establishing a personal exclusive corpus, including:
Step S101: collecting the session content of the communicating party;
Step S102: obtaining the session pairs in the session content;
Step S103: collecting, according to preset scene tags, the scene tag value of each session pair corresponding to each scene tag;
Step S104: matching and combining the session pairs, the scene tags, and the scene tag values corresponding to the scene tags, thereby generating the personal exclusive corpus.
The method for automatically establishing a personal exclusive corpus provided in the embodiment of the present invention collects the session content of the communicating party, obtains the session pairs in the session content, collects, according to preset scene tags, the scene tag value of each session pair corresponding to each scene tag, and matches and combines the session pairs, the scene tags, and the corresponding scene tag values to generate the personal exclusive corpus. This solves the technical problem that existing, manually built session corpora require a heavy workload and lack personal exclusivity. The method greatly reduces the manual workload of building a session corpus. In addition, because the personal exclusive corpus is generated from the session pairs extracted from the communicating party's own session content and the corresponding scene tag values, it is personally exclusive, more targeted, and more highly personalized.
In addition, the embodiment of the present invention generates the personal exclusive corpus directly from the session pairs, the scene tags, and the scene tag values corresponding to the scene tags. The corpus is therefore established by fully simulating the real session scene, which makes it more precise and practical. Moreover, the present embodiment establishes the personal exclusive corpus by collecting the personal session content of the communicating party, so the generated corpus consists entirely of session material from sessions between the communicating party and other parties. The automatically established personal exclusive corpus is therefore personally exclusive and more targeted.
It should be noted that the embodiment of the present invention generates the personal exclusive corpus by matching and combining the session pairs, the scene tags, and the scene tag values corresponding to the scene tags, that is, according to the content and combination rule "session pair + scene tag + scene tag value". Furthermore, different session content has different scene characteristics, such as the session content topic, session intention, session time, session place, and the relationship between the two parties. Therefore, after obtaining the session pairs in the session content, the present embodiment further collects, according to the preset scene tags, the scene tag value of each session pair corresponding to each scene tag, and matches and combines the session pairs, the scene tags, and the corresponding scene tag values to generate the personal exclusive corpus. The scene tags in the present embodiment are user-defined or automatically computed. They may be, for example, one or a combination of: the session content topic; the time, place, date, session intention, weather, season, gender, occupation, position, mood, hobbies, somatosensory data, health status, real-time behavior state, zodiac sign, and blood type of the two parties to the session; the relationship, age gap, and generational seniority gap between the two parties; the interval, frequency, and duration of session communication between the two parties; the sentence pattern, sentence class, and sentence structure type of the session content; and a total amount tag.
When collecting the scene tag value of a session pair corresponding to a scene tag, the present embodiment may use different methods. These include: direct collection, for example, the place scene tag value can be collected automatically through the GPS of the mobile terminal; inference, for example, the scene tag value for the relationship between the two communicating parties can be inferred from other scene tag values already obtained; computing word vectors relevant to the session, for example, the session intention tag value can be obtained by computing word vectors relevant to the session; and neural network learning, for example, the mood scene tag value can be obtained by feeding the session content or other already obtained scene tag values into a trained classifier. In addition, the present embodiment may obtain scene tag values automatically by combining one or more of the methods described above.
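Since each scene tag may be collected by a different method, one possible arrangement is a table that maps each tag to its collector. The sketch below shows only two of the methods named above (direct collection and inference), with stand-in functions. The GPS reading, the toy inference rule, and every name are assumptions for illustration, not the patent's implementation.

```python
def collect_place(ctx):
    # Direct collection: stand-in for a reading from the mobile terminal's GPS.
    return ctx["gps_place"]


def infer_relationship(ctx):
    # Inference from scene tag values already obtained (toy rule, an assumption).
    return "colleague" if ctx.get("place") == "office" else "unknown"


# Hypothetical mapping from scene tag to the method that collects its value.
COLLECTORS = {"place": collect_place, "relationship": infer_relationship}


def collect_tag_values(ctx, tags):
    """Collect values in order, letting later collectors see earlier results."""
    values = {}
    for tag in tags:
        collector = COLLECTORS.get(tag)
        if collector is not None:
            values[tag] = collector({**ctx, **values})
    return values


values = collect_tag_values({"gps_place": "office"}, ["place", "relationship", "mood"])
```

Tags without a registered collector, such as `mood` here, are simply skipped. A word-vector or classifier-based collector could be registered in the same table.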
Optionally, obtaining the session pairs in the session content includes:
Determining the initiating sentences and reply sentences in the session content according to the semantics of the session sentences in the session content;
Determining the types of the initiating sentences and reply sentences according to preset type judgment rules;
Extracting a basic session pair from an initiating sentence and the reply sentences located between that initiating sentence and the next initiating sentence;
Extracting at least one session pair according to the basic session pair and the types of the initiating sentence and reply sentences in the basic session pair.
Existing session pairs or question-answer pairs extracted from session content usually take a one-question-one-answer form. In an actual session, however, the two parties do not always follow the one-question-one-answer pattern. For example, the communicating party may reply with several session sentences to a single session sentence sent by the other party, or may reply with only one session sentence to multiple session sentences sent by the other party.
Therefore, if session pairs are extracted only in one-question-one-answer form, the following problems may arise:
(1) For session content that does not take the one-question-one-answer form, extracting session pairs is difficult and imprecise. For example, for session content in the form of multiple initiating sentences plus multiple reply sentences, extracting session pairs requires analyzing which reply sentence matches each initiating sentence; the process is complex and difficult, and the precision is low.
(2) Existing methods typically extract question-answer pairs or session pairs only from relatively standard session sentences or session sentences with relatively simple structure. As a result, for session sentences with complex or non-standard structure, they cannot precisely extract session pairs with good integrity and high practicality.
(3) In addition, the integrity of session pairs extracted in one-question-one-answer form is easily damaged, so the extracted session pairs cannot accurately simulate real sessions. To address the above problems, the present invention proposes a method of extracting session pairs from session content according to the types of the initiating sentences and reply sentences.
To address this problem, the present embodiment determines the initiating sentences and reply sentences in the session content according to the semantics of the session sentences, determines the types of the initiating sentences and reply sentences according to preset type judgment rules, extracts a basic session pair from an initiating sentence and the reply sentences between that initiating sentence and the next initiating sentence, and extracts at least one session pair according to the basic session pair and the types of its initiating sentence and reply sentences. This solves the technical problem that extracting session pairs in the prior art is difficult and imprecise, and it breaks the limitation of the traditional one-question-one-answer session pair form. By relying on the types of the initiating sentences and reply sentences, session pairs can be extracted quickly and effectively, and the precision and accuracy of the extracted session pairs are greatly improved. In addition, for session sentences with complex or non-standard structure, the embodiment of the present invention can precisely extract session pairs with good integrity and high practicality, so the extracted session pairs can accurately simulate real sessions, with a higher degree of intelligence. Further, the session pairs extracted by the embodiment take diverse forms, which helps to match intelligent reply content precisely based on session pairs and to obtain intelligent reply content in diverse forms, giving higher practicality.
It should be noted that, before determining the types of the initiating sentences and reply sentences, the present embodiment first presets the types of initiating sentences and reply sentences and the type judgment rules corresponding to those types, so that the types can be determined quickly according to the preset rules. An initiating sentence in the present embodiment specifically refers to a session sentence that has no preceding text sent by the other party, or a session sentence that has no semantic association with the preceding text sent by the other party.
The present embodiment may obtain session content by collecting the session content of the communicating party's instant messaging account, e-mail account, microblog account, or mobile phone number. The session content may be in text, picture, voice, video, or animation form. When the session content is in voice, picture, video, or animation form, the method further includes converting it into session content in text form.
Optionally, determining the initiating sentences and reply sentences in the session content according to the semantics of the session sentences in the session content includes:
Judging whether a session sentence in the session content has preceding text sent by the other party within a preset time interval; if not, determining the session sentence to be an initiating sentence;
If so, judging whether the session sentence has no semantic association with the preceding text sent by the other party; if it has none, determining the session sentence to be an initiating sentence; otherwise, determining the session sentence to be a reply sentence.
To extract the session pairs in the session content precisely, the present embodiment first determines the initiating sentences and reply sentences in the session content according to the semantics of the session sentences, and then determines the types of the initiating sentences and reply sentences, so that session pairs can be extracted precisely according to those types. The specific process of determining the initiating sentences and reply sentences according to the semantics of the session sentences is as follows: judge whether a session sentence in the session content has preceding text sent by the other party within the preset time interval; if not, determine the session sentence to be an initiating sentence; if so, judge whether the session sentence has no semantic association with the preceding text sent by the other party; if it has none, determine the session sentence to be an initiating sentence; otherwise, determine the session sentence to be a reply sentence.
In an actual session, if the current session sentence has no preceding text sent by the other party within the preset time interval, it is generally regarded as the initial sentence that starts a session, that is, an initiating sentence. For example, suppose the current session sentence was sent on December 3, the previous session sentence was sent by the other party on December 1, and the preset time interval is 1 day. The judgment then finds that the current session sentence has no preceding text sent by the other party within the preset time interval, so the current session sentence is regarded as the initial sentence that starts a session, that is, it is judged to be an initiating sentence. The preset time interval in the present embodiment is user-defined and may be, for example, 1 hour, half a day, one day, or one month. That is, when the current session sentence is judged to have no preceding text sent by the other party within 1 hour, half a day, one day, or one month, the current session sentence is judged to be an initiating sentence.
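The December 1 and December 3 example above can be checked for each of the interval lengths mentioned. This is a small worked sketch. The year and the 30-day length used for "one month" are assumptions, since the text gives neither.

```python
from datetime import date, timedelta

previous_from_other = date(2016, 12, 1)  # the year is an assumption
current = date(2016, 12, 3)
gap = current - previous_from_other      # two days

# User-defined preset intervals named in the text; 30 days for a month is assumed.
windows = {
    "1 hour": timedelta(hours=1),
    "half a day": timedelta(hours=12),
    "one day": timedelta(days=1),
    "one month": timedelta(days=30),
}

# The sentence counts as initiating on the time test alone when the gap exceeds the window.
initiating_by_time = {name: gap > window for name, window in windows.items()}
```

With a one-month window the December 1 message falls inside the interval, so the time test alone no longer settles the matter and the semantic association check decides the label.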
In addition, when a session sentence does have preceding text sent by the other party, the actual session content shows three possibilities: the session sentence may be a reply sentence that answers the preceding text sent by the other party; it may not answer the preceding text at all, but instead be an initiating sentence that starts a new session; or it may be both a reply sentence answering the preceding text and an initiating sentence starting a new session. For these cases, the present embodiment determines the type of the session sentence by judging whether it has no semantic association with the preceding text sent by the other party. It should be noted that, in the present embodiment, whether a session sentence has no semantic association with the preceding text sent by the other party specifically means whether the session sentence contains a sentence that has no semantic association with that preceding text.
For example, suppose the session sentence has preceding text sent by the other party, and the preceding text sent by the other party A is "How have you been lately?". For the session sentence in the first case (communicating party B: "Pretty good"), it can be judged that the session sentence contains no sentence without semantic association with the preceding text sent by the other party, so the session sentence is determined to be a reply sentence. For the session sentence in the second case (communicating party B: "Can you help me pay my phone bill?"), it can be judged that the session sentence contains a sentence without semantic association with the preceding text, so the session sentence is determined to be an initiating sentence. For the session sentence in the third case (communicating party B: "Pretty good. Can you help me pay my phone bill?"), it can be judged that the session sentence likewise contains a sentence without semantic association with the preceding text ("Can you help me pay my phone bill?"), so the session sentence is determined to be an initiating sentence.
By judging whether a session sentence in the session content has preceding text sent by the other party within the preset time interval and, when it does, judging whether the session sentence has no semantic association with that preceding text, the present embodiment can precisely determine the initiating sentences and reply sentences in the session content. This lays the foundation for subsequently extracting session pairs precisely from the determined initiating sentences and reply sentences, and for establishing the personal exclusive corpus from the extracted session pairs.
Optionally, determining the type of an initiating sentence according to the preset type judgment rules includes:
Judging whether the initiating sentence is a sentence with complete, independent semantics. If so, judging whether the initiating sentence is composed of multiple simple sentences that each have complete, independent semantics; if so, determining the type of the initiating sentence to be the complex-sentence initiating-sentence type, and otherwise the simple-sentence initiating-sentence type. If not, judging whether the initiating sentence contains a simple sentence with complete, independent semantics; if it does, determining the type of the initiating sentence to be the non-standard complex-sentence initiating-sentence type, and if it does not, the non-standard simple-sentence initiating-sentence type;
Searching whether an initiating sentence of the non-standard simple-sentence initiating-sentence type has continuous preceding and following session sentences of its own. If not, performing no derivative extension. If so, further judging whether the initiating sentence can be merged with its own continuous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the initiating sentence to the non-standard sentence-group initiating-sentence type, and if it cannot, performing no derivative extension;
Searching whether an initiating sentence of the non-standard complex-sentence initiating-sentence type has continuous preceding and following session sentences of its own. If not, performing no derivative extension. If so, further judging whether the initiating sentence can be merged with its own continuous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the initiating sentence to the non-standard sentence-group initiating-sentence type, and if it cannot, performing no derivative extension;
Whether judge the initiation sentence of simple sentence, complex sentence, non-standard simple sentence, non-standard complex sentence and non-standard sentence realm type has certainly
Oneself session continuous above and below sentence, if so, then determining whether initiate sentence whether can be continuous above and below with oneself
Session sentence is merged into the sentence group of semantic association, if so, will then have determined that the type of the initiation sentence of type derives expands to sentence mass-sending
First line of a poem type, does not carry out otherwise deriving extension.
In practice, an initiation sentence may take many forms, for example a simple sentence, a complex sentence or a non-standard sentence, and different types of initiation sentence may lead to different extracted session pairs. To address this, this embodiment determines the type of the initiation sentence according to the preset type judgment rule. Specifically, when the initiation sentence has complete, independent semantics, the embodiment first judges whether it consists of one or of several simple sentences with complete, independent semantics, and accordingly determines it to be of the simple-sentence or the complex-sentence initiation type. When the initiation sentence does not have complete, independent semantics, the embodiment judges whether it contains a simple sentence with complete, independent semantics, and accordingly determines it to be of the non-standard complex-sentence or the non-standard simple-sentence initiation type. Next, by searching whether an initiation sentence of the non-standard simple-sentence or non-standard complex-sentence type has contiguous preceding or following session sentences of its own, and whether it can be merged with them into a sentence with complete, independent semantics, the embodiment determines whether to extend its type by derivation to the non-standard sentence-group initiation type. Finally, by judging whether an initiation sentence of the simple, complex, non-standard simple, non-standard complex or non-standard sentence-group type has contiguous preceding or following session sentences of its own, the embodiment determines whether its type can be extended by derivation to the sentence-group initiation type.
Specifically, determining the type of an initiation sentence in this embodiment is in essence divided into three discrimination passes. The first pass classifies each initiation sentence into one of four initiation types (simple, complex, non-standard simple and non-standard complex). The second pass, performed after the first, determines whether an initiation sentence of the non-standard simple or non-standard complex type can be further extended by derivation to the non-standard sentence-group initiation type. The third pass, performed after the second, determines whether an initiation sentence of the simple, complex, non-standard simple, non-standard complex or non-standard sentence-group type can be further extended by derivation to the sentence-group initiation type.
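The three passes can be illustrated with a short Python sketch. It is only an assumption-laden outline: the type names, the function names `base_type` and `derive_type`, and the boolean inputs (which stand in for semantic judgments the text does not specify) are all hypothetical, and the sketch returns a single final type even though an implementation might instead keep every type reached along the way.

```python
def base_type(is_complete: bool, n_complete_clauses: int) -> str:
    """First pass: choose one of the four base types.

    is_complete: whether the sentence as a whole has complete, independent semantics.
    n_complete_clauses: number of simple sentences with complete, independent
    semantics found inside it (how this is counted is an assumption).
    """
    if is_complete:
        return "complex" if n_complete_clauses > 1 else "simple"
    return "nonstandard_complex" if n_complete_clauses >= 1 else "nonstandard_simple"


def derive_type(
    current: str,
    has_adjacent: bool,
    merges_to_complete: bool,
    merges_to_group: bool,
) -> str:
    """Second and third passes: optional derivative extension.

    has_adjacent: the same sender has contiguous preceding/following sentences.
    merges_to_complete: merging them yields a complete, independent sentence.
    merges_to_group: merging them yields a semantically associated sentence group.
    """
    # Second pass: only non-standard simple/complex types are candidates.
    if (
        current in ("nonstandard_simple", "nonstandard_complex")
        and has_adjacent
        and merges_to_complete
    ):
        current = "nonstandard_group"
    # Third pass: any type determined so far may extend to a sentence group.
    if has_adjacent and merges_to_group:
        current = "sentence_group"
    return current
```

The same outline would apply to reply sentences, with reply type names substituted.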
By determining the type of the initiation sentence, this embodiment on the one hand supports in-depth analysis of the structure and composition of the initiation sentence and, on the other hand, because it is based on type judgment and structural analysis of the initiation sentence, helps extract session pairs that are more accurate, highly practical and varied in form. It should be noted that, in this embodiment, whether an initiation sentence has contiguous preceding or following session sentences of its own refers specifically to whether there are contiguous preceding or following session sentences sent by the sender of that initiation sentence.
Optionally, determining the type of the reply sentence according to the preset type judgment rule includes:
Judging whether the reply sentence is a sentence with complete, independent semantics. If so, judging whether it is composed of multiple simple sentences that each have complete, independent semantics; if so, the type of the reply sentence is determined to be the complex-sentence reply type, otherwise the simple-sentence reply type. If not, judging whether the reply sentence contains a simple sentence with complete, independent semantics; if it does, the type is determined to be the non-standard complex-sentence reply type, and if it does not, the non-standard simple-sentence reply type;
Searching whether a reply sentence of the non-standard simple-sentence reply type has session sentences of its own that immediately precede or follow it. If not, no derivative extension is performed. If so, further judging whether the reply sentence can be merged with those contiguous session sentences into a sentence with complete, independent semantics; if it can, its type is extended by derivation to the non-standard sentence-group reply type, and if it cannot, no derivative extension is performed;
Searching whether a reply sentence of the non-standard complex-sentence reply type has session sentences of its own that immediately precede or follow it. If not, no derivative extension is performed. If so, further judging whether the reply sentence can be merged with those contiguous session sentences into a sentence with complete, independent semantics; if it can, its type is extended by derivation to the non-standard sentence-group reply type, and if it cannot, no derivative extension is performed;
Judging whether a reply sentence of the simple, complex, non-standard simple, non-standard complex or non-standard sentence-group type has session sentences of its own that immediately precede or follow it. If so, further judging whether the reply sentence can be merged with those contiguous session sentences into a semantically associated sentence group; if it can, the already determined type is extended by derivation to the sentence-group reply type, otherwise no derivative extension is performed.
The principle and process by which this embodiment judges the type of a reply sentence are essentially the same as those for an initiation sentence and are therefore not described again in detail. By determining the type of the reply sentence, this embodiment on the one hand supports in-depth analysis of the structure and composition of the reply sentence and, on the other hand, because it is based on type judgment and structural analysis of the reply sentence, helps extract session pairs that are more accurate, highly practical and varied in form. It should be noted that, in this embodiment, whether a reply sentence has contiguous preceding or following session sentences of its own refers specifically to whether there are contiguous preceding or following session sentences sent by the sender of that reply sentence.
Optionally, extracting at least one session pair according to the basic session pair, the type of the initiation sentence in the basic session pair and the type of the reply sentence in the basic session pair includes:
Performing derivative extension on the type of the initiation sentence in the basic session pair to obtain initiation sentences of multiple types;
Performing derivative extension on the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types;
Combining and extracting at least one semantically associated session pair from the initiation sentences of multiple types and the reply sentences of multiple types.
In this embodiment the initiation sentence and the reply sentence each have several types: the simple, complex, non-standard simple, non-standard complex, non-standard sentence-group and sentence-group initiation types, and the simple, complex, non-standard simple, non-standard complex, non-standard sentence-group and sentence-group reply types. Therefore, after a basic session pair has been extracted, and in order to extract session pairs that are more accurate, highly practical and varied in form, this embodiment first performs derivative extension on the type of the initiation sentence in the basic session pair to obtain initiation sentences of multiple types, then performs derivative extension on the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types, and finally combines and extracts at least one semantically associated session pair from them, so that multiple session pairs can be obtained by combination.
For example, assume the initiation sentence is of the complex-sentence initiation type and the reply sentence is of the complex-sentence reply type. After derivative extension of the types, session pairs of several forms can be extracted: simple initiation sentence + simple reply sentence, complex initiation sentence + simple reply sentence, simple initiation sentence + complex reply sentence, and complex initiation sentence + complex reply sentence.
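The combination step in this example amounts to a filtered Cartesian product of the derived initiation sentences and the derived reply sentences. The Python sketch below illustrates that idea only; `combine_pairs`, the dictionary layout (type name mapped to sentence text) and the `related` callback, which stands in for the unspecified semantic-association check, are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def combine_pairs(
    initiations: Dict[str, str],
    replies: Dict[str, str],
    related: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Return (initiation type, reply type) for every semantically associated combination.

    initiations / replies map a derived type name to the sentence text of that type.
    related(initiation_text, reply_text) is a hypothetical semantic-association test.
    """
    pairs = []
    for (i_type, i_text), (r_type, r_text) in product(
        initiations.items(), replies.items()
    ):
        if related(i_text, r_text):
            pairs.append((i_type, r_type))
    return pairs
```

With two derived initiation types and two derived reply types, and a check that accepts every combination, the sketch yields the four forms listed in the example.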
Optionally, collecting, according to the preset scene tags, the scene tag values of the session pair that correspond to the scene tags includes:
Presetting a scene tag library that contains at least one scene tag;
Selecting from the scene tag library the scene tags associated with the session pair;
Collecting the scene tag values of the session pair that correspond to the scene tags.
In this embodiment, collecting scene tag values is usually done in two steps: presetting the scene tags, then collecting the scene tag values of the session pair that correspond to those scene tags. In practice, different session pairs may be associated with different scene tags, or different scene tags may be associated with a session pair to different degrees. Therefore, to obtain the scene tag values for a session pair more accurately, this embodiment first presets a scene tag library that stores scene tags, then selects from the library the scene tags associated with the session pair, and finally collects the scene tag values of the session pair according to the associated scene tags.
Specifically, the scene tags associated with a session pair are obtained by manual definition or by automatic calculation; for example, different scene tags are selected manually for different session pairs. In this scheme, the scene tags associated with a session pair may be tags associated with the session content of the pair, with the topic of that session content, or with the session time of the pair.
It should be noted that a scene tag value in this embodiment is the result corresponding to a scene tag and may be numeric or non-numeric. When the collected scene tag value is non-numeric, it generally has to be converted, according to a predefined identification rule, into a numeric value that a computer can process. For example, if the collected gender is female, then under the predefined identification rule ("male" outputs scene tag value 1, "female" outputs scene tag value 2) the output scene tag value is 2. Likewise, a real-time behavior state can be output as a computer-processable value under a predefined identification rule: when the collected scene tag value is the behavior of playing ball, it is converted into a computer-recognizable value (such as 001), and when it is the behavior of listening to old songs, it is converted into a computer-recognizable value (such as 002), and so on.
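The conversion of non-numeric scene tag values can be sketched as a lookup table. The Python fragment below reuses the codes given in the text (1 and 2 for gender, 001 and 002 for behavior states); the table name `IDENTIFICATION_RULES`, the tag keys and the `encode` function are hypothetical, and the three-digit codes are kept as strings here only so that the leading zeros survive.

```python
from typing import Dict, Union

TagValue = Union[int, float, str]

# Predefined identification rules: scene tag -> raw value -> code.
# The codes come from the example in the text; the key names are assumptions.
IDENTIFICATION_RULES: Dict[str, Dict[str, TagValue]] = {
    "gender": {"male": 1, "female": 2},
    "real_time_behavior": {
        "playing ball": "001",
        "listening to old songs": "002",
    },
}


def encode(tag: str, value: TagValue) -> TagValue:
    """Return a computer-processable code for a collected scene tag value."""
    if isinstance(value, (int, float)):
        return value  # already numeric, no conversion needed
    try:
        return IDENTIFICATION_RULES[tag][value]
    except KeyError:
        raise ValueError(f"no identification rule for {tag!r} = {value!r}")
```

For instance, `encode("gender", "female")` gives 2, matching the example above.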
Optionally, the scene tags include one or more of the following, alone or in combination:
Session content topic; the time, place, date, session intent, weather, season, gender, occupation, position, mood, hobbies, body-sensing data, health status, real-time behavior state, zodiac sign and blood type of the two parties to the session; the relationship, age gap and generational gap between the two parties; the interval, frequency and duration of their session communication; the sentence pattern, sentence class and sentence structure type of the session content; and a total-amount tag.
Specifically, the scene tags of this embodiment are not limited to one or more of the tags listed above, from the session content topic through the total-amount tag. They are customized by the user as needed; that is, the user can add or delete scene tags.
It should be noted that when this embodiment collects the scene tag value corresponding to the session-intent scene tag, it can do so with a pre-built session intent recognition model that recognizes the session intent of the communicating party and/or the counterpart. Specifically, a session intent recognition model is first trained on session pair samples, and the trained model is then used to recognize the session intent of the communicating party and/or the counterpart for a given session pair.
The method of automatically building a personal exclusive corpus of the present invention is further explained below through two simplified embodiments.
Simplified embodiment one
Referring to Fig. 2, the method of automatically building a personal exclusive corpus provided by simplified embodiment one of the present invention includes:
Step S201: collect the session content of the communicating party.
Specifically, assume the session content collected in this embodiment consists of the sessions that communicating party A conducts with counterpart B through A's instant messaging account, email account, microblog account or mobile phone number. The session content may be in text, picture, voice, video or animation form; when it is in voice, picture, video or animation form, the method also includes converting it into text-format session content. To describe in detail how this embodiment extracts session pairs from session content, a simple session between communicating party A and counterpart B is used for illustration, as follows:
A: Have you eaten?
B: I've eaten.
B: And you?
A: Help me pay
A: the phone bill
B: Paid 100 yuan in total.
B: There were so many people in the queue.
Step S202: judge whether a session sentence in the session content has a preceding message sent by the counterpart within the preset time interval; if not, the session sentence is determined to be an initiation sentence;
If so, judge whether the session sentence is semantically unrelated to the preceding message sent by the counterpart; if so, the session sentence is determined to be an initiation sentence, otherwise it is determined to be a reply sentence.
Specifically, the initiation sentences and reply sentences in the session content can be determined according to the above judgment rule. Assume that the initiation sentences and reply sentences obtained by this judgment are as shown in Table 1.
Table 1
Initiation sentence | Reply sentence |
Have you eaten? | I've eaten. |
And you? | Paid 100 yuan in total. |
Help me pay | There were so many people in the queue. |
the phone bill |
Step S203: judge whether the initiation sentence is a sentence with complete, independent semantics. If so, judge whether it is composed of multiple simple sentences that each have complete, independent semantics; if so, the type of the initiation sentence is determined to be the complex-sentence initiation type, otherwise the simple-sentence initiation type. If not, judge whether the initiation sentence contains a simple sentence with complete, independent semantics; if it does, the type is determined to be the non-standard complex-sentence initiation type, and if it does not, the non-standard simple-sentence initiation type;
Search whether an initiation sentence of the non-standard simple-sentence initiation type has session sentences of its own that immediately precede or follow it. If not, no derivative extension is performed. If so, further judge whether the initiation sentence can be merged with those contiguous session sentences into a sentence with complete, independent semantics; if it can, its type is extended by derivation to the non-standard sentence-group initiation type, and if it cannot, no derivative extension is performed;
Search whether an initiation sentence of the non-standard complex-sentence initiation type has session sentences of its own that immediately precede or follow it. If not, no derivative extension is performed. If so, further judge whether the initiation sentence can be merged with those contiguous session sentences into a sentence with complete, independent semantics; if it can, its type is extended by derivation to the non-standard sentence-group initiation type, and if it cannot, no derivative extension is performed;
Judge whether an initiation sentence of the simple, complex, non-standard simple, non-standard complex or non-standard sentence-group type has session sentences of its own that immediately precede or follow it. If so, further judge whether the initiation sentence can be merged with those contiguous session sentences into a semantically associated sentence group; if it can, the already determined type is extended by derivation to the sentence-group initiation type, otherwise no derivative extension is performed.
Specifically, assume that in the first discrimination pass of step S203 this embodiment judges the types of the initiation sentences to be as shown in Table 2.
Table 2
No. | Initiation sentence | Type |
1st initiation sentence | Have you eaten? | Simple sentence |
2nd initiation sentence | And you? | Simple sentence |
3rd initiation sentence | Help me pay | Non-standard simple sentence |
4th initiation sentence | the phone bill | Non-standard simple sentence |
Then comes the second discrimination pass of step S203: by judging whether the initiation sentences of the non-standard simple-sentence and non-standard complex-sentence initiation types have contiguous preceding or following session sentences of their own, and whether they can be merged with those session sentences into a sentence with complete, independent semantics, the embodiment determines whether to extend their types by derivation to the non-standard sentence-group initiation type. On this judgment, the third and fourth initiation sentences of this embodiment can be merged into a sentence with complete, independent semantics, so their type can be extended by derivation to the non-standard sentence-group initiation type, as shown in Table 3.
Table 3
Finally comes the third discrimination pass of step S203, which judges whether the initiation sentences of the simple, complex, non-standard simple, non-standard complex and non-standard sentence-group types can be further extended by derivation to the sentence-group initiation type.
Specifically, as Table 3 shows, the initiation sentences of this embodiment cannot be further merged into a semantically associated sentence group, so no further derivative extension is performed in this last pass. The final types of the initiation sentences are therefore as shown in Table 3.
Step S204: determine the type of the reply sentence according to the preset type judgment rule.
The principle and process by which this embodiment determines the type of a reply sentence are essentially the same as those for an initiation sentence and are therefore not described again in detail. Assume that the types of the reply sentences judged by this embodiment are as shown in Table 4.
Table 4
Step S205: extract basic session pairs according to each initiation sentence and the reply sentences located between that initiation sentence and the next initiation sentence.
Specifically, when extracting a session pair for the first initiation sentence, this embodiment first judges whether there is a reply sentence between the first initiation sentence and the next initiation sentence; if so, a basic session pair is extracted from that initiation sentence and that reply sentence. Since there is a reply sentence between the first and second initiation sentences, a basic session pair is extracted from the first initiation sentence and that reply sentence. It should be noted that, after determining that there is a reply sentence between an initiation sentence and the next initiation sentence, this embodiment also needs to calculate whether the initiation sentence and the reply sentence are semantically associated; a basic session pair is extracted only if they are, and not otherwise. This embodiment assumes that the first initiation sentence and the first reply sentence are semantically associated, so a basic session pair can be extracted. Call it basic session pair 1; its content is shown in Table 5.
Similarly, when extracting a basic session pair for the second initiation sentence, this embodiment first judges whether there is a reply sentence between the second and third initiation sentences. There is none, so the second initiation sentence is discarded as an initiation sentence. Similarly, for the third and fourth initiation sentences, assume that a semantically associated basic session pair 2 can be extracted; its content is shown in Table 5.
Table 5
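The extraction logic of step S205 can be outlined as a single scan over the labeled session sentences. The following Python sketch is an illustration under assumptions rather than the embodiment's implementation: `extract_basic_pairs` and the `related` callback are hypothetical names, the input is assumed to be already labeled as initiation or reply, and fragments that merge into a sentence group (such as the third and fourth initiation sentences) are assumed to have been merged beforehand.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (label, text), label is "initiation" or "reply"


def extract_basic_pairs(
    turns: List[Turn],
    related: Callable[[str, List[str]], bool],
) -> List[Tuple[str, List[str]]]:
    """Pair each initiation sentence with the replies before the next initiation.

    related(initiation_text, reply_texts) is a hypothetical semantic-association
    test; pairs failing it are not extracted.
    """
    pairs = []
    i = 0
    while i < len(turns):
        label, text = turns[i]
        if label != "initiation":
            i += 1          # a reply with no initiation before it is skipped
            continue
        # Collect the reply sentences up to the next initiation sentence.
        j = i + 1
        replies = []
        while j < len(turns) and turns[j][0] == "reply":
            replies.append(turns[j][1])
            j += 1
        # An initiation sentence with no reply is discarded.
        if replies and related(text, replies):
            pairs.append((text, replies))
        i = j
    return pairs
```

Applied to the example session, the second initiation sentence has no reply before the next initiation sentence and so produces no pair, which mirrors the outcome described in the text.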
Step S206: perform derivative extension on the type of the initiation sentence in each basic session pair to obtain initiation sentences of multiple types.
Specifically, the initiation sentence in this embodiment has six types: the simple, complex, non-standard simple, non-standard complex, non-standard sentence-group and sentence-group initiation types. This embodiment therefore first performs derivative extension according to the type of the initiation sentence in the basic session pair. The initiation sentence in basic session pair 1 is of the simple-sentence initiation type and cannot be further extended by derivation to any of the other five initiation types, so only one type of initiation sentence is obtained, namely the simple-sentence initiation type, as shown in Table 6. The initiation sentence in basic session pair 2, by contrast, can be further extended by derivation to other types of initiation sentence, such as the simple-sentence initiation type, as shown in Table 6.
Table 6
Step S207: perform derivative extension on the type of the reply sentence in each basic session pair to obtain reply sentences of multiple types.
Specifically, the reply sentence in this embodiment has six types: the simple, complex, non-standard simple, non-standard complex, non-standard sentence-group and sentence-group reply types. This embodiment therefore first performs derivative extension according to the type of the reply sentence in the basic session pair. The reply sentence in basic session pair 1 is of the simple-sentence reply type and cannot be further extended by derivation to any of the other five reply types, so only one type of reply sentence is obtained, namely the simple-sentence reply type, as shown in Table 7. The reply sentence in basic session pair 2, by contrast, can be further extended by derivation to other types of reply sentence, such as the complex-sentence reply type, as shown in Table 7.
Table 7
Step S208: combine and extract at least one semantically associated session pair from the initiation sentences of multiple types and the reply sentences of multiple types.
Specifically, for basic session pair 1 there is only one type of initiation sentence and one type of reply sentence, so only one session pair can be extracted. For basic session pair 2, the initiation sentence and the reply sentence each have several types, so several session pairs can be obtained by combination; see Table 8, which lists the 6 session pairs extracted from basic session pair 2.
Table 8
Step S209: collect, according to the preset scene tags, the scene tag values of each session pair that correspond to the scene tags.
Specifically, when collecting the scene tag values that correspond both to a session pair and to the preset scene tags, this embodiment first presets the scene tags and then, for each session pair, collects the scene tag values corresponding to the preset scene tags. Assume the preset scene tags of this embodiment are a combination of session content topic, session intent, place, weather, relationship between the two parties to the session, age of the communication object and occupation; the scene tag values for each session pair can then be collected, as shown in Table 9. It should be noted that, because session pairs 1 to 6 are session pairs derived from basic session pair 2, their scene tag values are the same as those of basic session pair 2. In addition, this embodiment can set different scene tags for different session pairs, and the number of scene tags set can also differ.
Table 9
Step S210: match and combine the session pairs, the scene tags and the scene tag values corresponding to the scene tags, thereby generating the personal exclusive corpus.
Specifically, this embodiment matches and combines the session pairs, the scene tags and the scene tag values corresponding to the scene tags to generate the personal exclusive corpus; that is, the personal exclusive corpus is generated according to the content combination rule "session pair + scene tag + scene tag value".
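The "session pair + scene tag + scene tag value" combination rule can be pictured as one record per session pair. The Python sketch below is a hypothetical data layout, not one prescribed by the text: `CorpusEntry`, `build_corpus` and the field names are assumptions, and the scene tag value used in the usage note is purely illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

TagValue = Union[int, float, str]


@dataclass
class CorpusEntry:
    initiation: str             # initiation sentence of the session pair
    reply: str                  # reply sentence of the session pair
    scene: Dict[str, TagValue]  # scene tag -> scene tag value


def build_corpus(
    pairs: List[Tuple[str, str]],
    scene_values: List[Dict[str, TagValue]],
) -> List[CorpusEntry]:
    """Match each session pair with its scene tags and scene tag values."""
    if len(pairs) != len(scene_values):
        raise ValueError("each session pair needs exactly one set of scene tag values")
    return [
        CorpusEntry(initiation, reply, dict(tags))
        for (initiation, reply), tags in zip(pairs, scene_values)
    ]
```

For example, `build_corpus([("Have you eaten?", "I've eaten.")], [{"place": "home"}])` yields a single entry whose `scene` maps the place tag to an illustrative value.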
The method of automatically building a personal exclusive corpus provided by this embodiment of the present invention collects the session content of the communicating party, obtains the session pairs in the session content, collects according to the preset scene tags the scene tag values of the session pairs that correspond to the scene tags, and matches and combines the session pairs, the scene tags and the corresponding scene tag values to generate the personal exclusive corpus. This solves the technical problem that existing session corpora are built manually, which involves a heavy workload and yields corpora that are not exclusive to an individual. The method not only greatly reduces the manual workload of building a session corpus; because the personal exclusive corpus is generated from session pairs extracted from the communicating party's own session content together with the corresponding scene tag values, it is also specific to the individual, strongly targeted, and highly personalized.
In addition, this embodiment determines the initiation sentences and reply sentences in the session content
according to the semantics of the session sentences, determines the types of the initiation sentences and
reply sentences according to preset type judgment rules, extracts basic session pairs from each initiation
sentence and the reply sentences lying between that initiation sentence and the next initiation sentence, and
extracts at least one session pair according to each basic session pair and the types of its initiation
sentence and reply sentence. This solves the technical problem that extracting session pairs in the prior art
is difficult and imprecise, and it removes the limitation of the traditional one-question-one-answer session
pair form. Using the types of the initiation and reply sentences, session pairs can be extracted quickly and
effectively, and the precision and accuracy of the extracted pairs are greatly improved. Furthermore, for
session sentences with complex or non-standard structure, the embodiment can accurately extract complete and
practical session pairs, so that the extracted pairs closely simulate real conversations and the degree of
intelligence is higher. Finally, the extracted session pairs take varied forms, which helps session-based
matching return accurate intelligent reply content in equally varied forms, making the method more practical.
Simplified Embodiment Two
Referring to Fig. 3, the method for automatically building a personal exclusive corpus provided by Embodiment
Two of the present invention includes:
Step S301, collecting the session content of a communication party.
Specifically, assume that the communication party in this embodiment is A. The session content of
communication party A can then be obtained by collecting the conversations that A holds with other
communication counterparts through A's instant messaging account, email account, microblog account, mobile
phone number, and other channels. The session content may be in text, picture, voice, video, or animation
form; when it is in voice, picture, video, or animation form, the step further includes converting it into
session content in text format. To describe in detail how this embodiment builds the personal exclusive
corpus, two short parts of communication party A's session content are used as an illustration, as follows:
Part I (session content between communication party A and communication counterpart B):
A: How much is a set of the Jun Ge Robot Shopkeeper?
B: The Jun Ge Robot Shopkeeper.
B: 5,000 yuan per set.
B: If you buy now, you also get 20% off the 5,000 yuan.
Part II (session content between communication party A and communication counterpart C):
A: Sister Zhou, are you there?
C:.
A: Your shoulder and neck care card has 5 sessions remaining.
C: I would like to book a care session at the shop tomorrow.
C: Will you be in the shop tomorrow?
A: I will be in the shop tomorrow.
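One possible way to hold such session content in memory, and to normalize non-text messages to text as step S301 requires, is sketched below in Python. The `Message` class, the `to_text_session` function, and the idea of passing converter callbacks (for example a speech recognizer) are assumptions made for illustration; the patent does not prescribe a data structure or a conversion technique.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    sender: str          # e.g. "A", "B", "C"
    content: str         # the text itself, or a reference to the media item
    kind: str = "text"   # "text", "picture", "voice", "video", or "animation"


def to_text_session(messages: List[Message],
                    converters: Dict[str, Callable[[str], str]]) -> List[Message]:
    """Return a text-only copy of a session.

    converters maps each non-text kind to a caller-supplied function that
    turns the media content into text; no particular recognizer is assumed.
    """
    result = []
    for msg in messages:
        if msg.kind == "text":
            result.append(msg)
        else:
            text = converters[msg.kind](msg.content)
            result.append(Message(msg.sender, text, "text"))
    return result
```

A message whose kind has no registered converter raises `KeyError`, which makes a missing conversion visible instead of silently dropping the message.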
Step S302, obtaining the session pairs in the session content.
Specifically, assume that this embodiment determines the initiation sentences and reply sentences in the
session content according to the semantics of the session sentences, as shown in Table 10.
Table 10
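Claim 3 of this patent states the rule for separating initiation sentences from reply sentences: a session sentence with no preceding message from the communication counterpart within a preset time interval is an initiation sentence; otherwise it is an initiation sentence if it has no semantic association with that preceding message, and a reply sentence if it does. The Python below is a hedged sketch of that rule. The `Utterance` class, the `label_utterances` function, the choice of the most recent counterpart message as "the preceding message", and the caller-supplied `related` predicate are all assumptions; the patent does not say how semantic association is computed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    sender: str
    text: str
    timestamp: float  # seconds; utterances are assumed sorted by time


def label_utterances(utterances: List[Utterance],
                     window_seconds: float,
                     related: Callable[[str, str], bool]) -> List[str]:
    """Label every utterance as 'initiation' or 'reply'."""
    labels = []
    for i, current in enumerate(utterances):
        previous = None
        # Walk backwards to the most recent message from the other party.
        for earlier in reversed(utterances[:i]):
            if current.timestamp - earlier.timestamp > window_seconds:
                break  # everything further back is outside the preset interval
            if earlier.sender != current.sender:
                previous = earlier
                break
        if previous is None or not related(previous.text, current.text):
            labels.append("initiation")
        else:
            labels.append("reply")
    return labels
```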
Assume further that the types of the initiation sentences and reply sentences in the Part I and Part II
session content are determined according to the preset type judgment rules, as shown in Tables 11 and 12.
Table 11
Table 12
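The first stage of the type judgment rules, as recited in claims 4 and 5, is a two-level decision: whether the sentence as a whole has complete, independent semantics, and how many simple sentences with complete, independent semantics it contains. The sketch below encodes only that decision in Python. The names `SentenceType` and `base_type` are hypothetical, the two inputs are assumed to come from a semantic analysis that the patent does not specify, and the later derived extension to sentence-group types is not covered.

```python
from enum import Enum


class SentenceType(Enum):
    SIMPLE = "simple"
    COMPLEX = "complex"
    NONSTANDARD_SIMPLE = "non-standard simple"
    NONSTANDARD_COMPLEX = "non-standard complex"


def base_type(is_complete: bool, complete_clause_count: int) -> SentenceType:
    """Classify one initiation or reply sentence.

    is_complete:           the sentence as a whole has complete, independent semantics
    complete_clause_count: number of simple sentences with complete, independent
                           semantics found inside it
    """
    if is_complete:
        # Made up of several complete simple sentences -> complex, else simple.
        if complete_clause_count > 1:
            return SentenceType.COMPLEX
        return SentenceType.SIMPLE
    # Incomplete as a whole: does it still contain a complete simple sentence?
    if complete_clause_count >= 1:
        return SentenceType.NONSTANDARD_COMPLEX
    return SentenceType.NONSTANDARD_SIMPLE
```

The same function serves both initiation and reply sentences, because claims 4 and 5 apply identical tests to each.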
Moreover, assume that this embodiment extracts basic session pairs from each initiation sentence and the
reply sentences between that initiation sentence and the next initiation sentence, and then, according to the
basic session pairs and the types of their initiation and reply sentences, finally extracts 11 session pairs,
as shown in Table 13.
Table 13
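The grouping step that produces basic session pairs can be sketched as a single pass over the labeled sentences: each initiation sentence collects the reply sentences that follow it until the next initiation sentence appears. The function name `extract_basic_pairs` and the `(label, text)` input format are assumptions for illustration. As a simplification not taken from the patent, an initiation sentence that receives no reply yields no pair, and the derived extension into further session pairs is left out.

```python
def extract_basic_pairs(labeled):
    """Group labeled sentences into basic session pairs.

    labeled: list of (label, text) tuples in chronological order, where
             label is 'initiation' or 'reply'.
    Returns a list of (initiation_text, [reply_text, ...]) tuples.
    """
    pairs = []
    current_init, replies = None, []
    for label, text in labeled:
        if label == "initiation":
            # Close the previous group before starting a new one.
            if current_init is not None and replies:
                pairs.append((current_init, replies))
            current_init, replies = text, []
        elif current_init is not None:
            replies.append(text)
    if current_init is not None and replies:
        pairs.append((current_init, replies))
    return pairs
```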
Step S303, presetting a scene tag library, the scene tag library including at least one scene tag.
Specifically, this embodiment assumes that the scene tag library includes at least one scene tag, and that
the scene tags are one or a combination of the following: session content topic, time of the session between
the two communicating parties, place, date, session intention, weather, season, gender, occupation, position,
mood, hobbies, body-sensing data, health status, real-time behavior state, zodiac sign, blood type,
relationship between the two communicating parties, age gap, generational seniority gap, interval between the
two parties' sessions, frequency, duration, and the sentence pattern, sentence class, and sentence structure
type of the session content, as well as a total amount label.
Step S304, selecting from the scene tag library the scene tags associated with the session pairs.
Specifically, when selecting scene tags from the scene tag library, this embodiment must choose the
associated scene tags for every session pair. The selection may be made manually, or by computing the degree
of association between the word vector of the session pair's session content topic and the word vectors of
the scene tags in the scene tag library. Assume that this embodiment obtains the scene tags associated with
each session pair by computation, as shown in Table 14, where a "√" under a scene tag indicates that the
scene tag is associated with the session pair. Note that a different number of associated scene tags may be
chosen for different session pairs.
Table 14
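The computed variant of step S304 compares the word vector of a session pair's topic with the word vector of each scene tag. The patent speaks only of a "degree of association" and does not name a measure, so the sketch below assumes cosine similarity and a fixed threshold. The function names `cosine` and `select_scene_tags`, the default threshold of 0.5, and the toy vectors in the usage lines are likewise illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_scene_tags(topic_vector, tag_vectors, threshold=0.5):
    """Return the scene tags whose vectors are close enough to the topic vector.

    tag_vectors: dict mapping a scene tag name to its word vector
    """
    return [tag for tag, vec in tag_vectors.items()
            if cosine(topic_vector, vec) >= threshold]


# Toy two-dimensional vectors, for illustration only.
library = {"occupation": [1.0, 0.0], "weather": [0.0, 1.0]}
selected = select_scene_tags([1.0, 0.0], library)
```

Because every tag is tested independently, the number of tags selected naturally varies from one session pair to the next.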
Step S305, collecting the scene tag values corresponding to the session pairs and the scene tags.
Specifically, after obtaining the scene tags associated with the session pairs, this embodiment goes on to
collect the corresponding scene tag values; that is, for each session pair it collects the value of each
scene tag associated with that pair, as shown in Table 15.
Table 15
Step S306, matching and combining the session pairs, the scene tags, and the scene tag values corresponding
to the scene tags, thereby generating the personal exclusive corpus.
Specifically, this embodiment matches and combines the session pairs, the scene tags, and the corresponding
scene tag values, generating the personal exclusive corpus according to the content combination rule
"session pair + scene tag + scene tag value".
The method for automatically building a personal exclusive corpus provided by this embodiment of the present
invention collects the session content of a communication party, obtains the session pairs in the session
content, collects, according to preset scene tags, the scene tag values corresponding to the session pairs and
the scene tags, and matches and combines the session pairs, the scene tags, and the corresponding scene tag
values to generate the personal exclusive corpus. This solves the technical problem that existing session
corpora are built manually, which involves a heavy workload and yields a corpus that is not specific to the
individual. The manual workload of building a session corpus is greatly reduced, and because the corpus is
generated from session pairs extracted from the communication party's own session content together with the
corresponding scene tag values, it is specific to that person, more targeted, and more highly personalized.
In addition, the session pairs extracted from the session content in this embodiment are varied in form and
content and simulate human conversation more realistically, laying the foundation for subsequently obtaining
accurate reply content by matching against the automatically built personal exclusive corpus.
The above are only preferred embodiments of the present invention and are not intended to limit it. Those
skilled in the art may make various modifications and variations to the present invention. Any modification,
equivalent replacement, or improvement made within the spirit and principles of the present invention shall
fall within its scope of protection.
Claims (8)
1. A method for automatically building a personal exclusive corpus, characterized by comprising:
collecting the session content of a communication party;
obtaining the session pairs in the session content;
collecting, according to preset scene tags, the scene tag values corresponding to the session pairs and the
scene tags;
matching and combining the session pairs, the scene tags, and the scene tag values corresponding to the
scene tags, thereby generating a personal exclusive corpus.
2. The method for automatically building a personal exclusive corpus according to claim 1, characterized in
that obtaining the session pairs in the session content comprises:
determining the initiation sentences and reply sentences in the session content according to the semantics
of the session sentences in the session content;
determining the types of the initiation sentences and the reply sentences according to preset type judgment
rules;
extracting a basic session pair from each initiation sentence and the reply sentences between that
initiation sentence and the next initiation sentence;
extracting at least one session pair according to the basic session pair and the types of the initiation
sentence and reply sentence in the basic session pair.
3. The method for automatically building a personal exclusive corpus according to claim 2, characterized in
that determining the initiation sentences and reply sentences in the session content according to the
semantics of the session sentences in the session content comprises:
judging whether, within a preset time interval, a session sentence in the session content is preceded by a
message sent by the communication counterpart, and if not, determining the session sentence to be an
initiation sentence;
if so, judging whether the session sentence has no semantic association with the preceding message sent by
the communication counterpart, and if it has none, determining the session sentence to be an initiation
sentence, otherwise determining the session sentence to be a reply sentence.
4. The method for automatically building a personal exclusive corpus according to claim 3, characterized in
that determining the type of the initiation sentence according to the preset type judgment rules comprises:
judging whether the initiation sentence is a sentence with complete, independent semantics; if so, judging
whether the initiation sentence consists of multiple simple sentences each having complete, independent
semantics, and if so determining the type of the initiation sentence to be the complex-sentence initiation
type, otherwise the simple-sentence initiation type; if not, judging whether the initiation sentence contains
a simple sentence with complete, independent semantics, and if it does determining the type of the initiation
sentence to be the non-standard complex-sentence initiation type, otherwise the non-standard simple-sentence
initiation type;
searching whether an initiation sentence of the non-standard simple-sentence initiation type has its own
contiguous preceding and following session sentences; if not, performing no derived extension; if so, further
judging whether the initiation sentence can be merged with those contiguous session sentences into a sentence
with complete, independent semantics, and if it can, deriving and extending its type to the non-standard
sentence-group initiation type, otherwise performing no derived extension;
searching whether an initiation sentence of the non-standard complex-sentence initiation type has its own
contiguous preceding and following session sentences; if not, performing no derived extension; if so, further
judging whether the initiation sentence can be merged with those contiguous session sentences into a sentence
with complete, independent semantics, and if it can, deriving and extending its type to the non-standard
sentence-group initiation type, otherwise performing no derived extension;
judging whether an initiation sentence of the simple-sentence, complex-sentence, non-standard
simple-sentence, non-standard complex-sentence, or non-standard sentence-group type has its own contiguous
preceding and following session sentences; if so, further judging whether the initiation sentence can be
merged with those contiguous session sentences into a semantically associated sentence group, and if so
deriving and extending the type of the initiation sentence to the sentence-group initiation type, otherwise
performing no derived extension.
5. The method for automatically building a personal exclusive corpus according to claim 3, characterized in
that determining the type of the reply sentence according to the preset type judgment rules comprises:
judging whether the reply sentence is a sentence with complete, independent semantics; if so, judging
whether the reply sentence consists of multiple simple sentences each having complete, independent semantics,
and if so determining the type of the reply sentence to be the complex-sentence reply type, otherwise the
simple-sentence reply type; if not, judging whether the reply sentence contains a simple sentence with
complete, independent semantics, and if it does determining the type of the reply sentence to be the
non-standard complex-sentence reply type, otherwise the non-standard simple-sentence reply type;
searching whether a reply sentence of the non-standard simple-sentence reply type has its own contiguous
preceding and following session sentences; if not, performing no derived extension; if so, further judging
whether the reply sentence can be merged with those contiguous session sentences into a sentence with
complete, independent semantics, and if it can, deriving and extending its type to the non-standard
sentence-group reply type, otherwise performing no derived extension;
searching whether a reply sentence of the non-standard complex-sentence reply type has its own contiguous
preceding and following session sentences; if not, performing no derived extension; if so, further judging
whether the reply sentence can be merged with those contiguous session sentences into a sentence with
complete, independent semantics, and if it can, deriving and extending its type to the non-standard
sentence-group reply type, otherwise performing no derived extension;
judging whether a reply sentence of the simple-sentence, complex-sentence, non-standard simple-sentence,
non-standard complex-sentence, or non-standard sentence-group type has its own contiguous preceding and
following session sentences; if so, further judging whether the reply sentence can be merged with those
contiguous session sentences into a semantically associated sentence group, and if so deriving and extending
the type of the reply sentence to the sentence-group reply type, otherwise performing no derived extension.
6. The method for automatically building a personal exclusive corpus according to claim 5, characterized in
that extracting at least one session pair according to the basic session pair, the type of the initiation
sentence in the basic session pair, and the type of the reply sentence in the basic session pair comprises:
performing derived extension on the type of the initiation sentence in the basic session pair to obtain
initiation sentences of multiple types;
performing derived extension on the type of the reply sentence in the basic session pair to obtain reply
sentences of multiple types;
combining the initiation sentences of multiple types with the reply sentences of multiple types and
extracting at least one semantically associated session pair.
7. The method for automatically building a personal exclusive corpus according to any one of claims 1 to 6,
characterized in that collecting, according to preset scene tags, the scene tag values corresponding to the
session pairs and the scene tags comprises:
presetting a scene tag library, the scene tag library including at least one scene tag;
selecting from the scene tag library the scene tags associated with the session pairs;
collecting the scene tag values corresponding to the session pairs and the scene tags.
8. The method for automatically building a personal exclusive corpus according to claim 7, characterized in
that the scene tags include one or a combination of the following:
session content topic, time of the session between the two communicating parties, place, date, session
intention, weather, season, gender, occupation, position, mood, hobbies, body-sensing data, health status,
real-time behavior state, zodiac sign, blood type, relationship between the two communicating parties, age
gap, generational seniority gap, interval between the two parties' sessions, frequency, duration, and the
sentence pattern, sentence class, and sentence structure type of the session content, as well as a total
amount label.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710076038.0A CN106874451A (en) | 2017-02-13 | 2017-02-13 | A kind of method of the personal exclusive corpus of automatic foundation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106874451A true CN106874451A (en) | 2017-06-20 |
Family
ID=59165937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710076038.0A Pending CN106874451A (en) | 2017-02-13 | 2017-02-13 | A kind of method of the personal exclusive corpus of automatic foundation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106874451A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170620 |