CN106874451A - Method for automatically building a personal exclusive corpus - Google Patents

Info

Publication number
CN106874451A
Authority
CN
China
Prior art keywords
sentence
session
type
reply
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710076038.0A
Other languages
Chinese (zh)
Inventor
陈包容
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Dove Software Co Ltd
Original Assignee
Changsha Dove Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Dove Software Co Ltd
Priority to CN201710076038.0A
Publication of CN106874451A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The method for automatically building a personal exclusive corpus provided by the present invention collects the session content of a communicating party, obtains the session pairs in the session content, collects, according to preset scene tags, the scene tag values of each session pair that correspond to the scene tags, and matches and combines the session pairs, the scene tags and the corresponding scene tag values to generate the personal exclusive corpus. This solves the technical problem that existing session corpora are built manually, which involves a heavy workload and yields corpora that are not specific to an individual. The method greatly reduces the manual workload of building a session corpus. Moreover, because the corpus is generated from session pairs extracted from the communicating party's own session content together with the corresponding scene tag values, it is specific to the individual, more targeted, and more highly personalized.

Description

A method for automatically building a personal exclusive corpus
Technical field
The present invention relates to the field of communication technology, and in particular to a method for automatically building a personal exclusive corpus.
Background
At present, the reply content used for automatic replies in intelligent conversation systems is usually obtained by matching against a session corpus. Such session corpora are mainly created manually. Building a corpus by hand involves a heavy workload, and the quality of the resulting corpus is generally low. In addition, session corpora in the prior art are almost always shared by all users, so they are neither specific to an individual nor targeted. To address this problem, the present embodiment proposes a method for automatically building a personal exclusive corpus based on session content.
Summary of the invention
The present invention provides a method for automatically building a personal exclusive corpus, to solve the technical problem that existing session corpora are built manually, which involves a heavy workload and yields corpora that are not specific to an individual.
The method for automatically building a personal exclusive corpus provided by the present invention includes:
Collecting the session content of a communicating party;
Obtaining the session pairs in the session content;
Collecting, according to preset scene tags, the scene tag values of each session pair that correspond to the scene tags;
Matching and combining the session pairs, the scene tags and the scene tag values corresponding to the scene tags, thereby generating the personal exclusive corpus.
Further, obtaining the session pairs in the session content includes:
Determining the initiating sentences and reply sentences in the session content according to the semantics of the session sentences in the session content;
Determining the types of the initiating sentences and reply sentences according to preset type judgment rules;
Extracting a basic session pair from an initiating sentence and the reply sentences between that initiating sentence and the next initiating sentence;
Extracting at least one session pair according to the basic session pair and the types of the initiating sentence and reply sentences in the basic session pair.
Further, determining the initiating sentences and reply sentences in the session content according to the semantics of the session sentences in the session content includes:
Judging whether a session sentence in the session content has preceding text sent by the other party within a preset time interval; if not, the session sentence is determined to be an initiating sentence;
If so, judging whether the session sentence has no semantic association with the preceding text sent by the other party; if it has none, the session sentence is determined to be an initiating sentence; otherwise, the session sentence is determined to be a reply sentence.
Further, determining the type of an initiating sentence according to the preset type judgment rules includes:
Judging whether the initiating sentence is a sentence with complete independent semantics. If so, judging whether it is composed of multiple simple sentences each having complete independent semantics; if it is, its type is determined to be the complex initiating sentence type, and otherwise the simple initiating sentence type. If not, judging whether it contains a simple sentence with complete independent semantics; if it does, its type is determined to be the non-standard complex initiating sentence type, and if it does not, the non-standard simple initiating sentence type;
Searching whether an initiating sentence of the non-standard simple initiating sentence type has its own contiguous preceding or following session sentences. If not, no derivative extension is performed. If so, further judging whether the initiating sentence can be merged with those contiguous session sentences into a sentence with complete independent semantics; if it can, the type of the initiating sentence is derivatively extended to the non-standard sentence-group initiating sentence type, and if it cannot, no derivative extension is performed;
Searching whether an initiating sentence of the non-standard complex initiating sentence type has its own contiguous preceding or following session sentences. If not, no derivative extension is performed. If so, further judging whether the initiating sentence can be merged with those contiguous session sentences into a sentence with complete independent semantics; if it can, the type of the initiating sentence is derivatively extended to the non-standard sentence-group initiating sentence type, and if it cannot, no derivative extension is performed;
Judging whether an initiating sentence of the simple, complex, non-standard simple, non-standard complex or non-standard sentence-group type has its own contiguous preceding or following session sentences. If so, further judging whether the initiating sentence can be merged with those contiguous session sentences into a semantically associated sentence group; if it can, the type of the initiating sentence is derivatively extended to the sentence-group initiating sentence type, and otherwise no derivative extension is performed.
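The type judgment rules above can be read as a small decision tree followed by two optional derivative extensions. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `SentenceType`, `base_type` and `derived_types` are hypothetical, and the linguistic judgments (whether a sentence has complete independent semantics, how many complete simple sentences it contains, whether it can be merged with its contiguous context) are assumed to come from some upstream analyzer and are passed in as plain values.

```python
from enum import Enum
from typing import List


class SentenceType(Enum):
    SIMPLE = "simple"
    COMPLEX = "complex"
    NONSTANDARD_SIMPLE = "non-standard simple"
    NONSTANDARD_COMPLEX = "non-standard complex"


def base_type(is_complete: bool, complete_simple_count: int) -> SentenceType:
    """First rule: pick one of the four base types.

    is_complete: the sentence as a whole has complete independent semantics.
    complete_simple_count: number of simple sentences with complete
        independent semantics found inside it (assumed analyzer output).
    """
    if is_complete:
        if complete_simple_count > 1:
            return SentenceType.COMPLEX
        return SentenceType.SIMPLE
    if complete_simple_count >= 1:
        return SentenceType.NONSTANDARD_COMPLEX
    return SentenceType.NONSTANDARD_SIMPLE


def derived_types(base: SentenceType,
                  merges_into_complete_sentence: bool,
                  merges_into_sentence_group: bool) -> List[str]:
    """Remaining rules: extend the base type when merging with the sender's
    own contiguous session sentences is possible."""
    types = [base.value]
    nonstandard = (SentenceType.NONSTANDARD_SIMPLE,
                   SentenceType.NONSTANDARD_COMPLEX)
    if base in nonstandard and merges_into_complete_sentence:
        types.append("non-standard sentence group")
    if merges_into_sentence_group:
        types.append("sentence group")
    return types
```

The same sketch applies to reply sentences, since the rules for replies mirror those for initiating sentences.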
Further, determining the type of a reply sentence according to the preset type judgment rules includes:
Judging whether the reply sentence is a sentence with complete independent semantics. If so, judging whether it is composed of multiple simple sentences each having complete independent semantics; if it is, its type is determined to be the complex reply sentence type, and otherwise the simple reply sentence type. If not, judging whether it contains a simple sentence with complete independent semantics; if it does, its type is determined to be the non-standard complex reply sentence type, and if it does not, the non-standard simple reply sentence type;
Searching whether a reply sentence of the non-standard simple reply sentence type has its own contiguous preceding or following session sentences. If not, no derivative extension is performed. If so, further judging whether the reply sentence can be merged with those contiguous session sentences into a sentence with complete independent semantics; if it can, the type of the reply sentence is derivatively extended to the non-standard sentence-group reply sentence type, and if it cannot, no derivative extension is performed;
Searching whether a reply sentence of the non-standard complex reply sentence type has its own contiguous preceding or following session sentences. If not, no derivative extension is performed. If so, further judging whether the reply sentence can be merged with those contiguous session sentences into a sentence with complete independent semantics; if it can, the type of the reply sentence is derivatively extended to the non-standard sentence-group reply sentence type, and if it cannot, no derivative extension is performed;
Judging whether a reply sentence of the simple, complex, non-standard simple, non-standard complex or non-standard sentence-group type has its own contiguous preceding or following session sentences. If so, further judging whether the reply sentence can be merged with those contiguous session sentences into a semantically associated sentence group; if it can, the type of the reply sentence is derivatively extended to the sentence-group reply sentence type, and otherwise no derivative extension is performed.
Further, extracting at least one session pair according to the basic session pair, the type of the initiating sentence in the basic session pair and the type of the reply sentences in the basic session pair includes:
Performing derivative extension on the type of the initiating sentence in the basic session pair to obtain initiating sentences of multiple types;
Performing derivative extension on the type of the reply sentences in the basic session pair to obtain reply sentences of multiple types;
Combining the initiating sentences of multiple types with the reply sentences of multiple types and extracting at least one semantically associated session pair.
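One way to picture the combination step is as a filtered cross product of the initiating-sentence variants and the reply-sentence variants. The sketch below is illustrative only: `combine_variants` is a hypothetical name, each variant is assumed to be a `(type label, text)` tuple, and `shares_a_word` is a deliberately crude stand-in for a real semantic-association test, which the source does not specify.

```python
from itertools import product
from typing import Callable, List, Tuple

Variant = Tuple[str, str]  # (type label, text); an assumed representation


def combine_variants(
    initiating: List[Variant],
    replies: List[Variant],
    is_related: Callable[[str, str], bool],
) -> List[Tuple[Variant, Variant]]:
    """Pair every initiating variant with every reply variant and keep
    only the combinations judged semantically associated."""
    return [
        (init, reply)
        for init, reply in product(initiating, replies)
        if is_related(init[1], reply[1])
    ]


def shares_a_word(a: str, b: str) -> bool:
    """Toy association test (assumption): any lowercase token in common."""
    return bool(set(a.lower().split()) & set(b.lower().split()))


pairs = combine_variants(
    [("simple", "Dinner tonight?"),
     ("sentence group", "Dinner tonight? I found a noodle place.")],
    [("simple", "Sure, dinner works"), ("simple", "ok")],
    shares_a_word,
)
```

In practice the association test would be replaced by whatever semantic model the system uses; the cross-product structure is the only part this sketch is meant to convey.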
Further, collecting, according to preset scene tags, the scene tag values of a session pair that correspond to the scene tags includes:
Presetting a scene tag library that includes at least one scene tag;
Selecting, from the scene tag library, the scene tags associated with the session pair;
Collecting the scene tag values of the session pair that correspond to the scene tags.
Further, the scene tags include:
One or a combination of the following: the session content topic; the time, place, date, session intention, weather, season, gender, occupation, job position, mood, hobbies, somatosensory data, health status, real-time behavior state, zodiac sign and blood type of the two communicating parties; the relationship, age gap and generational seniority gap between the two parties; the interval, frequency and duration of sessions between the two parties; the sentence pattern, sentence class and sentence structure type of the session content; and a total amount tag.
The present invention has the following beneficial effects:
The method for automatically building a personal exclusive corpus provided by the present invention collects the session content of a communicating party, obtains the session pairs in the session content, collects, according to preset scene tags, the scene tag values of each session pair that correspond to the scene tags, and matches and combines the session pairs, the scene tags and the corresponding scene tag values to generate the personal exclusive corpus. This solves the technical problem that existing session corpora are built manually, which involves a heavy workload and yields corpora that are not specific to an individual. The method greatly reduces the manual workload of building a session corpus. Moreover, because the corpus is generated from session pairs extracted from the communicating party's own session content together with the corresponding scene tag values, it is specific to the individual, more targeted, and more highly personalized.
In addition to the objects, features and advantages described above, the present invention has other objects, features and advantages. The present invention is described in further detail below with reference to the drawings.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for a further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for automatically building a personal exclusive corpus according to the preferred embodiment of the present invention;
Fig. 2 is a flowchart of the method for automatically building a personal exclusive corpus according to simplified embodiment one of the preferred embodiment of the present invention;
Fig. 3 is a flowchart of the method for automatically building a personal exclusive corpus according to simplified embodiment two of the preferred embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the present invention can be implemented in many different ways as defined and covered by the claims.
Referring to Fig. 1, the preferred embodiment of the present invention provides a method for automatically building a personal exclusive corpus, including:
Step S101: collecting the session content of a communicating party;
Step S102: obtaining the session pairs in the session content;
Step S103: collecting, according to preset scene tags, the scene tag values of each session pair that correspond to the scene tags;
Step S104: matching and combining the session pairs, the scene tags and the scene tag values corresponding to the scene tags, thereby generating the personal exclusive corpus.
The method for automatically building a personal exclusive corpus provided by the embodiment of the present invention collects the session content of a communicating party, obtains the session pairs in the session content, collects, according to preset scene tags, the scene tag values of each session pair that correspond to the scene tags, and matches and combines the session pairs, the scene tags and the corresponding scene tag values to generate the personal exclusive corpus. This solves the technical problem that existing session corpora are built manually, which involves a heavy workload and yields corpora that are not specific to an individual. The method greatly reduces the manual workload of building a session corpus. Moreover, because the corpus is generated from session pairs extracted from the communicating party's own session content together with the corresponding scene tag values, it is specific to the individual, more targeted, and more highly personalized.
In addition, the embodiment of the present invention generates the personal exclusive corpus directly from the session pairs, the scene tags and the scene tag values corresponding to the scene tags, so the corpus is built by fully simulating the real session context, which makes it more accurate and practical. Because the present embodiment builds the corpus by collecting the personal session content of the communicating party, the generated personal corpus consists entirely of session material from sessions between the communicating party and other parties, so the automatically built corpus is specific to the individual and more targeted.
It should be noted that the embodiment of the present invention generates the personal exclusive corpus by matching and combining the session pairs, the scene tags and the scene tag values corresponding to the scene tags, that is, according to the content matching and combination rule "session pair + scene tag + scene tag value". In addition, different session content has different scene characteristics, for example the session content topic, session intention, session time, session place and the relationship between the two parties. Therefore, after obtaining the session pairs in the session content, the present embodiment further collects, according to the preset scene tags, the scene tag values of each session pair that correspond to the scene tags, and matches and combines the session pairs, the scene tags and the corresponding scene tag values to generate the personal exclusive corpus. The scene tags in the present embodiment are user-defined or computed automatically and may be, for example, one or a combination of the following: the session content topic; the time, place, date, session intention, weather, season, gender, occupation, job position, mood, hobbies, somatosensory data, health status, real-time behavior state, zodiac sign and blood type of the two communicating parties; the relationship, age gap and generational seniority gap between the two parties; the interval, frequency and duration of sessions between the two parties; the sentence pattern, sentence class and sentence structure type of the session content; and a total amount tag.
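The "session pair + scene tag + scene tag value" rule suggests a simple record layout for each corpus entry. The following is a minimal sketch under assumptions: `SessionPair`, `CorpusEntry` and `build_corpus` are hypothetical names, tag values are simplified to strings, and the scene tag values are assumed to have been collected already.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class SessionPair:
    initiating: List[str]  # one or more initiating sentences
    replies: List[str]     # one or more reply sentences


@dataclass
class CorpusEntry:
    pair: SessionPair
    # scene tag -> scene tag value, e.g. {"place": "office"}
    scene: Dict[str, str] = field(default_factory=dict)


def build_corpus(
    pairs_with_values: Iterable[Tuple[SessionPair, Dict[str, str]]],
    preset_tags: List[str],
) -> List[CorpusEntry]:
    """Attach to each session pair the values of the preset scene tags
    that were actually collected for it."""
    corpus = []
    for pair, values in pairs_with_values:
        scene = {tag: values[tag] for tag in preset_tags if tag in values}
        corpus.append(CorpusEntry(pair=pair, scene=scene))
    return corpus


corpus = build_corpus(
    [(SessionPair(["Are you free tonight?"], ["Yes", "What's up?"]),
      {"place": "office", "mood": "relaxed", "unlisted": "ignored"})],
    preset_tags=["place", "mood", "weather"],
)
```

Tags that are preset but have no collected value (here `weather`) are simply left out of the entry; whether a real system should store an explicit "unknown" instead is a design choice the source does not address.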
When collecting the scene tag values of a session pair that correspond to the scene tags, the present embodiment may use different methods. These include: direct collection, for example the place scene tag value can be collected automatically through the GPS of the mobile terminal; inference, for example the scene tag value for the relationship between the two communicating parties can be inferred from other scene tag values already obtained; computing word vectors associated with the session, for example the session intention tag value can be obtained by computing word vectors associated with the session; and neural network learning, for example the mood scene tag value can be obtained by feeding the session content or other scene tag values already obtained into a trained classifier. The present embodiment may also obtain scene tag values automatically by combining one or more of the above methods.
Optionally, obtaining the session pairs in the session content includes:
Determining the initiating sentences and reply sentences in the session content according to the semantics of the session sentences in the session content;
Determining the types of the initiating sentences and reply sentences according to preset type judgment rules;
Extracting a basic session pair from an initiating sentence and the reply sentences between that initiating sentence and the next initiating sentence;
Extracting at least one session pair according to the basic session pair and the types of the initiating sentence and reply sentences in the basic session pair.
Existing session pairs or question-answer pairs extracted from session content usually take a one-question-one-answer form. In real conversations, however, the two parties do not always follow that pattern: for one session sentence sent by the other party, the communicating party may reply with several session sentences, or for several session sentences sent by the other party, the communicating party may reply with only one.
Therefore, if session pairs are extracted only in one-question-one-answer form, the following problems may arise:
(1) For session content that is not in one-question-one-answer form, extracting session pairs is difficult and imprecise. For example, for session content consisting of multiple initiating sentences followed by multiple reply sentences, extracting session pairs requires analyzing which reply sentence matches each initiating sentence; the process is complex and difficult, and precision is low.
(2) The question-answer pairs or session pairs that existing methods extract from session content are generally fairly standard or structurally simple session sentences, so for session sentences with complex or non-standard structures, session pairs with good integrity and high practicality cannot be extracted precisely.
(3) In addition, the integrity of session pairs extracted in one-question-one-answer form is easily damaged, so the extracted session pairs cannot accurately simulate real sessions. To address the above problems, the present invention proposes a method of extracting session pairs from session content according to the types of initiating sentences and reply sentences.
To address this problem, the present embodiment determines the initiating sentences and reply sentences in the session content according to the semantics of the session sentences, determines the types of the initiating sentences and reply sentences according to preset type judgment rules, extracts a basic session pair from an initiating sentence and the reply sentences between that initiating sentence and the next initiating sentence, and extracts at least one session pair according to the basic session pair and the types of the initiating sentence and reply sentences in it. This solves the technical problem that extracting session pairs in the prior art is difficult and imprecise, and it breaks the limitation of the traditional one-question-one-answer session pair form. By working from the types of initiating sentences and reply sentences, session pairs can be extracted quickly and effectively, and the precision and accuracy of the extracted session pairs are greatly improved. In addition, for session sentences with complex or non-standard structures, the embodiment of the present invention can precisely extract session pairs with good integrity and high practicality, so that the extracted session pairs can accurately simulate real sessions, with a higher degree of intelligence. Furthermore, the session pairs extracted by the embodiment take diverse forms, which helps to match intelligent reply content precisely based on session pairs and to obtain intelligent reply content in diverse forms, giving higher practicality.
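The basic-pair step groups each initiating sentence with the reply sentences that follow it until the next initiating sentence appears. A minimal sketch, assuming the sentences have already been labelled as initiating or reply; the function name `extract_basic_pairs` and the decision to drop initiating sentences that received no reply are assumptions of this sketch, not statements from the source.

```python
from typing import List, Optional, Tuple

Labelled = Tuple[str, str]          # (label, text); label is "initiating" or "reply"
BasicPair = Tuple[str, List[str]]   # (initiating sentence, its reply sentences)


def extract_basic_pairs(labelled: List[Labelled]) -> List[BasicPair]:
    """Group chronologically ordered sentences into basic session pairs."""
    pairs: List[BasicPair] = []
    current_init: Optional[str] = None
    current_replies: List[str] = []
    for label, text in labelled:
        if label == "initiating":
            # Close the previous pair before starting a new one.
            if current_init is not None and current_replies:
                pairs.append((current_init, current_replies))
            current_init, current_replies = text, []
        elif label == "reply" and current_init is not None:
            current_replies.append(text)
        # Replies seen before any initiating sentence are skipped.
    if current_init is not None and current_replies:
        pairs.append((current_init, current_replies))
    return pairs


basic_pairs = extract_basic_pairs([
    ("initiating", "Are you free tonight?"),
    ("reply", "Yes"),
    ("reply", "What's up?"),
    ("initiating", "Dinner?"),
    ("reply", "Sure"),
    ("initiating", "Bye"),
])
```

Note that one initiating sentence can carry several replies, which is exactly the case the one-question-one-answer form handles poorly.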
It should be noted that before determining the types of the initiating sentences and reply sentences, the present embodiment first presets the types of initiating sentences and reply sentences and the type judgment rules corresponding to those types, so that the types can be determined quickly according to the preset rules. An initiating sentence in the present embodiment specifically refers to a session sentence that has no preceding text sent by the other party, or a session sentence that has no semantic association with the preceding text sent by the other party.
The present embodiment may obtain session content by collecting the session content of the communicating party's instant messaging account, email account, microblog account or mobile phone number. The session content may be in text, picture, voice, video or animation form; when it is in voice, picture, video or animation form, the method also includes converting it into session content in text form.
Optionally, determining the initiating sentences and reply sentences in the session content according to the semantics of the session sentences in the session content includes:
Judging whether a session sentence in the session content has preceding text sent by the other party within a preset time interval; if not, the session sentence is determined to be an initiating sentence;
If so, judging whether the session sentence has no semantic association with the preceding text sent by the other party; if it has none, the session sentence is determined to be an initiating sentence; otherwise, the session sentence is determined to be a reply sentence.
In order to extract the session pairs in the session content precisely, the present embodiment first determines the initiating sentences and reply sentences in the session content according to the semantics of the session sentences, and then determines the types of the initiating sentences and reply sentences, so that session pairs can be extracted precisely according to those types. The specific process of determining the initiating sentences and reply sentences according to the semantics of the session sentences is as follows: judge whether a session sentence in the session content has preceding text sent by the other party within a preset time interval; if not, the session sentence is determined to be an initiating sentence; if so, judge whether the session sentence has no semantic association with the preceding text sent by the other party; if it has none, the session sentence is determined to be an initiating sentence, and otherwise a reply sentence.
In a real conversation, if the current session sentence has no preceding text sent by the other party within the preset time interval, it can generally be regarded as the opening sentence of a session, that is, an initiating sentence. For example, suppose the current session sentence was sent on December 3, the previous session sentence was sent by the other party on December 1, and the preset time interval is 1 day. The current session sentence then has no preceding text sent by the other party within the preset time interval, so it is regarded as the opening sentence of a session and judged to be an initiating sentence. The preset time interval in the present embodiment is user-defined and may be, for example, 1 hour, half a day, one day or one month; that is, when the current session sentence is judged to have no preceding text sent by the other party within 1 hour, half a day, one day or one month, it is judged to be an initiating sentence.
In addition, when a session sentence has preceding text sent by the other party, the actual session content shows three possibilities: the session sentence may be a reply sentence answering that preceding text; it may not answer the preceding text at all and instead be an initiating sentence that starts a new session; or it may be both at once, a reply to the preceding text and an initiating sentence that starts a new session. For these cases, the present embodiment determines the type of the session sentence by judging whether it has no semantic association with the preceding text sent by the other party. It should be noted that in the present embodiment, whether a session sentence has no semantic association with the preceding text sent by the other party specifically means whether the session sentence contains a sentence that has no semantic association with that preceding text.
For example, suppose a session sentence has preceding text sent by the other party, and the preceding text sent by other party A is "How have you been recently?". In the first case (communicating party B: "Pretty good"), the session sentence contains no sentence lacking semantic association with the preceding text, so it is determined to be a reply sentence. In the second case (communicating party B: "Can you help me pay my phone bill?"), the session sentence contains a sentence lacking semantic association with the preceding text, so it is determined to be an initiating sentence. In the third case (communicating party B: "Pretty good. Can you help me pay my phone bill?"), the session sentence likewise contains a sentence lacking semantic association with the preceding text ("Can you help me pay my phone bill?"), so it is determined to be an initiating sentence.
By judging whether a session sentence in the session content has preceding text sent by the communication counterpart within the preset time interval and, when it does, whether the session sentence is semantically unrelated to that preceding text, the present embodiment can accurately determine the initiation sentences and reply sentences in the session content. This lays the foundation for subsequently extracting session pairs accurately from the determined initiation sentences and reply sentences, and for building the personal exclusive corpus from the extracted session pairs.
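The decision rule above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the `Message` class, the function name, and the `has_unrelated_part` callback (a stand-in for whatever semantic-association judgment is actually used) are all assumptions, as is the reading that "preceding text" means any earlier counterpart message inside the preset interval.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Message:
    sender: str       # hypothetical identifier of the sending party
    text: str         # session sentence in text form
    timestamp: float  # send time in seconds


def classify_session_sentence(
    messages: Sequence[Message],
    index: int,
    window_seconds: float,
    has_unrelated_part: Callable[[str, List[str]], bool],
) -> str:
    """Return "initiation" or "reply" for messages[index]."""
    current = messages[index]
    # Preceding text: earlier messages from the other party inside the preset interval.
    preceding = [
        m.text
        for m in messages[:index]
        if m.sender != current.sender
        and 0 <= current.timestamp - m.timestamp <= window_seconds
    ]
    if not preceding:
        # No preceding text from the counterpart: the sentence starts a session.
        return "initiation"
    if has_unrelated_part(current.text, preceding):
        # Contains a part with no semantic association to the preceding text.
        return "initiation"
    return "reply"
```

A sentence that both answers the preceding text and raises a new topic is classified as an initiation sentence, which matches the third case in the example above.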
Optionally, determining the type of the initiation sentence according to the preset type judgment rule includes:
Judging whether the initiation sentence is a sentence with complete, independent semantics; if so, judging whether the initiation sentence is composed of multiple simple sentences each having complete, independent semantics, and if so, determining the type of the initiation sentence to be the complex-sentence initiation sentence type, otherwise the simple-sentence initiation sentence type; if not, judging whether the initiation sentence contains a simple sentence with complete, independent semantics, and if it does, determining the type of the initiation sentence to be the non-standard complex-sentence initiation sentence type, otherwise the non-standard simple-sentence initiation sentence type;
Searching whether an initiation sentence of the non-standard simple-sentence initiation sentence type has contiguous preceding and following session sentences of its own; if not, performing no derivative extension; if so, further judging whether the initiation sentence can be merged with its contiguous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the initiation sentence to the non-standard sentence-group initiation sentence type, and if it cannot, performing no derivative extension;
Searching whether an initiation sentence of the non-standard complex-sentence initiation sentence type has contiguous preceding and following session sentences of its own; if not, performing no derivative extension; if so, further judging whether the initiation sentence can be merged with its contiguous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the initiation sentence to the non-standard sentence-group initiation sentence type, and if it cannot, performing no derivative extension;
Judging whether an initiation sentence of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence or non-standard sentence-group type has contiguous preceding and following session sentences of its own; if so, further judging whether the initiation sentence can be merged with its contiguous preceding and following session sentences into a semantically associated sentence group; if so, derivatively extending the already determined type of the initiation sentence to the sentence-group initiation sentence type, otherwise performing no derivative extension.
In actual implementation, initiation sentences may appear in many types, for example simple sentences, complex sentences and non-standard sentences, and different types of initiation sentence may affect the session pairs that are extracted. To address this, the present embodiment determines the type of each initiation sentence according to a preset type judgment rule. Specifically, first, when the initiation sentence has complete, independent semantics, the embodiment judges whether it consists of one or of multiple simple sentences with complete, independent semantics, and accordingly determines it to be of the simple-sentence or complex-sentence initiation sentence type; when the initiation sentence lacks complete, independent semantics, the embodiment judges whether it contains a simple sentence with complete, independent semantics, and accordingly determines it to be of the non-standard complex-sentence or non-standard simple-sentence initiation sentence type. Next, by searching whether initiation sentences of the non-standard simple-sentence and non-standard complex-sentence initiation sentence types have contiguous preceding and following session sentences of their own, and whether they can be merged with those session sentences into a sentence with complete, independent semantics, the embodiment determines whether to derivatively extend the type of the initiation sentence to the non-standard sentence-group initiation sentence type. Finally, by judging whether initiation sentences of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence and non-standard sentence-group types have contiguous preceding and following session sentences of their own, the embodiment determines whether the type of the initiation sentence can be derivatively extended to the sentence-group initiation sentence type.
Specifically, determining the type of an initiation sentence in the present embodiment essentially consists of three discrimination processes. The first discrimination process classifies each initiation sentence into one of four initiation sentence types (simple sentence, complex sentence, non-standard simple sentence and non-standard complex sentence). The second discrimination process, carried out after the first, determines whether initiation sentences of the non-standard simple-sentence and non-standard complex-sentence initiation sentence types can be further derivatively extended to the non-standard sentence-group initiation sentence type. The third discrimination process, carried out after the second, determines whether initiation sentences of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence and non-standard sentence-group types can be further derivatively extended to the sentence-group initiation sentence type.
By determining the type of the initiation sentence, the present embodiment on the one hand facilitates in-depth analysis of the structure and composition of the initiation sentence; on the other hand, type judgment and structural analysis of the initiation sentence facilitate more accurate extraction of session pairs that are highly practical and varied in form. It should be noted that, in the present embodiment, whether an initiation sentence has contiguous preceding and following session sentences of its own specifically refers to whether there are contiguous preceding and following session sentences sent by the same sender who sent the initiation sentence.
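The first discrimination process can be sketched as a small decision function. This is only an illustration under stated assumptions: `is_complete` (does a text have complete, independent semantics) and `split_clauses` (split a sentence into candidate simple sentences) are hypothetical callbacks, since the source does not say how these judgments are made.

```python
from typing import Callable, List


def first_stage_type(
    sentence: str,
    is_complete: Callable[[str], bool],
    split_clauses: Callable[[str], List[str]],
) -> str:
    """Classify a sentence into one of the four first-stage types."""
    clauses = split_clauses(sentence)
    complete_count = sum(1 for clause in clauses if is_complete(clause))
    if is_complete(sentence):
        # Complete sentence: complex if it is made up of several complete simple sentences.
        if len(clauses) > 1 and complete_count == len(clauses):
            return "complex"
        return "simple"
    # Incomplete sentence: non-standard complex if it still contains a complete simple sentence.
    if complete_count >= 1:
        return "non-standard complex"
    return "non-standard simple"
```

The second and third discrimination processes then operate on the result of this function together with the neighbouring session sentences from the same sender.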
Optionally, determining the type of the reply sentence according to the preset type judgment rule includes:
Judging whether the reply sentence is a sentence with complete, independent semantics; if so, judging whether the reply sentence is composed of multiple simple sentences each having complete, independent semantics, and if so, determining the type of the reply sentence to be the complex-sentence reply sentence type, otherwise the simple-sentence reply sentence type; if not, judging whether the reply sentence contains a simple sentence with complete, independent semantics, and if it does, determining the type of the reply sentence to be the non-standard complex-sentence reply sentence type, otherwise the non-standard simple-sentence reply sentence type;
Searching whether a reply sentence of the non-standard simple-sentence reply sentence type has contiguous preceding and following session sentences of its own; if not, performing no derivative extension; if so, further judging whether the reply sentence can be merged with its contiguous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the reply sentence to the non-standard sentence-group reply sentence type, and if it cannot, performing no derivative extension;
Searching whether a reply sentence of the non-standard complex-sentence reply sentence type has contiguous preceding and following session sentences of its own; if not, performing no derivative extension; if so, further judging whether the reply sentence can be merged with its contiguous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the reply sentence to the non-standard sentence-group reply sentence type, and if it cannot, performing no derivative extension;
Judging whether a reply sentence of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence or non-standard sentence-group type has contiguous preceding and following session sentences of its own; if so, further judging whether the reply sentence can be merged with its contiguous preceding and following session sentences into a semantically associated sentence group; if so, derivatively extending the already determined type of the reply sentence to the sentence-group reply sentence type, otherwise performing no derivative extension.
The principle and process by which the present embodiment judges the type of a reply sentence are essentially the same as those for judging the type of an initiation sentence, and are therefore not described again in detail. By determining the type of the reply sentence, the present embodiment on the one hand facilitates in-depth analysis of the structure and composition of the reply sentence; on the other hand, type judgment and structural analysis of the reply sentence facilitate more accurate extraction of session pairs that are highly practical and varied in form. It should be noted that, in the present embodiment, whether a reply sentence has contiguous preceding and following session sentences of its own specifically refers to whether there are contiguous preceding and following session sentences sent by the same sender who sent the reply sentence.
Optionally, extracting at least one session pair according to the basic session pair, the type of the initiation sentence in the basic session pair and the type of the reply sentence in the basic session pair includes:
Performing derivative extension on the type of the initiation sentence in the basic session pair to obtain initiation sentences of multiple types;
Performing derivative extension on the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types;
Combining the initiation sentences of multiple types with the reply sentences of multiple types, and extracting at least one semantically associated session pair.
In the present embodiment, initiation sentences and reply sentences each come in several types: the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence, non-standard sentence-group and sentence-group initiation sentence types, and the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence, non-standard sentence-group and sentence-group reply sentence types. Therefore, after a basic session pair has been extracted, in order to extract session pairs that are highly practical and varied in form more accurately, the present embodiment first performs derivative extension on the type of the initiation sentence in the basic session pair to obtain initiation sentences of multiple types, then performs derivative extension on the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types, and finally combines the initiation sentences of multiple types with the reply sentences of multiple types to extract at least one semantically associated session pair, so that multiple session pairs can be obtained by combination.
For example, assume that the initiation sentence is of the complex-sentence initiation sentence type and the reply sentence is of the complex-sentence reply sentence type. After type derivative extension, session pairs of several forms can be extracted: simple-sentence initiation sentence + simple-sentence reply sentence, complex-sentence initiation sentence + simple-sentence reply sentence, simple-sentence initiation sentence + complex-sentence reply sentence, and complex-sentence initiation sentence + complex-sentence reply sentence.
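The combination step amounts to pairing every derived initiation sentence with every derived reply sentence and keeping the semantically associated pairs. A minimal sketch, assuming each variant is a `(type, text)` tuple and that `related` is a hypothetical semantic-association check not specified by the source:

```python
from itertools import product
from typing import Callable, List, Tuple

Variant = Tuple[str, str]  # (type name, sentence text); an assumed representation


def combine_session_pairs(
    initiation_variants: List[Variant],
    reply_variants: List[Variant],
    related: Callable[[str, str], bool],
) -> List[Tuple[Variant, Variant]]:
    """Pair every initiation variant with every reply variant that is related to it."""
    return [
        (initiation, reply)
        for initiation, reply in product(initiation_variants, reply_variants)
        if related(initiation[1], reply[1])
    ]
```

With two initiation variants and two reply variants that are all mutually related, this yields the four forms listed in the example above.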
Optionally, collecting the scene tag values that correspond to the scene tags for a session pair according to the preset scene tags includes:
Presetting a scene tag library, the scene tag library including at least one scene tag;
Selecting from the scene tag library the scene tags associated with the session pair;
Collecting the scene tag values that correspond to those scene tags for the session pair.
Collecting scene tag values in the present embodiment is usually carried out in two steps: first presetting the scene tags, and then collecting, according to the scene tags, the scene tag values that correspond to them for the session pair. In actual implementation, different session pairs may be associated with different scene tags, or may be associated with different scene tags to different degrees. Therefore, in order to obtain the scene tag values corresponding to a session pair more accurately, the present embodiment first presets a scene tag library for storing scene tags, then selects from the scene tag library the scene tags associated with the session pair, and finally collects, according to the scene tags associated with the session pair, the scene tag values that correspond to those scene tags.
Specifically, the scene tags associated with a session pair are determined by manual customization or by automatic calculation; for example, different scene tags are selected manually for different session pairs. In this scheme, the scene tags associated with a session pair may specifically be obtained from scene tags associated with the session content of the session pair, with the topic of that session content, or with the session time of the session pair.
It should be noted that a scene tag value in the present embodiment is the result corresponding to a scene tag, and may be numerical or non-numerical. When a collected scene tag value is non-numerical, it generally needs to be converted, according to a predefined identification rule, into a numerical value that a computer can process. For example, if the collected sex is female, then under the predefined identification rule ("male" outputs a scene tag value of 1 and "female" outputs a scene tag value of 2) the output scene tag value is 2. Likewise, a real-time behavior state can be output as a computer-processable numerical value according to a predefined identification rule: when the collected scene tag value is the behavior of playing ball, it is converted into a numerical value the computer can recognize (such as 001), and when the collected scene tag value is the behavior of listening to songs, it is converted into a numerical value the computer can recognize (such as 002), and so on.
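A predefined identification rule of this kind can be represented as a lookup table. The sketch below is illustrative only: the table name, the tag names, and the choice to store codes such as 001 as strings (to keep the leading zeros) are assumptions, while the example codes themselves come from the paragraph above.

```python
from typing import Dict, Union

Code = Union[int, float, str]

# Hypothetical identification rules; the codes mirror the examples in the text.
IDENTIFICATION_RULES: Dict[str, Dict[str, Code]] = {
    "sex": {"male": 1, "female": 2},
    "real-time behavior state": {"playing ball": "001", "listening to songs": "002"},
}


def encode_scene_tag_value(tag: str, raw_value: Code) -> Code:
    """Convert a collected scene tag value into a computer-processable code."""
    if isinstance(raw_value, (int, float)):
        # Already numerical: no conversion needed.
        return raw_value
    # Non-numerical: look the value up in the rule table for this tag.
    return IDENTIFICATION_RULES[tag][raw_value]
```

A value with no matching rule raises `KeyError`, which makes a missing identification rule visible instead of silently producing a wrong code.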
Optionally, the scene tags include:
One or more of the following in combination: the session content topic; the time, place, date, session intention, weather, season, sex, occupation, position, mood, hobbies, somatosensory data, health status, real-time behavior state, zodiac sign and blood type of the two parties to the session communication; the relationship, age gap and seniority gap between the two parties; the interval, frequency and duration of session communication between the two parties; the sentence pattern, sentence class and sentence structure type of the session content; and a total-amount label.
Specifically, the scene tags of the present embodiment are not limited to one or more of the scene tags listed above in combination; they are defined by the user as needed, that is, the user may add or delete scene tags.
It should be noted that, when the present embodiment collects the scene tag value corresponding to the session intention scene tag, this can be done with a pre-built session intention recognition model that recognizes the session intention of the communication party and/or the communication counterpart. Specifically, a session intention recognition model is first trained on session pair samples, and the trained model is then used to recognize the session intention of the communication party and/or the communication counterpart for a given session pair.
The method for automatically establishing a personal exclusive corpus of the present invention is further explained below with two simplified embodiments.
Simplified Embodiment One
Referring to Fig. 2, the method for automatically establishing a personal exclusive corpus provided by Simplified Embodiment One of the present invention includes:
Step S201, collect the session content of the communication party.
Specifically, assume that the session content collected in the present embodiment is the session content exchanged between communication party A and communication counterpart B through A's instant messaging account, email account, microblog account and mobile phone number. The session content is in text, picture, voice, video or animation format, and when it is in voice, picture, video or animation format, the method also includes converting it into session content in text format. To describe in detail how the present embodiment extracts session pairs from the session content, a simple exchange between communication party A and communication counterpart B is used as an illustration, as follows:
A: Have you eaten?
B: I have.
B: And you?
A: Help me pay
A: the fee
B: I paid 100 yuan in total.
B: There were so many people in the queue.
Step S202, judge whether a session sentence in the session content has preceding text sent by the communication counterpart within the preset time interval; if not, determine the session sentence to be an initiation sentence;
If so, judge whether the session sentence is semantically unrelated to the preceding text sent by the communication counterpart; if so, determine the session sentence to be an initiation sentence, otherwise determine the session sentence to be a reply sentence.
Specifically, the initiation sentences and reply sentences in the session content can be determined according to the above judgment rule. Assume that the initiation sentences and reply sentences obtained by this judgment in the present embodiment are as shown in Table 1.
Table 1
Initiation sentence | Reply sentence
Have you eaten? | I have.
And you? | I paid 100 yuan in total.
Help me pay | There were so many people in the queue.
the fee |
Step S203, judge whether the initiation sentence is a sentence with complete, independent semantics; if so, judge whether the initiation sentence is composed of multiple simple sentences each having complete, independent semantics, and if so, determine the type of the initiation sentence to be the complex-sentence initiation sentence type, otherwise the simple-sentence initiation sentence type; if not, judge whether the initiation sentence contains a simple sentence with complete, independent semantics, and if it does, determine the type of the initiation sentence to be the non-standard complex-sentence initiation sentence type, otherwise the non-standard simple-sentence initiation sentence type;
Search whether an initiation sentence of the non-standard simple-sentence initiation sentence type has contiguous preceding and following session sentences of its own; if not, perform no derivative extension; if so, further judge whether the initiation sentence can be merged with its contiguous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extend the type of the initiation sentence to the non-standard sentence-group initiation sentence type, and if it cannot, perform no derivative extension;
Search whether an initiation sentence of the non-standard complex-sentence initiation sentence type has contiguous preceding and following session sentences of its own; if not, perform no derivative extension; if so, further judge whether the initiation sentence can be merged with its contiguous preceding and following session sentences into a sentence with complete, independent semantics; if it can, derivatively extend the type of the initiation sentence to the non-standard sentence-group initiation sentence type, and if it cannot, perform no derivative extension;
Judge whether an initiation sentence of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence or non-standard sentence-group type has contiguous preceding and following session sentences of its own; if so, further judge whether the initiation sentence can be merged with its contiguous preceding and following session sentences into a semantically associated sentence group; if so, derivatively extend the already determined type of the initiation sentence to the sentence-group initiation sentence type, otherwise perform no derivative extension.
Specifically, assume that the first discrimination process in step S203 yields the initiation sentence types shown in Table 2.
Table 2
Sequence number | Initiation sentence | Type
First initiation sentence | Have you eaten? | Simple sentence
Second initiation sentence | And you? | Simple sentence
Third initiation sentence | Help me pay | Non-standard simple sentence
Fourth initiation sentence | the fee | Non-standard simple sentence
Then comes the second discrimination process in step S203: by judging whether initiation sentences of the non-standard simple-sentence and non-standard complex-sentence initiation sentence types have contiguous preceding and following session sentences of their own, and whether they can be merged with those session sentences into a sentence with complete, independent semantics, it is determined whether to derivatively extend their types to the non-standard sentence-group initiation sentence type. In the present embodiment, this judgment finds that the third and fourth initiation sentences can be merged into a sentence with complete, independent semantics, so the types of the third and fourth initiation sentences can be derivatively extended to the non-standard sentence-group initiation sentence type, as shown in Table 3.
Table 3
Finally comes the third discrimination process in step S203: judging whether initiation sentences of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence and non-standard sentence-group types can be further derivatively extended to the sentence-group initiation sentence type.
Specifically, as can be seen from Table 3, the initiation sentences in the present embodiment cannot be further merged into a semantically associated sentence group, so no further derivative extension is performed on the initiation sentences in this last process. The final types of the initiation sentences are therefore as shown in Table 3.
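The second discrimination process can be sketched as a merge of adjacent fragments. This is a simplified illustration, not the patent's procedure: it only tries to merge a non-standard sentence with the sentence that immediately follows it, it represents the result as a single merged entry, and `is_complete` is a hypothetical completeness check. The sample strings in the usage note are illustrative.

```python
from typing import Callable, List, Tuple

Typed = Tuple[str, str, str]  # (sender, text, type); an assumed representation


def derive_nonstandard_groups(
    items: List[Typed],
    is_complete: Callable[[str], bool],
) -> List[Typed]:
    """Merge a non-standard sentence with the next sentence from the same sender
    when the merged text has complete, independent semantics."""
    result: List[Typed] = []
    i = 0
    while i < len(items):
        sender, text, kind = items[i]
        if kind.startswith("non-standard") and i + 1 < len(items):
            next_sender, next_text, _ = items[i + 1]
            merged = f"{text} {next_text}"
            if next_sender == sender and is_complete(merged):
                result.append((sender, merged, "non-standard sentence group"))
                i += 2  # both fragments are consumed by the merged entry
                continue
        result.append(items[i])
        i += 1
    return result
```

For instance, two consecutive fragments from the same sender such as "Help me pay" and "the phone bill" would be merged into one non-standard sentence-group entry, while sentences of other types pass through unchanged.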
Step S204, determine the type of the reply sentence according to the preset type judgment rule.
The principle and process by which the present embodiment determines the type of a reply sentence are essentially the same as those for determining the type of an initiation sentence, and are therefore not described again in detail. Assume that the reply sentence types determined in the present embodiment are as shown in Table 4.
Table 4
Step S205, extract basic session pairs according to each initiation sentence and the reply sentences between that initiation sentence and the next initiation sentence.
Specifically, when extracting a session pair for the first initiation sentence, the present embodiment first judges whether there is a reply sentence between the first initiation sentence and the next initiation sentence; if so, a basic session pair is extracted from that initiation sentence and that reply sentence. Since there is a reply sentence between the first and second initiation sentences, a basic session pair is extracted from the first initiation sentence and the reply sentence. It should be noted that, after determining that there is a reply sentence between an initiation sentence and the next initiation sentence, the present embodiment also needs to calculate whether the initiation sentence and the reply sentence are semantically associated; the basic session pair is extracted only if they are, and is not extracted otherwise. The present embodiment assumes that the first initiation sentence and the first reply sentence are semantically associated, so a basic session pair can be extracted; call it basic session pair 1, whose content is shown in Table 5.
Similarly, when extracting a basic session pair for the second initiation sentence, the present embodiment first judges whether there is a reply sentence between the second initiation sentence and the third initiation sentence. The judgment finds no reply sentence between the second and third initiation sentences, so the second initiation sentence is discarded as an initiation sentence. Similarly, from the third and fourth initiation sentences, assume that a semantically associated basic session pair 2 can be extracted, whose content is shown in Table 5.
Table 5
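The extraction rule of step S205 can be sketched as follows. This is a minimal illustration under assumptions: the input is a list of `(role, text)` tuples already labeled "initiation" or "reply" in conversation order, and `related` is a hypothetical semantic-association check. Merging adjacent initiation fragments (as with the third and fourth initiation sentences) is assumed to have happened beforehand.

```python
from typing import Callable, List, Tuple

Labeled = Tuple[str, str]  # (role, text) with role "initiation" or "reply"


def extract_basic_pairs(
    labeled: List[Labeled],
    related: Callable[[str, str], bool],
) -> List[Tuple[str, List[str]]]:
    """Pair each initiation sentence with the related replies that precede the next initiation."""
    pairs: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(labeled):
        role, text = labeled[i]
        if role != "initiation":
            i += 1
            continue
        # Collect the replies between this initiation sentence and the next one.
        j = i + 1
        replies: List[str] = []
        while j < len(labeled) and labeled[j][0] == "reply":
            replies.append(labeled[j][1])
            j += 1
        related_replies = [r for r in replies if related(text, r)]
        if related_replies:
            pairs.append((text, related_replies))
        # An initiation sentence with no related reply is discarded.
        i = j
    return pairs
```

An initiation sentence that is immediately followed by another initiation sentence produces no pair, which corresponds to the second initiation sentence being discarded in the example.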
Step S206, perform derivative extension on the type of the initiation sentence in the basic session pair to obtain initiation sentences of multiple types.
Specifically, there are six initiation sentence types in the present embodiment: the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence, non-standard sentence-group and sentence-group initiation sentence types. The present embodiment therefore first performs derivative extension according to the type of the initiation sentence in the basic session pair. The initiation sentence in basic session pair 1 is of the simple-sentence initiation sentence type and cannot be further derivatively extended to the other five initiation sentence types, so it yields only one type of initiation sentence, namely the simple-sentence initiation sentence type, as shown in Table 6. The initiation sentence in basic session pair 2, according to its type, can be further derivatively extended to other types of initiation sentence, such as the simple-sentence initiation sentence type, as shown in Table 6.
Table 6
Step S207, perform derivative extension on the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types.
Specifically, there are six reply sentence types in the present embodiment: the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence, non-standard sentence-group and sentence-group reply sentence types. The present embodiment therefore first performs derivative extension according to the type of the reply sentence in the basic session pair. The reply sentence in basic session pair 1 is of the simple-sentence reply sentence type and cannot be further derivatively extended to the other five reply sentence types, so it yields only one type of reply sentence, namely the simple-sentence reply sentence type, as shown in Table 7. The reply sentence in basic session pair 2, according to its type, can be further derivatively extended to other types of reply sentence, such as the complex-sentence reply sentence type, as shown in Table 7.
Table 7
Step S208, combine the initiation sentences of multiple types with the reply sentences of multiple types, and extract at least one semantically associated session pair.
Specifically, for basic session pair 1, the initiation sentence and the reply sentence each have only one type, so only one session pair can be extracted. For basic session pair 2, the initiation sentence and the reply sentence each have multiple types, so multiple session pairs can be obtained by combination, as shown in Table 8. Table 8 lists the 6 session pairs extracted from basic session pair 2.
Table 8
Step S209, collect the scene tag values that correspond to the scene tags for each session pair according to the preset scene tags.
Specifically, when collecting the scene tag values that correspond both to a session pair and to the preset scene tags, the present embodiment first presets the scene tags and then, for each session pair, collects the scene tag values corresponding to the preset scene tags. Assume that the preset scene tags of the present embodiment are a combination of session content topic, session intention, place, weather, the relationship between the two parties to the session communication, the age of the communication object, and occupation; the scene tag values corresponding to each session pair can then be collected, as shown in Table 9. It should be noted that, in the present embodiment, session pairs 1 to 6 are derived and extended from basic session pair 2, so their scene tag values are the same as those of basic session pair 2. In addition, the present embodiment may set different scene tags for different session pairs, and the number of scene tags set may also differ.
Table 9
Step S210, match and combine the session pairs, the scene tags and the scene tag values corresponding to the scene tags, thereby generating the personal exclusive corpus.
Specifically, the present embodiment matches and combines the session pairs, the scene tags and the scene tag values corresponding to the scene tags to generate the personal exclusive corpus; that is, the personal exclusive corpus is generated according to the content combination rule "session pair + scene tag + scene tag value".
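The combination rule "session pair + scene tag + scene tag value" can be sketched as building one record per session pair. The record layout and the `tags_for_pair` callback are assumptions made for illustration; the source does not specify a storage format.

```python
from typing import Callable, Dict, List, Tuple

SessionPair = Tuple[str, str]  # (initiation sentence, reply sentence)


def build_corpus(
    session_pairs: List[SessionPair],
    tags_for_pair: Callable[[SessionPair], Dict[str, object]],
) -> List[Dict[str, object]]:
    """Attach scene tags and their values to every session pair."""
    corpus: List[Dict[str, object]] = []
    for pair in session_pairs:
        corpus.append(
            {
                "initiation": pair[0],
                "reply": pair[1],
                # Mapping of scene tag -> collected scene tag value for this pair.
                "scene_tags": dict(tags_for_pair(pair)),
            }
        )
    return corpus
```

Keeping the tags as a per-pair mapping allows different session pairs to carry different scene tags and different numbers of tags, as the embodiment permits.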
The method for automatically establishing a personal exclusive corpus provided by the embodiment of the present invention collects the session content of the communication party, obtains the session pairs in the session content, collects the scene tag values that correspond to the scene tags for each session pair according to the preset scene tags, and matches and combines the session pairs, the scene tags and the corresponding scene tag values to generate the personal exclusive corpus. This solves the technical problem that existing session corpora are built manually, which involves a heavy workload and yields a corpus that is not specific to an individual. The method not only greatly reduces the manual workload of building a session corpus; because the personal exclusive corpus is generated from session pairs extracted from the communication party's own session content and from the corresponding scene tag values, it is also specific to the individual, more targeted, and more highly personalized.
In addition, the present embodiment determines the initiation sentences and reply sentences in the session content according to the semantics of the session sentences, determines the types of the initiation sentences and reply sentences according to the preset type judgment rule, extracts basic session pairs according to each initiation sentence and the reply sentences between that initiation sentence and the next initiation sentence, and extracts at least one session pair according to the basic session pair and the types of the initiation sentence and reply sentence in it. This solves the technical problem that extracting session pairs in the prior art is difficult and imprecise, and breaks the limitation of the traditional question-and-answer form of session pair. Using the types of the initiation and reply sentences, session pairs can be extracted quickly and effectively, and the precision and accuracy of the extracted session pairs are also greatly improved. Moreover, for session sentences with complex or non-standard structures, the embodiment of the present invention can accurately extract session pairs that are complete and highly practical, so that the extracted session pairs can accurately simulate real sessions, with a higher degree of intelligence. Further, the session pairs extracted by the embodiment of the present invention are varied in form, which helps to match precise intelligent reply content based on the session pairs and to obtain intelligent reply content in varied forms, giving higher practicality.
Simplified Embodiment Two
Referring to Fig. 3, the method for automatically building a personal exclusive corpus provided by simplified embodiment two of the present invention includes:
Step S301: collect the session content of the communicating party.
Specifically, assuming the communicating party in this embodiment is A, the session content of A can be obtained by collecting the conversations that A conducts with communication counterparts through A's instant messaging accounts, email accounts, microblog accounts, mobile phone number, and other channels. The session content may be in text, picture, voice, video, or animation form; when it is in voice, picture, video, or animation form, the method further includes converting it into text. To describe in detail how this embodiment builds the personal exclusive corpus, two short parts of A's session content are used as an illustration:
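As a rough sketch of the conversion of collected content to text described above, the following assumes a hypothetical `Message` record and caller-supplied converter functions (for example a speech-to-text routine). The patent does not specify how the conversion is performed, so the converters here are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    """Hypothetical record for one collected session sentence."""
    sender: str
    fmt: str      # "text", "voice", "picture", "video", or "animation"
    payload: str  # the text itself, or a reference to the media item


def to_text(messages: List[Message],
            converters: Dict[str, Callable[[str], str]]) -> List[Message]:
    """Convert every non-text message to text using a caller-supplied converter."""
    result = []
    for m in messages:
        if m.fmt == "text":
            result.append(m)
        else:
            convert = converters[m.fmt]  # raises KeyError if no converter is registered
            result.append(Message(m.sender, "text", convert(m.payload)))
    return result
```

Keeping the converters outside the function means the collection step stays the same whichever recognition technique is plugged in.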
Part One (session content between communicating party A and counterpart B):
A: How much is a set of Jun Ge Robot Shopkeeper?
B: Jun Ge Robot Shopkeeper
B: 5000 yuan a set.
B: If you buy now, you also get 20% off the 5000 yuan price.
Part Two (session content between communicating party A and counterpart C):
A: Sister Zhou, are you there?
C:.
A: Your shoulder-and-neck card has 5 sessions remaining.
C: I plan to book a treatment at the shop tomorrow.
C: Will you be at the shop tomorrow?
A: I will be at the shop tomorrow.
Step S302: obtain the session pairs in the session content.
Specifically, assume that this embodiment determines the initiation sentences and reply sentences in the session content according to the semantics of the session sentences, as shown in Table 10.
Table 10
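The decision between initiation sentence and reply sentence follows the rule stated in claim 3: a sentence with no preceding counterpart text within a preset time interval is an initiation sentence; otherwise it is a reply sentence only if it is semantically associated with that preceding text. The sketch below is a minimal illustration under stated assumptions: `Utterance`, `classify`, and `max_gap_seconds` are hypothetical names, and the semantic association test `is_related` is supplied by the caller because the patent does not fix a particular technique.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    sender: str
    text: str
    timestamp: float  # seconds


def classify(utterances: List[Utterance], index: int, max_gap_seconds: float,
             is_related: Callable[[str, str], bool]) -> str:
    """Label utterances[index] as "initiation" or "reply"."""
    current = utterances[index]
    # Counterpart sentences that precede the current one within the preset interval.
    preceding = [u for u in utterances[:index]
                 if u.sender != current.sender
                 and current.timestamp - u.timestamp <= max_gap_seconds]
    if not preceding:
        return "initiation"
    # Semantically associated with the counterpart's preceding text: a reply.
    if any(is_related(current.text, u.text) for u in preceding):
        return "reply"
    return "initiation"
```

Treating "preceding text" as any counterpart sentence inside the time window is one possible reading; a stricter variant could look only at the immediately preceding sentence.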
Further assume that the types of the initiation sentences and reply sentences in the first and second parts of the session content are determined according to the preset type judgment rule, as shown in Table 11 and Table 12.
Table 11
Table 12
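The four base types named in claims 4 and 5 (simple, complex, non-standard simple, non-standard complex) can be expressed as a small decision function. This is a sketch, not the patent's implementation: `base_type` is a hypothetical name, and the two predicates it takes are assumptions. The toy predicates below (splitting on punctuation and looking for a verb from a tiny list) exist only to make the example runnable and are not a real completeness test.

```python
from typing import Callable, List


def base_type(sentence: str,
              is_complete: Callable[[str], bool],
              complete_clauses: Callable[[str], List[str]]) -> str:
    """Apply the four-way type rule to one initiation or reply sentence."""
    clauses = complete_clauses(sentence)
    if is_complete(sentence):
        # Complete sentence: complex if built from several complete simple sentences.
        return "complex" if len(clauses) > 1 else "simple"
    # Incomplete sentence: non-standard complex if it still contains a complete one.
    return "non-standard complex" if clauses else "non-standard simple"


# Toy stand-ins (assumptions for illustration only).
TOY_VERBS = {"is", "will", "costs", "are"}


def _parts(sentence: str) -> List[str]:
    return [p.strip() for p in sentence.replace(".", ",").split(",") if p.strip()]


def toy_complete_clauses(sentence: str) -> List[str]:
    return [p for p in _parts(sentence) if TOY_VERBS & set(p.lower().split())]


def toy_is_complete(sentence: str) -> bool:
    parts = _parts(sentence)
    return bool(parts) and len(toy_complete_clauses(sentence)) == len(parts)
```

The derivative extension to sentence-group types described in the claims would be a further step on top of this base classification and is not shown here.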
In addition, assume that this embodiment extracts basic session pairs from each initiation sentence and the reply sentences between that initiation sentence and the next initiation sentence, and then, according to the basic session pairs and the types of the initiation and reply sentences in them, finally extracts 11 session pairs, as shown in Table 13.
Table 13
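The extraction of basic session pairs (an initiation sentence together with the reply sentences that appear before the next initiation sentence) can be sketched as a single pass over already-labeled sentences. The function name and the input layout are assumptions made for illustration, and dropping an initiation sentence that receives no reply is a design choice of this sketch rather than something the patent states.

```python
from typing import List, Optional, Tuple


def extract_basic_pairs(labeled: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group each initiation sentence with the replies before the next initiation.

    `labeled` is a list of (role, text) tuples, where role is "initiation" or "reply".
    """
    pairs: List[Tuple[str, List[str]]] = []
    current: Optional[str] = None
    replies: List[str] = []
    for role, text in labeled:
        if role == "initiation":
            # Close the previous pair, if it collected at least one reply.
            if current is not None and replies:
                pairs.append((current, replies))
            current, replies = text, []
        elif role == "reply" and current is not None:
            replies.append(text)
    if current is not None and replies:
        pairs.append((current, replies))
    return pairs
```

Each resulting basic pair is the starting point for the type-based derivation that produces the additional session pairs.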
Step S303: preset a scene tag library that includes at least one scene tag.
Specifically, this embodiment assumes that the scene tag library includes at least one scene tag, and that the scene tags are one or a combination of: session content topic; the time, place, and date of the session between the two communicating parties; session intent; weather; season; gender; occupation; position; mood; hobbies; somatosensory data; health status; real-time behavior state; zodiac sign; blood type; the relationship, age gap, and generational gap between the two communicating parties; the interval, frequency, and duration of the parties' sessions; the sentence pattern, sentence class, and sentence structure type of the session content; and a total-amount tag.
Step S304: select from the scene tag library the scene tags associated with each session pair.
Specifically, when selecting from the scene tag library the scene tags associated with the session pairs, this embodiment chooses the associated scene tags for each session pair individually. The selection can be made manually, or the associated scene tags can be obtained by computing the degree of association between the word vector of the session pair's content topic and the word vectors of the scene tags in the library. Assume that this embodiment obtains the scene tags associated with each session pair by computation, as shown in Table 14, where a "√" under a scene tag indicates that the tag is associated with the session pair. Note that a different number of associated scene tags may be chosen for different session pairs.
Table 14
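The computed selection of associated scene tags can be illustrated with a short sketch. The patent only speaks of a "degree of association" between word vectors; using cosine similarity with a fixed threshold is an assumption made here, as are the names `cosine`, `select_tags`, and `threshold`. The vectors themselves are expected to come from whatever word-embedding method the implementer chooses.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_tags(topic_vec: Sequence[float],
                tag_vecs: Dict[str, Sequence[float]],
                threshold: float = 0.5) -> List[str]:
    """Return the scene tags whose vectors are close enough to the topic vector."""
    return [tag for tag, vec in tag_vecs.items()
            if cosine(topic_vec, vec) >= threshold]
```

Because the threshold is applied per tag, different session pairs naturally end up with different numbers of associated tags, as the paragraph above allows.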
Step S305: collect the scene tag values of the session pairs corresponding to the scene tags.
Specifically, after obtaining the scene tags associated with the session pairs, this embodiment goes on to collect, for each session pair, the scene tag value corresponding to each of its associated scene tags, as shown in Table 15.
Table 15
Step S306: match and combine the session pairs, the scene tags, and the scene tag values corresponding to the scene tags, thereby generating the personal exclusive corpus.
Specifically, this embodiment matches and combines the session pairs, the scene tags, and the scene tag values corresponding to the scene tags; that is, it generates the personal exclusive corpus according to the content combination rule "session pair + scene tag + scene tag value".
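The combination rule "session pair + scene tag + scene tag value" amounts to joining each session pair with its tag-to-value mapping. The sketch below shows one possible record layout; `build_corpus` and the dictionary keys are hypothetical, and the tag values in the usage lines are made up for illustration rather than taken from Table 15.

```python
from typing import Dict, List, Tuple


def build_corpus(pairs: Dict[str, Tuple[str, str]],
                 tag_values: Dict[str, Dict[str, str]]) -> List[dict]:
    """Join each session pair with its scene tags and scene tag values."""
    corpus = []
    for pair_id, (initiation, reply) in pairs.items():
        corpus.append({
            "initiation": initiation,
            "reply": reply,
            # Mapping of scene tag -> scene tag value for this pair (may be empty).
            "scene_tags": dict(tag_values.get(pair_id, {})),
        })
    return corpus


# Illustrative usage with assumed values.
example_pairs = {"pair-1": ("Will you be at the shop tomorrow?",
                            "I will be at the shop tomorrow.")}
example_values = {"pair-1": {"place": "shop", "session intent": "appointment"}}
example_corpus = build_corpus(example_pairs, example_values)
```

Storing the tags as a per-record mapping keeps the corpus usable even when different session pairs carry different sets of scene tags.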
The method for automatically building a personal exclusive corpus provided by this embodiment of the present invention collects the session content of the communicating party, obtains the session pairs in the session content, collects, according to preset scene tags, the scene tag values of the session pairs corresponding to the scene tags, and matches and combines the session pairs, the scene tags, and the corresponding scene tag values to generate the personal exclusive corpus. This solves the technical problem that building a session corpus manually involves a heavy workload and yields a corpus that is not specific to the individual. The manual workload of building the session corpus is greatly reduced, and because the corpus is generated from session pairs and scene tag values extracted from the communicating party's own session content, it is specific to that person, more targeted, and more highly personalized. In addition, the session pairs extracted from the session content in this embodiment are varied in form and content and simulate human conversation more realistically, laying the foundation for subsequently matching precise reply content against the automatically built personal exclusive corpus.
In addition, this embodiment determines the initiation sentences and reply sentences in the session content according to the semantics of the session sentences; determines the types of the initiation sentences and reply sentences according to a preset type judgment rule; extracts a basic session pair from each initiation sentence and the reply sentences located between that initiation sentence and the next initiation sentence; and extracts at least one session pair according to the basic session pair and the types of the initiation sentence and reply sentence in it. This solves the technical problem that extracting session pairs in the prior art is difficult and imprecise, and removes the limitation of the traditional one-question-one-answer session pair form. Using the types of the initiation and reply sentences, session pairs can be extracted quickly and effectively, and the precision and accuracy of the extracted pairs are greatly improved. Moreover, for session sentences with complex or non-standard structure, the embodiment can extract session pairs that are complete and practical, so that the extracted pairs accurately simulate real conversation. Further, because the extracted session pairs take varied forms, they help match precise intelligent reply content on the basis of the session pairs and yield reply content in varied forms.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A method for automatically building a personal exclusive corpus, characterized by comprising:
collecting the session content of a communicating party;
obtaining the session pairs in the session content;
collecting, according to preset scene tags, the scene tag values of the session pairs corresponding to the scene tags;
matching and combining the session pairs, the scene tags, and the scene tag values corresponding to the scene tags, thereby generating a personal exclusive corpus.
2. The method for automatically building a personal exclusive corpus according to claim 1, characterized in that obtaining the session pairs in the session content comprises:
determining the initiation sentences and reply sentences in the session content according to the semantics of the session sentences in the session content;
determining the types of the initiation sentences and the reply sentences according to a preset type judgment rule;
extracting a basic session pair from an initiation sentence and the reply sentences between that initiation sentence and the next initiation sentence;
extracting at least one session pair according to the basic session pair and the types of the initiation sentence and the reply sentence in the basic session pair.
3. The method for automatically building a personal exclusive corpus according to claim 2, characterized in that determining the initiation sentences and reply sentences in the session content according to the semantics of the session sentences in the session content comprises:
judging whether, within a preset time interval, a session sentence in the session content is preceded by text sent by the communication counterpart, and if not, determining the session sentence to be an initiation sentence;
if so, judging whether the session sentence has no semantic association with the preceding text sent by the communication counterpart, and if it has none, determining the session sentence to be an initiation sentence, and otherwise determining the session sentence to be a reply sentence.
4. The method for automatically building a personal exclusive corpus according to claim 3, characterized in that determining the type of the initiation sentence according to the preset type judgment rule comprises:
judging whether the initiation sentence is a sentence with complete, independent semantics; if so, judging whether the initiation sentence is composed of multiple simple sentences each having complete, independent semantics, and if so, determining the type of the initiation sentence to be the complex-sentence initiation sentence type, and otherwise the simple-sentence initiation sentence type; if not, judging whether the initiation sentence contains a simple sentence with complete, independent semantics, and if it does, determining the type of the initiation sentence to be the non-standard complex-sentence initiation sentence type, and if it does not, the non-standard simple-sentence initiation sentence type;
searching for whether an initiation sentence of the non-standard simple-sentence initiation sentence type has consecutive session sentences from the same sender in its preceding or following context; if not, performing no derivative extension; if so, further judging whether the initiation sentence of the non-standard simple-sentence initiation sentence type can be merged with those consecutive session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the initiation sentence to the non-standard sentence-group initiation sentence type, and if it cannot, performing no derivative extension;
searching for whether an initiation sentence of the non-standard complex-sentence initiation sentence type has consecutive session sentences from the same sender in its preceding or following context; if not, performing no derivative extension; if so, further judging whether the initiation sentence of the non-standard complex-sentence initiation sentence type can be merged with those consecutive session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the initiation sentence to the non-standard sentence-group initiation sentence type, and if it cannot, performing no derivative extension;
judging whether an initiation sentence of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence, or non-standard sentence-group type has consecutive session sentences from the same sender in its preceding or following context; if so, further judging whether the initiation sentence can be merged with those consecutive session sentences into a semantically associated sentence group, and if so, derivatively extending the type of the initiation sentence to the sentence-group initiation sentence type, and otherwise performing no derivative extension.
5. The method for automatically building a personal exclusive corpus according to claim 3, characterized in that determining the type of the reply sentence according to the preset type judgment rule comprises:
judging whether the reply sentence is a sentence with complete, independent semantics; if so, judging whether the reply sentence is composed of multiple simple sentences each having complete, independent semantics, and if so, determining the type of the reply sentence to be the complex-sentence reply sentence type, and otherwise the simple-sentence reply sentence type; if not, judging whether the reply sentence contains a simple sentence with complete, independent semantics, and if it does, determining the type of the reply sentence to be the non-standard complex-sentence reply sentence type, and if it does not, the non-standard simple-sentence reply sentence type;
searching for whether a reply sentence of the non-standard simple-sentence reply sentence type has consecutive session sentences from the same sender in its preceding or following context; if not, performing no derivative extension; if so, further judging whether the reply sentence of the non-standard simple-sentence reply sentence type can be merged with those consecutive session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the reply sentence to the non-standard sentence-group reply sentence type, and if it cannot, performing no derivative extension;
searching for whether a reply sentence of the non-standard complex-sentence reply sentence type has consecutive session sentences from the same sender in its preceding or following context; if not, performing no derivative extension; if so, further judging whether the reply sentence of the non-standard complex-sentence reply sentence type can be merged with those consecutive session sentences into a sentence with complete, independent semantics; if it can, derivatively extending the type of the reply sentence to the non-standard sentence-group reply sentence type, and if it cannot, performing no derivative extension;
judging whether a reply sentence of the simple-sentence, complex-sentence, non-standard simple-sentence, non-standard complex-sentence, or non-standard sentence-group type has consecutive session sentences from the same sender in its preceding or following context; if so, further judging whether the reply sentence can be merged with those consecutive session sentences into a semantically associated sentence group, and if so, derivatively extending the type of the reply sentence to the sentence-group reply sentence type, and otherwise performing no derivative extension.
6. The method for automatically building a personal exclusive corpus according to claim 5, characterized in that extracting at least one session pair according to the basic session pair, the type of the initiation sentence in the basic session pair, and the type of the reply sentence in the basic session pair comprises:
derivatively extending the type of the initiation sentence in the basic session pair to obtain initiation sentences of multiple types;
derivatively extending the type of the reply sentence in the basic session pair to obtain reply sentences of multiple types;
combining and extracting, from the initiation sentences of multiple types and the reply sentences of multiple types, at least one semantically associated session pair.
7. according to the method for the claim 1-6 personal exclusive corpus of any described automatic foundation, it is characterised in that according to pre- If scene tag, collection obtains the session pair scene tag value corresponding with the scene tag to be included:
Default scene tag storehouse, the scene tag storehouse at least includes a scene tag;
Selected in the scene tag storehouse with the session to the scene tag that associates;
Collection obtains the session pair scene tag value corresponding with the scene tag.
8. The method for automatically building a personal exclusive corpus according to claim 7, characterized in that the scene tags include:
one or a combination of: session content topic; the time, place, and date of the session between the two communicating parties; session intent; weather; season; gender; occupation; position; mood; hobbies; somatosensory data; health status; real-time behavior state; zodiac sign; blood type; the relationship, age gap, and generational gap between the two communicating parties; the interval, frequency, and duration of the parties' sessions; the sentence pattern, sentence class, and sentence structure type of the session content; and a total-amount tag.
CN201710076038.0A 2017-02-13 2017-02-13 A kind of method of the personal exclusive corpus of automatic foundation Pending CN106874451A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710076038.0A CN106874451A (en) 2017-02-13 2017-02-13 A kind of method of the personal exclusive corpus of automatic foundation

Publications (1)

Publication Number Publication Date
CN106874451A (en) 2017-06-20

Family

ID=59165937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710076038.0A Pending CN106874451A (en) 2017-02-13 2017-02-13 A kind of method of the personal exclusive corpus of automatic foundation

Country Status (1)

Country Link
CN (1) CN106874451A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390047A (en) * 2013-07-18 2013-11-13 天格科技(杭州)有限公司 Chatting robot knowledge base and construction method thereof
CN103412855A (en) * 2013-06-27 2013-11-27 华中师范大学 Method and system for automatic identification of relative words in complex sentence of modern Chinese language
CN104881402A (en) * 2015-06-02 2015-09-02 北京京东尚科信息技术有限公司 Method and device for analyzing semantic orientation of Chinese network topic comment text
CN105389296A (en) * 2015-12-11 2016-03-09 小米科技有限责任公司 Information partitioning method and apparatus
CN105528403A (en) * 2015-12-02 2016-04-27 小米科技有限责任公司 Target data identification method and apparatus

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018145436A1 (en) * 2017-02-13 2018-08-16 长沙军鸽软件有限公司 Method for extracting conversation pair from conversation content
CN108197101A (en) * 2017-12-19 2018-06-22 浪潮软件股份有限公司 A kind of corpus labeling method and device
CN108197101B (en) * 2017-12-19 2021-09-14 浪潮软件股份有限公司 Corpus labeling method and apparatus
CN109977390A (en) * 2017-12-27 2019-07-05 北京搜狗科技发展有限公司 A kind of method and device generating text
CN109977390B (en) * 2017-12-27 2023-11-03 北京搜狗科技发展有限公司 Method and device for generating text
CN109388717A (en) * 2018-07-20 2019-02-26 北京智能点科技有限公司 A kind of method and system of Mass production corpus

Similar Documents

Publication Publication Date Title
CN104598445B (en) Automatically request-answering system and method
CN106874452A (en) A kind of method for obtaining session reply content
CN106874451A (en) A kind of method of the personal exclusive corpus of automatic foundation
CN106709072A (en) Method of obtaining intelligent conversation reply content based on shared corpora
CN105931638A (en) Intelligent-robot-oriented dialog system data processing method and device
CN106407178A (en) Session abstract generation method and device
CN107123057A (en) User recommends method and device
CN107103083A (en) A kind of method that robot realizes intelligent session
CN106653016A (en) Intelligent interaction method and intelligent interaction device
JP2021108142A (en) Information processing system, information processing method, and information processing program
CN103425982A (en) Information processing apparatus, information processing method, and program
CN108304424B (en) Text keyword extraction method and text keyword extraction device
CN111798279A (en) Dialog-based user portrait generation method and apparatus
CN107861961A (en) Dialog information generation method and device
CN106649410B (en) Method and device for obtaining chat reply content
CN107623621A (en) Language material collection method of chatting and device
CN102999507A (en) Recommendation processing method and device for information of network microblog celebrities
CN110209778A (en) A kind of method and relevant apparatus of dialogue generation
CN103294725A (en) Intelligent response robot software
CN106844735A (en) A kind of method of the personal exclusive corpus of automatic foundation
CN104702759A (en) Address list setting method and address list setting device
JP2019036171A (en) System for assisting in creation of interaction scenario corpus
CN112287082A (en) Data processing method, device, equipment and storage medium combining RPA and AI
CN106844734A (en) A kind of method for automatically generating session reply content
CN106356056B (en) Audio recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170620