CN107766565A - Conversational character differentiating method and system - Google Patents
- Publication number
- CN107766565A (application CN201711088425.2A)
- Authority
- CN
- China
- Prior art keywords
- dialogue
- words
- corpus
- conversational character
- analyzed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a conversational role distinguishing method. The method includes: obtaining all words of each utterance of a dialogue to be analyzed; converting all words of each utterance into word vectors according to a pre-established word vector model, to obtain all word vectors of the utterance; obtaining the label of the conversational role corresponding to each utterance according to all word vectors of the utterance and a conversational role discrimination model established in advance from a first corpus, wherein the first corpus includes dialogue corpus data from the field to which the dialogue to be analyzed belongs, and the dialogue corpus data of the first corpus includes multiple standard utterances and the conversational role label corresponding to each standard utterance; and distinguishing the conversational roles in the dialogue to be analyzed according to the identified conversational role labels of all its utterances. The method thereby distinguishes conversational roles. The invention also provides a conversational role distinguishing system.
Description
Technical field
The present invention relates to the technical field of data processing, and more particularly to a conversational role distinguishing method and system.
Background art
A conversation usually involves two or more conversational roles. In some situations the utterances of one particular role need to be analyzed, so the conversational roles must first be distinguished.
Traditional conversational role distinguishing methods mainly identify the speaker from the speaker's voiceprint features, and label the text of the conversation according to speaker identity when the speech is converted to text.
In implementing the present invention, the inventors found that existing conversational role distinguishing methods have the following drawbacks:
Existing methods must collect the voiceprint features of different people, and these features are easily affected by factors such as physical condition, age and mood, and by environmental noise. In addition, when several speakers talk at once, individual voiceprint features are hard to extract. As a result, these methods are difficult to implement and have low accuracy.
Summary of the invention
The present invention proposes a conversational role distinguishing method and system that distinguish conversational roles with improved accuracy.
One aspect of the present invention provides a conversational role distinguishing method, the method including:
obtaining all words of each utterance of a dialogue to be analyzed;
converting all words of each utterance of the dialogue to be analyzed into word vectors according to a pre-established word vector model, to obtain all word vectors of the utterance;
obtaining the label of the conversational role corresponding to each utterance of the dialogue to be analyzed according to all word vectors of the utterance and a conversational role discrimination model established in advance from a first corpus; wherein the first corpus includes dialogue corpus data from the field to which the dialogue to be analyzed belongs, and the dialogue corpus data of the first corpus includes multiple standard utterances and the conversational role label corresponding to each standard utterance;
distinguishing the conversational roles in the dialogue to be analyzed according to the identified conversational role labels of all utterances of the dialogue to be analyzed.
In an optional embodiment, the method further includes:
in response to an instruction to establish the conversational role discrimination model, pre-processing all standard utterances of the first corpus to obtain all words of each standard utterance;
converting all words of each standard utterance into word vectors according to the word vector model, to obtain all word vectors of each standard utterance;
building a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm;
training the deep recurrent neural network model on all word vectors of each standard utterance and the conversational role label corresponding to each standard utterance, to obtain the conversational role discrimination model of the first corpus.
In an optional embodiment, the method further includes:
in response to an instruction to establish the word vector model, obtaining a second corpus, wherein the second corpus includes corpus data from multiple fields;
performing word segmentation, stop-word removal and interfering-item removal on all texts of the second corpus, to obtain all words of the second corpus;
establishing the word vector model from all words of the second corpus based on the word2vec algorithm.
In an optional embodiment, pre-processing all standard utterances of the first corpus in response to the instruction to establish the conversational role discrimination model, to obtain all words of each standard utterance, includes:
in response to the instruction to establish the conversational role discrimination model, segmenting all standard utterances of the first corpus according to the jieba word segmentation system and a vocabulary of the field to which the dialogue to be analyzed belongs, to obtain all candidate words of each standard utterance;
removing stop words from all candidate words of each standard utterance, to obtain all words of each standard utterance.
In an optional embodiment, removing stop words from all candidate words of each standard utterance, to obtain all words of each standard utterance, includes:
removing stop words from all candidate words of each standard utterance according to a stop-word list, to obtain all words of each standard utterance.
Another aspect of the present invention provides a conversational role distinguishing system, the system including:
a first acquisition module, configured to obtain all words of each utterance of a dialogue to be analyzed;
a feature matching module, configured to convert all words of each utterance of the dialogue to be analyzed into word vectors according to a pre-established word vector model, to obtain all word vectors of the utterance;
a label acquisition module, configured to obtain the label of the conversational role corresponding to each utterance of the dialogue to be analyzed according to all word vectors of the utterance and a conversational role discrimination model established in advance from a first corpus; wherein the first corpus includes dialogue corpus data from the field to which the dialogue to be analyzed belongs, and the dialogue corpus data of the first corpus includes multiple standard utterances and the conversational role label corresponding to each standard utterance;
a role distinguishing module, configured to distinguish the conversational roles in the dialogue to be analyzed according to the identified conversational role labels of all utterances of the dialogue to be analyzed.
Preferably, the system further includes:
a first pre-processing module, configured to pre-process all standard utterances of the first corpus in response to an instruction to establish the conversational role discrimination model, to obtain all words of each standard utterance;
a first word vector module, configured to convert all words of each standard utterance into word vectors according to the word vector model, to obtain all word vectors of each standard utterance;
a model construction module, configured to build a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm;
a model training module, configured to train the deep recurrent neural network model on all word vectors of each standard utterance and the conversational role label corresponding to each standard utterance, to obtain the conversational role discrimination model of the first corpus.
In an optional embodiment, the system further includes:
a second corpus acquisition module, configured to obtain a second corpus in response to an instruction to establish the word vector model, wherein the second corpus includes corpus data from multiple fields;
a second pre-processing module, configured to perform word segmentation, stop-word removal and interfering-item removal on all texts of the second corpus, to obtain all words of the second corpus;
a word vector model building module, configured to establish the word vector model from all words of the second corpus based on the word2vec algorithm.
In an optional embodiment, the first pre-processing module includes:
a word segmentation unit, configured to segment, in response to the instruction to establish the conversational role discrimination model, all standard utterances of the first corpus according to the jieba word segmentation system and a vocabulary of the field to which the dialogue to be analyzed belongs, to obtain all candidate words of each standard utterance;
a stop-word removal unit, configured to remove stop words from all candidate words of each standard utterance, to obtain all words of each standard utterance.
In an optional embodiment, the stop-word removal unit includes:
a stop-word removal subunit, configured to remove stop words from all candidate words of each standard utterance according to a stop-word list, to obtain all words of each standard utterance.
Compared with the prior art, the present invention has the following notable beneficial effects. The invention provides a conversational role distinguishing method and system, wherein the method includes: obtaining all words of each utterance of a dialogue to be analyzed; converting all words of each utterance into word vectors according to a pre-established word vector model, to obtain all word vectors of the utterance; obtaining the label of the conversational role corresponding to each utterance according to all word vectors of the utterance and a conversational role discrimination model established in advance from a first corpus, wherein the first corpus includes dialogue corpus data from the field to which the dialogue to be analyzed belongs, and the dialogue corpus data of the first corpus includes multiple standard utterances and the conversational role label corresponding to each standard utterance; and distinguishing the conversational roles in the dialogue to be analyzed according to the identified conversational role labels of all its utterances. By converting all words into word vectors, the method and system capture the semantic similarity between words; combining the word vectors with the conversational role discrimination model to obtain role labels improves the accuracy of role distinction. Compared with methods that judge the conversational role from keywords, the discrimination model established in advance from the first corpus identifies role labels more accurately, which improves the accuracy of conversational role discrimination.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the first embodiment of the conversational role distinguishing method provided by the invention;
Fig. 2 is a schematic structural diagram of the first embodiment of the conversational role distinguishing system provided by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
Referring to Fig. 1, a schematic flowchart of the first embodiment of the conversational role distinguishing method provided by the invention, the method includes:
S101, obtaining all words of each utterance of a dialogue to be analyzed;
S102, converting all words of each utterance of the dialogue to be analyzed into word vectors according to a pre-established word vector model, to obtain all word vectors of the utterance;
S103, obtaining the label of the conversational role corresponding to each utterance of the dialogue to be analyzed according to all word vectors of the utterance and a conversational role discrimination model established in advance from a first corpus; wherein the first corpus includes dialogue corpus data from the field to which the dialogue to be analyzed belongs, and the dialogue corpus data of the first corpus includes multiple standard utterances and the conversational role label corresponding to each standard utterance;
S104, distinguishing the conversational roles in the dialogue to be analyzed according to the identified conversational role labels of all utterances of the dialogue to be analyzed.
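Steps S101 to S104 can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the implementation described here: the two-dimensional vectors in `WORD_VECTORS`, the whitespace tokenizer, the role names and the averaging rule in `predict_label` are all hypothetical stand-ins for the trained word vector model and the trained discrimination model.

```python
from collections import defaultdict

# Hypothetical 2-d word vectors standing in for a trained word vector model.
WORD_VECTORS = {
    "hello": (0.9, 0.1), "help": (0.8, 0.2), "welcome": (0.9, 0.0),
    "price": (0.1, 0.9), "buy": (0.2, 0.8), "want": (0.1, 0.8),
}

def get_words(utterance):
    """S101: split an utterance into words (whitespace split as a stand-in)."""
    return utterance.lower().split()

def to_vectors(words):
    """S102: look up each word's vector; unknown words are skipped."""
    return [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]

def predict_label(vectors):
    """S103 stand-in: a trained discrimination model would be called here.
    This toy rule averages the vectors and compares the two components."""
    if not vectors:
        return "unknown"
    mean_x = sum(v[0] for v in vectors) / len(vectors)
    mean_y = sum(v[1] for v in vectors) / len(vectors)
    return "customer_service" if mean_x >= mean_y else "customer"

def distinguish_roles(dialogue):
    """S104: group the utterances of a dialogue by predicted role label."""
    by_role = defaultdict(list)
    for utterance in dialogue:
        label = predict_label(to_vectors(get_words(utterance)))
        by_role[label].append(utterance)
    return dict(by_role)

dialogue = ["hello welcome", "want buy", "help", "price"]
roles = distinguish_roles(dialogue)
```

Each utterance is labelled independently, so the grouping in `distinguish_roles` works for any number of roles that the underlying model can predict.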
It should be noted that the field refers to an application field. For example, for a dialogue to be analyzed that concerns air-conditioner sales, the corpus data stored in the first corpus is from the air-conditioner sales field; in practice, dialogue corpus data related to that field can be obtained from text media such as news, magazines and web pages. The conversational role labels that the first corpus stores for its standard utterances should cover the labels of all conversational roles appearing in the dialogue to be analyzed. For example, if the conversational roles of the dialogue to be analyzed are customer service agent and customer, the first corpus stores utterances of customer service agents and customers, and each standard utterance is labelled as either customer service agent or customer.
That is, because word vectors can represent semantics, converting all words into word vectors not only makes the data easier for a computer to process but also captures the similarity between words; combining the word vectors with the conversational role discrimination model to obtain role labels improves the accuracy of role distinction. Compared with methods that judge the conversational role from keywords, the discrimination model established in advance from the first corpus identifies role labels more accurately, which improves the accuracy of conversational role discrimination.
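The point that word vectors capture similarity between words is commonly made concrete with cosine similarity. The sketch below is only an illustration: the three-dimensional vectors are invented for the example, and the text does not state which similarity measure, if any, is used.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical vectors; a trained model would supply real ones.
vectors = {
    "air_conditioner": [0.8, 0.6, 0.0],
    "cooler": [0.7, 0.7, 0.1],
    "refund": [0.0, 0.1, 0.9],
}

sim_close = cosine_similarity(vectors["air_conditioner"], vectors["cooler"])
sim_far = cosine_similarity(vectors["air_conditioner"], vectors["refund"])
```

With these toy values, the two appliance words score much closer to each other than either does to "refund", which is the kind of relation a keyword lookup cannot express.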
In an optional embodiment, the method further includes: before obtaining all words of each utterance of the dialogue to be analyzed, obtaining dialogue speech; and converting the dialogue speech into text to obtain the dialogue to be analyzed.
In an optional embodiment, the method further includes:
in response to an instruction to establish the conversational role discrimination model, pre-processing all standard utterances of the first corpus to obtain all words of each standard utterance;
converting all words of each standard utterance into word vectors according to the word vector model, to obtain all word vectors of each standard utterance;
building a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm;
training the deep recurrent neural network model on all word vectors of each standard utterance and the conversational role label corresponding to each standard utterance, to obtain the conversational role discrimination model of the first corpus.
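To show how an LSTM maps a variable-length sequence of word vectors to a role score, here is a minimal forward-pass sketch in plain Python. It makes several assumptions that are not in the text: a single layer rather than a deep stack, two roles scored by one sigmoid output, and random untrained weights. The class name `TinyLSTMClassifier` and its methods are hypothetical; a real system would train the weights with a deep learning library.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

class TinyLSTMClassifier:
    """Untrained single-layer LSTM followed by one sigmoid output unit."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight matrix and bias per gate, applied to [h_prev ; x].
        self.weights = {
            gate: make_matrix(hidden_size, hidden_size + input_size, rng)
            for gate in ("forget", "input", "cell", "output")
        }
        self.biases = {gate: [0.0] * hidden_size for gate in self.weights}
        self.out_weights = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]

    def step(self, x, h_prev, c_prev):
        """One LSTM time step: returns the new hidden and cell states."""
        z = h_prev + x  # concatenate previous hidden state and input

        def gate(name, activation):
            pre = matvec(self.weights[name], z)
            return [activation(p + b) for p, b in zip(pre, self.biases[name])]

        f = gate("forget", sigmoid)    # how much old memory to keep
        i = gate("input", sigmoid)     # how much new content to write
        g = gate("cell", math.tanh)    # candidate memory content
        o = gate("output", sigmoid)    # how much memory to expose
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def predict_proba(self, word_vectors):
        """Score a whole utterance given as a list of word vectors."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in word_vectors:
            h, c = self.step(list(x), h, c)
        score = sum(w * hk for w, hk in zip(self.out_weights, h))
        return sigmoid(score)

model = TinyLSTMClassifier(input_size=3, hidden_size=4)
utterance = [[0.8, 0.6, 0.0], [0.0, 0.1, 0.9]]  # two hypothetical word vectors
probability = model.predict_proba(utterance)
```

Because the final hidden state summarizes the whole word sequence, utterances of different lengths yield a score of the same shape; training would adjust the weights so that the score separates the labelled roles.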
That is, training a deep recurrent neural network model built on the LSTM network algorithm overcomes the insufficient feature extraction and weak comprehension of conventional methods, so the resulting conversational role discrimination model achieves higher discrimination accuracy.
In an optional embodiment, pre-processing all standard utterances of the first corpus to obtain all words of each standard utterance includes:
segmenting all dialogue corpus data of the first corpus into words and replacing rare words, to obtain all words of each standard utterance.
In an optional embodiment, the method further includes:
in response to an instruction to establish the word vector model, obtaining a second corpus, wherein the second corpus includes corpus data from multiple fields;
performing word segmentation, stop-word removal and interfering-item removal on all texts of the second corpus, to obtain all words of the second corpus;
establishing the word vector model from all words of the second corpus based on the word2vec (word to vector) algorithm.
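word2vec learns vectors by relating each word to the words around it. The sketch below shows only the data-preparation side of that idea, assuming the skip-gram variant and a hypothetical window size; the text does not say whether skip-gram or CBOW is used, and the function name `skipgram_pairs` is illustrative.

```python
def skipgram_pairs(sentences, window=2):
    """Return (center, context) training pairs for every word in every
    sentence, pairing each word with neighbours within `window` positions."""
    pairs = []
    for words in sentences:
        for i, center in enumerate(words):
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            for j in range(start, end):
                if j != i:
                    pairs.append((center, words[j]))
    return pairs

# Already segmented and filtered toy corpus (hypothetical content).
corpus = [
    ["air", "conditioner", "price"],
    ["install", "air", "conditioner"],
]
pairs = skipgram_pairs(corpus, window=1)
```

Words that share many contexts, here "air" and "conditioner", produce overlapping pairs, which is what pushes their learned vectors together during training.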
The word vector model is established with the word2vec algorithm. Because word2vec extracts word features efficiently, stop words need not be retained when pre-processing all texts of the second corpus, and the word vector model can still accurately represent the semantic relations between words; this reduces the data volume and improves the accuracy and efficiency of conversational role distinction. Establishing the word vector model from the second corpus and the conversational role discrimination model from the first corpus makes it easy to train the two models separately and independently on different corpora, so that the former is not constrained by the latter and the resulting word vector model is more reusable.
In an optional embodiment, pre-processing all standard utterances of the first corpus in response to the instruction to establish the conversational role discrimination model, to obtain all words of each standard utterance, includes:
in response to the instruction to establish the conversational role discrimination model, segmenting all standard utterances of the first corpus according to the jieba word segmentation system and a vocabulary of the field to which the dialogue to be analyzed belongs, to obtain all candidate words of each standard utterance;
removing stop words from all candidate words of each standard utterance, to obtain all words of each standard utterance.
Segmenting with the jieba word segmentation system and the vocabulary of the field to which the dialogue to be analyzed belongs makes it possible to obtain all candidate words of each standard utterance quickly and accurately. Removing stop words then keeps words that contribute almost nothing to text analysis out of data processing, which reduces the data volume while preserving the useful data and improves the accuracy and efficiency of conversational role distinction.
In an optional embodiment, removing stop words from all candidate words of each standard utterance, to obtain all words of each standard utterance, includes:
removing stop words from all candidate words of each standard utterance according to a stop-word list, to obtain all words of each standard utterance.
Removing stop words with a stop-word list is fast, which improves the efficiency of conversational role distinction.
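The two pre-processing stages, dictionary-driven segmentation followed by stop-word filtering, can be sketched as below. This is a simplified stand-in, not the segmenter named in the text: it uses plain forward maximum matching against a small vocabulary, and the vocabulary, stop-word list and sample sentence are all invented for illustration.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Segment text greedily: at each position take the longest vocabulary
    entry that matches, falling back to a single character."""
    tokens = []
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                pos += size
                break
    return tokens

def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in stop_words]

# Hypothetical air-conditioner sales vocabulary and stop-word list.
DOMAIN_VOCABULARY = {"请问", "变频空调", "空调", "价格", "多少"}
STOP_WORDS = {"请问", "的"}

# "May I ask how much the inverter air conditioner costs"
candidates = forward_max_match("请问变频空调的价格多少", DOMAIN_VOCABULARY)
words = remove_stop_words(candidates, STOP_WORDS)
```

The domain vocabulary matters here: without the entry for "变频空调" (inverter air conditioner), the segmenter would split the product name and the downstream word vectors would miss it.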
Referring to Fig. 2, a schematic structural diagram of the first embodiment of the conversational role distinguishing system provided by the invention, the system includes:
a first acquisition module 201, configured to obtain all words of each utterance of a dialogue to be analyzed;
a feature matching module 202, configured to convert all words of each utterance of the dialogue to be analyzed into word vectors according to a pre-established word vector model, to obtain all word vectors of the utterance;
a label acquisition module 203, configured to obtain the label of the conversational role corresponding to each utterance of the dialogue to be analyzed according to all word vectors of the utterance and a conversational role discrimination model established in advance from a first corpus; wherein the first corpus includes dialogue corpus data from the field to which the dialogue to be analyzed belongs, and the dialogue corpus data of the first corpus includes multiple standard utterances and the conversational role label corresponding to each standard utterance;
a role distinguishing module 204, configured to distinguish the conversational roles in the dialogue to be analyzed according to the identified conversational role labels of all utterances of the dialogue to be analyzed.
It should be noted that the field refers to an application field. For example, for a dialogue to be analyzed that concerns air-conditioner sales, the corpus data stored in the first corpus is from the air-conditioner sales field; in practice, dialogue corpus data related to that field can be obtained from text media such as news, magazines and web pages. The conversational role labels that the first corpus stores for its standard utterances should cover the labels of all conversational roles appearing in the dialogue to be analyzed. For example, if the conversational roles of the dialogue to be analyzed are customer service agent and customer, the first corpus stores utterances of customer service agents and customers, and each standard utterance is labelled as either customer service agent or customer.
That is, because word vectors can represent semantics, converting all words into word vectors not only makes the data easier for a computer to process but also captures the similarity between words; combining the word vectors with the conversational role discrimination model to obtain role labels improves the accuracy of role distinction. Compared with methods that judge the conversational role from keywords, the discrimination model established in advance from the first corpus identifies role labels more accurately, which improves the accuracy of conversational role discrimination.
In an optional embodiment, the system further includes: a dialogue speech acquisition module, configured to obtain dialogue speech before all words of each utterance of the dialogue to be analyzed are obtained; and a speech conversion module, configured to convert the dialogue speech into text to obtain the dialogue to be analyzed.
In an optional embodiment, the system further includes:
a first pre-processing module, configured to pre-process all standard utterances of the first corpus in response to an instruction to establish the conversational role discrimination model, to obtain all words of each standard utterance;
a first word vector module, configured to convert all words of each standard utterance into word vectors according to the word vector model, to obtain all word vectors of each standard utterance;
a model construction module, configured to build a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm;
a model training module, configured to train the deep recurrent neural network model on all word vectors of each standard utterance and the conversational role label corresponding to each standard utterance, to obtain the conversational role discrimination model of the first corpus.
That is, training a deep recurrent neural network model built on the LSTM network algorithm overcomes the insufficient feature extraction and weak comprehension of conventional methods, so the resulting conversational role discrimination model achieves higher discrimination accuracy.
In an optional embodiment, the system further includes:
a second corpus acquisition module, configured to obtain a second corpus in response to an instruction to establish the word vector model, wherein the second corpus includes corpus data from multiple fields;
a second pre-processing module, configured to perform word segmentation, stop-word removal and interfering-item removal on all texts of the second corpus, to obtain all words of the second corpus;
a word vector model building module, configured to establish the word vector model from all words of the second corpus based on the word2vec algorithm.
The word vector model is established with the word2vec algorithm. Because word2vec extracts word features efficiently, stop words need not be retained when pre-processing all texts of the second corpus, and the word vector model can still accurately represent the semantic relations between words; this reduces the data volume and improves the accuracy and efficiency of conversational role distinction. Establishing the word vector model from the second corpus and the conversational role discrimination model from the first corpus makes it easy to train the two models separately and independently on different corpora, so that the former is not constrained by the latter and the resulting word vector model is more reusable.
In an optional embodiment, the first pre-processing module includes:
a word segmentation unit, configured to segment, in response to the instruction to establish the conversational role discrimination model, all standard utterances of the first corpus according to the jieba word segmentation system and a vocabulary of the field to which the dialogue to be analyzed belongs, to obtain all candidate words of each standard utterance;
a stop-word removal unit, configured to remove stop words from all candidate words of each standard utterance, to obtain all words of each standard utterance.
Segmenting with the jieba word segmentation system and the vocabulary of the field to which the dialogue to be analyzed belongs makes it possible to obtain all candidate words of each standard utterance quickly and accurately. Removing stop words then keeps words that contribute almost nothing to text analysis out of data processing, which reduces the data volume while preserving the useful data and improves the accuracy and efficiency of conversational role distinction.
In an optional embodiment, the stop-word removal unit includes:
a stop-word removal subunit, configured to remove stop words from all candidate words of each standard utterance according to a stop-word list, to obtain all words of each standard utterance.
Removing stop words with a stop-word list is fast, which improves the efficiency of conversational role distinction.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and these improvements and modifications are also regarded as falling within the protection scope of the present invention.
Claims (10)
1. A conversational character differentiating method, characterized in that the method includes:
obtaining all words of the utterances of a dialogue to be analyzed;
converting all words of the utterances of the dialogue to be analyzed into word vectors according to a pre-established word vector model, so as to obtain all word vectors of the utterances of the dialogue to be analyzed;
obtaining, according to all word vectors of the utterances of the dialogue to be analyzed and a conversational character discrimination model established in advance from a first corpus, the labels of the conversational characters corresponding to the utterances of the dialogue to be analyzed; wherein the first corpus includes dialogue corpus data from the domain of the dialogue to be analyzed, and the dialogue corpus data of the first corpus includes a plurality of standard utterances and a conversational character label corresponding to each standard utterance;
distinguishing the conversational characters in the dialogue to be analyzed according to the identified conversational character labels corresponding to all utterances of the dialogue to be analyzed.
2. The conversational character differentiating method according to claim 1, characterized in that the method further includes:
in response to an instruction to establish the conversational character discrimination model, preprocessing all standard utterances of the first corpus, so as to obtain all words of each standard utterance;
converting all words of each standard utterance into word vectors according to the word vector model, so as to obtain all word vectors of each standard utterance;
constructing a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm;
training the deep recurrent neural network model according to all word vectors of each standard utterance and the conversational character label corresponding to each standard utterance, so as to obtain the conversational character discrimination model of the first corpus.
3. The conversational character differentiating method according to claim 2, characterized in that the method further includes:
in response to an instruction to establish the word vector model, obtaining a second corpus; wherein the second corpus includes corpus data from a plurality of domains;
performing word segmentation, stop word removal, and interfering item removal on all texts of the second corpus, so as to obtain all words of the second corpus;
establishing the word vector model from all words of the second corpus based on the word2vec algorithm.
4. The conversational character differentiating method according to claim 2, characterized in that preprocessing all standard utterances of the first corpus in response to the instruction to establish the conversational character discrimination model, so as to obtain all words of each standard utterance, includes:
in response to the instruction to establish the conversational character discrimination model, segmenting all standard utterances of the first corpus according to the Jieba word segmentation system and the vocabulary of the domain of the dialogue to be analyzed, so as to obtain all candidate words of each standard utterance;
removing stop words from all candidate words of each standard utterance, so as to obtain all words of each standard utterance.
5. The conversational character differentiating method according to claim 4, characterized in that removing stop words from all candidate words of each standard utterance, so as to obtain all words of each standard utterance, includes:
removing stop words from all candidate words of each standard utterance according to a stop word list, so as to obtain all words of each standard utterance.
6. A conversational character differentiating system, characterized in that the system includes:
a first acquisition module, configured to obtain all words of the utterances of a dialogue to be analyzed;
a characteristic matching module, configured to convert all words of the utterances of the dialogue to be analyzed into word vectors according to a pre-established word vector model, so as to obtain all word vectors of the utterances of the dialogue to be analyzed;
a label acquisition module, configured to obtain, according to all word vectors of the utterances of the dialogue to be analyzed and a conversational character discrimination model established in advance from a first corpus, the labels of the conversational characters corresponding to the utterances of the dialogue to be analyzed; wherein the first corpus includes dialogue corpus data from the domain of the dialogue to be analyzed, and the dialogue corpus data of the first corpus includes a plurality of standard utterances and a conversational character label corresponding to each standard utterance;
a role distinguishing module, configured to distinguish the conversational characters in the dialogue to be analyzed according to the identified conversational character labels corresponding to all utterances of the dialogue to be analyzed.
7. The conversational character differentiating system according to claim 6, characterized in that the system further includes:
a first preprocessing module, configured to, in response to an instruction to establish the conversational character discrimination model, preprocess all standard utterances of the first corpus, so as to obtain all words of each standard utterance;
a first word vector module, configured to convert all words of each standard utterance into word vectors according to the word vector model, so as to obtain all word vectors of each standard utterance;
a model construction module, configured to construct a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm;
a model training module, configured to train the deep recurrent neural network model according to all word vectors of each standard utterance and the conversational character label corresponding to each standard utterance, so as to obtain the conversational character discrimination model of the first corpus.
8. The conversational character differentiating system according to claim 7, characterized in that the system further includes:
a second corpus acquisition module, configured to obtain a second corpus in response to an instruction to establish the word vector model; wherein the second corpus includes corpus data from a plurality of domains;
a second preprocessing module, configured to perform word segmentation, stop word removal, and interfering item removal on all texts of the second corpus, so as to obtain all words of the second corpus;
a word vector model building module, configured to establish the word vector model from all words of the second corpus based on the word2vec algorithm.
9. The conversational character differentiating system according to claim 7, characterized in that the first preprocessing module includes:
a word segmentation unit, configured to, in response to the instruction to establish the conversational character discrimination model, segment all standard utterances of the first corpus according to the Jieba word segmentation system and the vocabulary of the domain of the dialogue to be analyzed, so as to obtain all candidate words of each standard utterance;
a stop word removal unit, configured to remove stop words from all candidate words of each standard utterance, so as to obtain all words of each standard utterance.
10. The conversational character differentiating system according to claim 8, characterized in that the stop word removal unit includes:
a stop word removal subunit, configured to remove stop words from all candidate words of each standard utterance according to a stop word list, so as to obtain all words of each standard utterance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711088425.2A CN107766565A (en) | 2017-11-06 | 2017-11-06 | Conversational character differentiating method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711088425.2A CN107766565A (en) | 2017-11-06 | 2017-11-06 | Conversational character differentiating method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107766565A (en) | 2018-03-06 |
Family
ID=61272829
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711088425.2A Pending CN107766565A (en) | 2017-11-06 | 2017-11-06 | Conversational character differentiating method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107766565A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112270167A (en) * | 2020-10-14 | 2021-01-26 | 北京百度网讯科技有限公司 | Role labeling method and device, electronic equipment and storage medium |
CN112270169A (en) * | 2020-10-14 | 2021-01-26 | 北京百度网讯科技有限公司 | Dialogue role prediction method and device, electronic equipment and storage medium |
CN112270198A (en) * | 2020-10-27 | 2021-01-26 | 北京百度网讯科技有限公司 | Role determination method and device, electronic equipment and storage medium |
CN112861509A (en) * | 2021-02-08 | 2021-05-28 | 青牛智胜(深圳)科技有限公司 | Role analysis method and system based on multi-head attention mechanism |
CN113744742A (en) * | 2020-05-29 | 2021-12-03 | 中国电信股份有限公司 | Role identification method, device and system in conversation scene |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050065815A1 (en) * | 2003-09-19 | 2005-03-24 | Mazar Scott Thomas | Information management system and method for an implantable medical device |
CN1852354A (en) * | 2005-10-17 | 2006-10-25 | 华为技术有限公司 | Method and device for collecting user behavior characteristics |
US20140358539A1 (en) * | 2013-05-29 | 2014-12-04 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for building a language model |
CN105786782A (en) * | 2016-03-25 | 2016-07-20 | 北京搜狗科技发展有限公司 | Word vector training method and device |
CN105868184A (en) * | 2016-05-10 | 2016-08-17 | 大连理工大学 | Chinese name recognition method based on recurrent neural network |
CN106776572A (en) * | 2016-12-27 | 2017-05-31 | 竹间智能科技(上海)有限公司 | A kind of people claims recognition methods |
CN107295149A (en) * | 2016-03-30 | 2017-10-24 | 北京搜狗科技发展有限公司 | A kind for the treatment of method and apparatus of strange phone |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113744742A (en) * | 2020-05-29 | 2021-12-03 | 中国电信股份有限公司 | Role identification method, device and system in conversation scene |
CN113744742B (en) * | 2020-05-29 | 2024-01-30 | 中国电信股份有限公司 | Role identification method, device and system under dialogue scene |
CN112270167A (en) * | 2020-10-14 | 2021-01-26 | 北京百度网讯科技有限公司 | Role labeling method and device, electronic equipment and storage medium |
CN112270169A (en) * | 2020-10-14 | 2021-01-26 | 北京百度网讯科技有限公司 | Dialogue role prediction method and device, electronic equipment and storage medium |
CN112270169B (en) * | 2020-10-14 | 2023-07-25 | 北京百度网讯科技有限公司 | Method and device for predicting dialogue roles, electronic equipment and storage medium |
US11907671B2 (en) | 2020-10-14 | 2024-02-20 | Beijing Baidu Netcom Science Technology Co., Ltd. | Role labeling method, electronic device and storage medium |
CN112270198A (en) * | 2020-10-27 | 2021-01-26 | 北京百度网讯科技有限公司 | Role determination method and device, electronic equipment and storage medium |
CN112270198B (en) * | 2020-10-27 | 2021-08-17 | 北京百度网讯科技有限公司 | Role determination method and device, electronic equipment and storage medium |
CN112861509A (en) * | 2021-02-08 | 2021-05-28 | 青牛智胜(深圳)科技有限公司 | Role analysis method and system based on multi-head attention mechanism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107766565A (en) | Conversational character differentiating method and system | |
CN107885723A (en) | Conversational character differentiating method and system | |
CN106504768A (en) | Phone testing audio frequency classification method and device based on artificial intelligence | |
CN104167208A (en) | Speaker recognition method and device | |
CN110390946A (en) | A kind of audio signal processing method, device, electronic equipment and storage medium | |
CN108648760B (en) | Real-time voiceprint identification system and method | |
CN103365834B (en) | Language Ambiguity eliminates system and method | |
CN105931637A (en) | User-defined instruction recognition speech photographing system | |
CN103474061A (en) | Automatic distinguishing method based on integration of classifier for Chinese dialects | |
Akhtiamov et al. | Speech and Text Analysis for Multimodal Addressee Detection in Human-Human-Computer Interaction. | |
WO2023088448A1 (en) | Speech processing method and device, and storage medium | |
Zhang et al. | Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features. | |
CN111128128A (en) | Voice keyword detection method based on complementary model scoring fusion | |
CN111274390B (en) | Emotion cause determining method and device based on dialogue data | |
Zheng et al. | Acoustic texttiling for story segmentation of spoken documents | |
Irtza et al. | A hierarchical framework for language identification | |
Ramaiah et al. | Accent detection in handwriting based on writing styles | |
Jokinen et al. | Variation in Spoken North Sami Language. | |
CN113255362A (en) | Method and device for filtering and identifying human voice, electronic device and storage medium | |
CN107480128A (en) | The segmenting method and device of Chinese text | |
CN108563688B (en) | Emotion recognition method for movie and television script characters | |
CN109902306A (en) | A kind of audio recognition method, device, storage medium and speech ciphering equipment | |
CN107133226A (en) | A kind of method and device for distinguishing theme | |
Bock et al. | Assessing the efficacy of benchmarks for automatic speech accent recognition | |
CN115063155A (en) | Data labeling method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180306 |