CN107766565A - Method and system for distinguishing dialogue roles - Google Patents


Info

Publication number: CN107766565A
Authority: CN (China)
Prior art keywords: dialogue, words, corpus, conversational character, analyzed
Legal status: Pending
Application number: CN201711088425.2A
Other languages: Chinese (zh)
Inventors: 英高海, 林载辉, 赵舒阳, 朱德明, 李坤, 李冬梅
Current assignee: GCI Science and Technology Co Ltd
Original assignee: GCI Science and Technology Co Ltd
Application filed by GCI Science and Technology Co Ltd; priority to CN201711088425.2A; published as CN107766565A


Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING
        • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
                • G06F16/33 Querying
                    • G06F16/332 Query formulation
                        • G06F16/3329 Natural language query formulation or dialogue systems
                    • G06F16/3331 Query processing
                        • G06F16/334 Query execution
                            • G06F16/3344 Query execution using natural language analysis
                • G06F16/35 Clustering; Classification
        • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
                • G06F40/279 Recognition of textual entities
            • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for distinguishing dialogue roles. The method comprises: obtaining all words of each utterance of a dialogue to be analyzed; converting each of these words into a word vector according to a pre-established word vector model, to obtain all word vectors of each utterance; obtaining the label of the dialogue role corresponding to each utterance according to its word vectors and a dialogue-role discrimination model established in advance from a first corpus, wherein the first corpus contains dialogue corpus from the field to which the dialogue to be analyzed belongs, and the dialogue corpus of the first corpus contains multiple standard utterances and the dialogue-role label corresponding to each standard utterance; and distinguishing the dialogue roles in the dialogue to be analyzed according to the recognized dialogue-role labels of all its utterances. The method of the present invention thereby distinguishes dialogue roles; the present invention also provides a corresponding system for distinguishing dialogue roles.

Description

Method and system for distinguishing dialogue roles
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and system for distinguishing dialogue roles.
Background
A conversation usually involves two or more dialogue roles. In some situations the dialogue needs to be analyzed for one particular role, so the dialogue roles must first be distinguished.
Traditional methods for distinguishing dialogue roles mainly determine the speaker's identity from the speaker's voiceprint features, and then label the text of the conversation according to that identity when the speech is converted into text.
In the course of implementing the present invention, the inventors found that existing methods for distinguishing dialogue roles have the following drawbacks:
They require collecting the voiceprint features of different people, and these features are easily affected by factors such as a person's physical condition, age and mood, as well as by ambient noise. In addition, voiceprint features are hard to extract when several speakers' voices are mixed. As a result, such methods are difficult to implement and have low accuracy.
Summary of the invention
The present invention proposes a method and system for distinguishing dialogue roles that distinguish the roles in a dialogue with improved accuracy.
One aspect of the present invention provides a method for distinguishing dialogue roles, the method comprising:
obtaining all words of each utterance of a dialogue to be analyzed;
converting each word of each utterance of the dialogue to be analyzed into a word vector according to a pre-established word vector model, to obtain all word vectors of each utterance;
obtaining the label of the dialogue role corresponding to each utterance of the dialogue to be analyzed according to all word vectors of the utterance and a dialogue-role discrimination model established in advance from a first corpus, wherein the first corpus contains dialogue corpus from the field to which the dialogue to be analyzed belongs, and the dialogue corpus of the first corpus contains multiple standard utterances and the label of the dialogue role corresponding to each standard utterance; and
distinguishing the dialogue roles in the dialogue to be analyzed according to the recognized dialogue-role labels corresponding to all utterances of the dialogue to be analyzed.
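The four steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the tokenizer, the word-vector table and the trained discrimination model are passed in as a plain function, a dictionary and a function, and the names `distinguish_roles`, `tokenize`, `word_vectors` and `classify` are hypothetical. Words missing from the word-vector table are skipped, which is an assumption the disclosure does not address.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def distinguish_roles(
    utterances: Sequence[str],
    tokenize: Callable[[str], List[str]],
    word_vectors: Dict[str, Vector],
    classify: Callable[[List[Vector]], str],
) -> Dict[str, List[int]]:
    """Group utterance indices by their predicted dialogue-role label."""
    roles: Dict[str, List[int]] = defaultdict(list)
    for index, utterance in enumerate(utterances):
        # Step 1: obtain all words of the utterance.
        words = tokenize(utterance)
        # Step 2: look up a word vector for each word (unknown words are skipped).
        vectors = [word_vectors[w] for w in words if w in word_vectors]
        # Step 3: the discrimination model maps the vector sequence to a role label.
        label = classify(vectors)
        # Step 4: utterances sharing a label are attributed to the same role.
        roles[label].append(index)
    return dict(roles)
```

Keeping the tokenizer, vector table and model as parameters mirrors the separation in the text between the pre-established word vector model and the discrimination model trained on the first corpus.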
In an optional embodiment, the method further comprises:
in response to an instruction to establish the dialogue-role discrimination model, preprocessing all standard utterances of the first corpus to obtain all words of each standard utterance;
converting all words of each standard utterance into word vectors according to the word vector model, to obtain all word vectors of each standard utterance;
constructing a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm; and
training the deep recurrent neural network model on all word vectors of each standard utterance and the dialogue-role label corresponding to each standard utterance, to obtain the dialogue-role discrimination model of the first corpus.
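The LSTM-based discrimination model can be illustrated with a minimal pure-Python forward pass: a single LSTM layer reads the word vectors of one utterance in order, and a softmax layer over the final hidden state scores each role label. The class name, gate layout, dimensions and random initialisation are assumptions for illustration. The weights are untrained, the "deep" network of the disclosure would stack several such layers, and training (cross-entropy loss on the role labels, back-propagation through time) would in practice be done with a framework, for example PyTorch's `torch.nn.LSTM`.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LSTMRoleClassifier:
    """One LSTM layer over an utterance's word vectors, then a softmax over role labels.

    Weights are randomly initialised and untrained; only the forward pass is shown.
    """

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_dim: int, hidden_dim: int, labels: Sequence[str], seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.labels = list(labels)
        self.hidden_dim = hidden_dim
        # Each gate has its own weights acting on the concatenation [x_t; h_{t-1}].
        self.W = {g: mat(hidden_dim, input_dim + hidden_dim) for g in self.GATES}
        self.b = {g: [0.0] * hidden_dim for g in self.GATES}
        self.W_out = mat(len(self.labels), hidden_dim)
        self.b_out = [0.0] * len(self.labels)

    def _gate(self, name: str, z: Vector) -> Vector:
        return [a + b for a, b in zip(matvec(self.W[name], z), self.b[name])]

    def forward(self, vectors: Sequence[Sequence[float]]) -> Vector:
        """Return one probability per label, in the order of self.labels."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in vectors:
            z = list(x) + h
            i = [sigmoid(v) for v in self._gate("input", z)]
            f = [sigmoid(v) for v in self._gate("forget", z)]
            o = [sigmoid(v) for v in self._gate("output", z)]
            cand = [math.tanh(v) for v in self._gate("candidate", z)]
            # The cell state keeps part of the old memory and adds the gated candidate.
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, cand)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        logits = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
        top = max(logits)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, vectors: Sequence[Sequence[float]]) -> str:
        probs = self.forward(vectors)
        return self.labels[probs.index(max(probs))]
```

The final hidden state summarises the whole utterance, which is why a single softmax layer on top of it is enough to assign one role label per utterance.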
In an optional embodiment, the method further comprises:
in response to an instruction to establish the word vector model, obtaining a second corpus, wherein the second corpus contains corpus from multiple fields;
performing word segmentation, stop-word removal and noise-item removal on all texts of the second corpus, to obtain all words of the second corpus; and
establishing the word vector model from all words of the second corpus based on the word2vec algorithm.
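The skip-gram variant of word2vec learns word vectors by predicting the words that surround each centre word within a window. The sketch below only generates those (centre, context) training pairs from already segmented sentences of the second corpus; the function name and window size are assumptions. Real word2vec additionally samples the effective window size and trains with negative sampling or hierarchical softmax, so a production system would more likely call a library, for example gensim 4.x `Word2Vec(sentences, vector_size=..., window=..., sg=1)`.

```python
from typing import Iterable, List, Tuple


def skipgram_pairs(sentences: Iterable[List[str]], window: int = 2) -> List[Tuple[str, str]]:
    """Return (centre, context) pairs as used to train the skip-gram form of word2vec."""
    pairs: List[Tuple[str, str]] = []
    for words in sentences:
        for i, centre in enumerate(words):
            # Context positions lie within `window` words on either side of the centre.
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((centre, words[j]))
    return pairs
```

Because the pairs depend only on co-occurrence, words that appear in similar contexts end up with similar vectors, which is the property the description relies on.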
In an optional embodiment, preprocessing all standard utterances of the first corpus in response to the instruction to establish the dialogue-role discrimination model, to obtain all words of each standard utterance, comprises:
in response to the instruction to establish the dialogue-role discrimination model, segmenting each standard utterance of the first corpus according to the jieba word segmentation system and a vocabulary of the field to which the dialogue to be analyzed belongs, to obtain all candidate words of each standard utterance; and
removing stop words from the candidate words of each standard utterance, to obtain all words of each standard utterance.
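The segmentation step combines jieba with a field vocabulary; with jieba itself, field terms are typically registered through `jieba.load_userdict` or `jieba.add_word` before calling `jieba.lcut`. To keep the example dependency-free, the sketch below substitutes a simpler technique, forward maximum matching against the field vocabulary, which is not how jieba segments internally. The function name, the `max_len` default and the sample air-conditioner vocabulary are hypothetical.

```python
from typing import List, Set


def fmm_segment(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest vocabulary word,
    falling back to a single character when nothing in the vocabulary matches."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical field vocabulary for the air-conditioner sales example.
FIELD_VOCAB = {"这款", "空调", "变频空调", "多少钱"}
```

For example, `fmm_segment("这款变频空调多少钱", FIELD_VOCAB)` keeps the field term `变频空调` whole instead of splitting it, which is the benefit the text attributes to adding the field vocabulary.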
In an optional embodiment, removing stop words from the candidate words of each standard utterance, to obtain all words of each standard utterance, comprises:
removing stop words from the candidate words of each standard utterance according to a stop-word list, to obtain all words of each standard utterance.
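Stop-word removal against a stop-word list reduces to a set-membership filter. In the sketch below the stop-word list is assumed to be a plain text file with one word per line; the function names and sample stop words are illustrative only. Whitespace-only tokens left over from segmentation are dropped as well, which is an added assumption.

```python
from typing import Iterable, List, Set


def parse_stop_words(text: str) -> Set[str]:
    """Read a stop-word list with one word per line, ignoring blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def remove_stop_words(candidates: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Keep candidate words that are neither stop words nor whitespace."""
    return [w for w in candidates if w.strip() and w not in stop_words]
```

Converting the list to a set makes each lookup constant time, which matches the text's point that a stop-word list allows stop words to be removed quickly.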
Another aspect of the present invention provides a system for distinguishing dialogue roles, the system comprising:
a first acquisition module, configured to obtain all words of each utterance of a dialogue to be analyzed;
a feature matching module, configured to convert each word of each utterance of the dialogue to be analyzed into a word vector according to a pre-established word vector model, to obtain all word vectors of each utterance;
a label acquisition module, configured to obtain the label of the dialogue role corresponding to each utterance of the dialogue to be analyzed according to all word vectors of the utterance and a dialogue-role discrimination model established in advance from a first corpus, wherein the first corpus contains dialogue corpus from the field to which the dialogue to be analyzed belongs, and the dialogue corpus of the first corpus contains multiple standard utterances and the label of the dialogue role corresponding to each standard utterance; and
a role distinguishing module, configured to distinguish the dialogue roles in the dialogue to be analyzed according to the recognized dialogue-role labels corresponding to all utterances of the dialogue to be analyzed.
Preferably, the system further comprises:
a first preprocessing module, configured to preprocess all standard utterances of the first corpus in response to an instruction to establish the dialogue-role discrimination model, to obtain all words of each standard utterance;
a first word vector module, configured to convert all words of each standard utterance into word vectors according to the word vector model, to obtain all word vectors of each standard utterance;
a model construction module, configured to construct a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm; and
a model training module, configured to train the deep recurrent neural network model on all word vectors of each standard utterance and the dialogue-role label corresponding to each standard utterance, to obtain the dialogue-role discrimination model of the first corpus.
In an optional embodiment, the system further comprises:
a second corpus acquisition module, configured to obtain a second corpus in response to an instruction to establish the word vector model, wherein the second corpus contains corpus from multiple fields;
a second preprocessing module, configured to perform word segmentation, stop-word removal and noise-item removal on all texts of the second corpus, to obtain all words of the second corpus; and
a word vector model building module, configured to establish the word vector model from all words of the second corpus based on the word2vec algorithm.
In an optional embodiment, the first preprocessing module comprises:
a word segmentation unit, configured to segment, in response to the instruction to establish the dialogue-role discrimination model, each standard utterance of the first corpus according to the jieba word segmentation system and the vocabulary of the field to which the dialogue to be analyzed belongs, to obtain all candidate words of each standard utterance; and
a stop-word removal unit, configured to remove stop words from the candidate words of each standard utterance, to obtain all words of each standard utterance.
In an optional embodiment, the stop-word removal unit comprises:
a stop-word removal subunit, configured to remove stop words from the candidate words of each standard utterance according to a stop-word list, to obtain all words of each standard utterance.
Compared with the prior art, the present invention has the following notable beneficial effects. The invention provides a method and system for distinguishing dialogue roles. The method comprises: obtaining all words of each utterance of a dialogue to be analyzed; converting these words into word vectors according to a pre-established word vector model; obtaining the dialogue-role label of each utterance according to its word vectors and a dialogue-role discrimination model established in advance from a first corpus, wherein the first corpus contains dialogue corpus from the field to which the dialogue to be analyzed belongs, namely multiple standard utterances and the dialogue-role label corresponding to each standard utterance; and distinguishing the dialogue roles in the dialogue to be analyzed according to the recognized labels of all its utterances. Converting all words into word vectors captures the semantic similarity between words, and combining these vectors with the dialogue-role discrimination model to obtain role labels improves the accuracy of distinguishing dialogue roles. Moreover, compared with methods that determine dialogue roles from keywords, the discrimination model established in advance from the first corpus identifies role labels more accurately, which further improves the accuracy of role discrimination.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a first embodiment of the method for distinguishing dialogue roles provided by the present invention;
Fig. 2 is a schematic structural diagram of a first embodiment of the system for distinguishing dialogue roles provided by the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which is a schematic flowchart of the first embodiment of the method for distinguishing dialogue roles provided by the present invention, the method comprises:
S101: obtaining all words of each utterance of the dialogue to be analyzed;
S102: converting each word of each utterance of the dialogue to be analyzed into a word vector according to the pre-established word vector model, to obtain all word vectors of each utterance;
S103: obtaining the label of the dialogue role corresponding to each utterance of the dialogue to be analyzed according to all word vectors of the utterance and the dialogue-role discrimination model established in advance from the first corpus, wherein the first corpus contains dialogue corpus from the field to which the dialogue to be analyzed belongs, and the dialogue corpus of the first corpus contains multiple standard utterances and the label of the dialogue role corresponding to each standard utterance;
S104: distinguishing the dialogue roles in the dialogue to be analyzed according to the recognized dialogue-role labels corresponding to all utterances of the dialogue to be analyzed.
It should be noted that "field" refers to the application field. For example, for a dialogue to be analyzed that belongs to the air-conditioner sales field, the corpus stored in the first corpus is corpus from the air-conditioner sales field; in practice, dialogue corpus related to air-conditioner sales can be collected from text carriers such as news, magazines and web pages. The dialogue-role labels of the standard utterances stored in the first corpus should cover the labels of all dialogue roles in the dialogue to be analyzed. For example, if the dialogue roles of the dialogue to be analyzed are customer service and client, the dialogue corpus stored in the first corpus consists of customer-service and client utterances, and the dialogue-role label of each standard utterance is either customer service or client.
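The requirement that the labels in the first corpus cover every role of the dialogue to be analyzed can be checked mechanically before training. The sketch assumes a hypothetical layout in which the first corpus is a list of (standard utterance, role label) pairs; the sample utterances and the label names `customer_service` and `client` are invented for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical first-corpus layout: (standard utterance, dialogue-role label) pairs
# from the air-conditioner sales field.
FIRST_CORPUS: List[Tuple[str, str]] = [
    ("您好，请问有什么可以帮您？", "customer_service"),
    ("这款空调多少钱？", "client"),
    ("这款是变频空调，比较省电。", "customer_service"),
]


def missing_roles(corpus: Iterable[Tuple[str, str]], expected_roles: Iterable[str]) -> Set[str]:
    """Return the expected roles that no standard utterance in the corpus is labelled with."""
    present = {label for _, label in corpus}
    return set(expected_roles) - present
```

An empty result means every role of the dialogue to be analyzed has at least one labelled standard utterance; a non-empty result names roles for which more corpus would need to be collected.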
Because word vectors can represent semantics, converting all words into word vectors not only makes the data easier for a computer to process but also captures the similarity between words; combining the word vectors with the dialogue-role discrimination model to obtain role labels then improves the accuracy of distinguishing dialogue roles. Moreover, compared with methods that determine dialogue roles from keywords, the discrimination model established in advance from the first corpus identifies role labels more accurately, which improves the accuracy of role discrimination.
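The statement that word vectors capture the similarity between words is usually made concrete with cosine similarity. The two-dimensional toy vectors below are invented for illustration; vectors produced by word2vec have many more dimensions, but the computation is the same.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


# Invented toy vectors: two price-related words point in similar directions.
TOY_VECTORS: Dict[str, List[float]] = {
    "价格": [0.9, 0.1],
    "多少钱": [0.8, 0.2],
    "您好": [0.1, 0.9],
}
```

Here `cosine(TOY_VECTORS["价格"], TOY_VECTORS["多少钱"])` is close to 1 while the score against `您好` is much lower, so a model fed these vectors can treat the two price words alike even though a keyword match would see them as different strings.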
In an optional embodiment, the method further comprises: before obtaining all words of each utterance of the dialogue to be analyzed, obtaining dialogue speech, and converting the dialogue speech into text to obtain the dialogue to be analyzed.
In an optional embodiment, the method further comprises:
in response to an instruction to establish the dialogue-role discrimination model, preprocessing all standard utterances of the first corpus to obtain all words of each standard utterance;
converting all words of each standard utterance into word vectors according to the word vector model, to obtain all word vectors of each standard utterance;
constructing a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm; and
training the deep recurrent neural network model on all word vectors of each standard utterance and the dialogue-role label corresponding to each standard utterance, to obtain the dialogue-role discrimination model of the first corpus.
Training a deep recurrent neural network model built on the LSTM network algorithm overcomes the insufficient feature extraction and weak comprehension of conventional methods, so that the established dialogue-role discrimination model achieves higher discrimination accuracy.
In an optional embodiment, preprocessing all standard utterances of the first corpus to obtain all words of each standard utterance comprises:
segmenting all dialogue corpus of the first corpus and replacing rare words, to obtain all words of each standard utterance.
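The disclosure does not say how rare words are identified or what replaces them. One common approach, assumed here, is a frequency threshold over the segmented corpus with a shared placeholder token; the function name, the `min_count` threshold and the `<UNK>` token are hypothetical.

```python
from collections import Counter
from typing import List


def replace_rare_words(sentences: List[List[str]], min_count: int = 2, unk: str = "<UNK>") -> List[List[str]]:
    """Replace words occurring fewer than `min_count` times in the corpus with `unk`."""
    counts = Counter(w for sentence in sentences for w in sentence)
    return [[w if counts[w] >= min_count else unk for w in sentence] for sentence in sentences]
```

Mapping all rare words to one token keeps the vocabulary small and gives the discrimination model a single, well-trained representation for words it has seen too rarely to learn individually.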
In an optional embodiment, the method further comprises:
in response to an instruction to establish the word vector model, obtaining a second corpus, wherein the second corpus contains corpus from multiple fields;
performing word segmentation, stop-word removal and noise-item removal on all texts of the second corpus, to obtain all words of the second corpus; and
establishing the word vector model from all words of the second corpus based on the word2vec (word-to-vector) algorithm.
The word vector model is established with the word2vec algorithm. Because word2vec extracts word features efficiently, stop words need not be retained when the texts of the second corpus are preprocessed: the word vector model can still accurately represent the semantic relations between words while the data volume is reduced, improving the accuracy and efficiency of distinguishing dialogue roles. Establishing the word vector model from the second corpus and the dialogue-role discrimination model from the first corpus allows the two models to be trained separately and independently on different corpora, so the former is not constrained by the latter and the resulting word vector model is highly reusable.
In an optional embodiment, preprocessing all standard utterances of the first corpus in response to the instruction to establish the dialogue-role discrimination model, to obtain all words of each standard utterance, comprises:
in response to the instruction to establish the dialogue-role discrimination model, segmenting each standard utterance of the first corpus according to the jieba word segmentation system and the vocabulary of the field to which the dialogue to be analyzed belongs, to obtain all candidate words of each standard utterance; and
removing stop words from the candidate words of each standard utterance, to obtain all words of each standard utterance.
Segmenting with the jieba system together with the vocabulary of the field to which the dialogue to be analyzed belongs makes it possible to obtain the candidate words of each standard utterance quickly and accurately. Removing stop words then keeps words that contribute almost nothing to text analysis out of data processing, which reduces the data volume while retaining valid data and improves the accuracy and efficiency of distinguishing dialogue roles.
In an optional embodiment, removing stop words from the candidate words of each standard utterance, to obtain all words of each standard utterance, comprises:
removing stop words from the candidate words of each standard utterance according to a stop-word list, to obtain all words of each standard utterance.
Using a stop-word list allows stop words to be removed quickly, improving the efficiency of distinguishing dialogue roles.
Referring to Fig. 2, which is a schematic structural diagram of the first embodiment of the system for distinguishing dialogue roles provided by the present invention, the system comprises:
a first acquisition module 201, configured to obtain all words of each utterance of the dialogue to be analyzed;
a feature matching module 202, configured to convert each word of each utterance of the dialogue to be analyzed into a word vector according to the pre-established word vector model, to obtain all word vectors of each utterance;
a label acquisition module 203, configured to obtain the label of the dialogue role corresponding to each utterance of the dialogue to be analyzed according to all word vectors of the utterance and the dialogue-role discrimination model established in advance from the first corpus, wherein the first corpus contains dialogue corpus from the field to which the dialogue to be analyzed belongs, and the dialogue corpus of the first corpus contains multiple standard utterances and the label of the dialogue role corresponding to each standard utterance; and
a role distinguishing module 204, configured to distinguish the dialogue roles in the dialogue to be analyzed according to the recognized dialogue-role labels corresponding to all utterances of the dialogue to be analyzed.
It should be noted that "field" refers to the application field. For example, for a dialogue to be analyzed that belongs to the air-conditioner sales field, the corpus stored in the first corpus is corpus from the air-conditioner sales field; in practice, dialogue corpus related to air-conditioner sales can be collected from text carriers such as news, magazines and web pages. The dialogue-role labels of the standard utterances stored in the first corpus should cover the labels of all dialogue roles in the dialogue to be analyzed. For example, if the dialogue roles of the dialogue to be analyzed are customer service and client, the dialogue corpus stored in the first corpus consists of customer-service and client utterances, and the dialogue-role label of each standard utterance is either customer service or client.
Because word vectors can represent semantics, converting all words into word vectors not only makes the data easier for a computer to process but also captures the similarity between words; combining the word vectors with the dialogue-role discrimination model to obtain role labels then improves the accuracy of distinguishing dialogue roles. Moreover, compared with methods that determine dialogue roles from keywords, the discrimination model established in advance from the first corpus identifies role labels more accurately, which improves the accuracy of role discrimination.
In an optional embodiment, the system further comprises: a dialogue speech acquisition module, configured to obtain dialogue speech before all words of each utterance of the dialogue to be analyzed are obtained; and a speech conversion module, configured to convert the dialogue speech into text to obtain the dialogue to be analyzed.
In a kind of optional embodiment, the system also includes:
First pretreatment module, for the instruction in response to establishing conversational character discrimination model, to first corpus All standard dialogues pre-processed, to obtain all words of each sentence standard dialogue;
First term vector module, for according to the term vector model respectively by all words of each sentence standard dialogue Term vector is converted to, to obtain all term vectors of each sentence standard dialogue;
Model construction module, for based on the long structure of memory network algorithm in short-term deep-cycle neural network model;
Model training module, for all term vectors according to each sentence standard dialogue and it is described correspond to each sentence described in The label of the conversational character of standard dialogue is trained to the deep-cycle neural network model, to obtain first language material The conversational character discrimination model in storehouse.
I.e. by being trained to the deep-cycle neural network model built based on long memory network algorithm in short-term, overcome Conventional method feature extraction deficiency, the defects of understandability is weak, so that the conversational character discrimination model established has more High differentiation accuracy rate.
In a kind of optional embodiment, the system also includes:
Second corpus acquisition module, for the instruction in response to establishing term vector model, obtain the second corpus;Its In, second corpus includes the language material of multiple fields;
Second pretreatment module, for being segmented to all texts of second corpus, removing stop words and go to do Item processing is disturbed, to obtain all words of second corpus;
Term vector model building module, for based on word2vec algorithms, according to all words of second corpus Establish the term vector model.
Term vector model is established by word2vec algorithms, the spy of word due to word2vec algorithm high efficiency extractions , can be by term vector mould without retaining stop words when levying, therefore all texts of second corpus being pre-processed Type accurately represents the semantic relation with word, reduces data volume, improves the accuracy and efficiency of conversational character differentiation;Pass through Second corpus establishes term vector model, and establishes conversational character discrimination model by the first corpus, is easy to use difference Corpus training is separated and independently performed to term vector model and conversational character discrimination model so that the former need not be by the latter Constraint so that the term vector model of foundation has preferable durability.
In an optional embodiment, the first preprocessing module includes:
A word segmentation unit, configured to, in response to an instruction to build the dialogue role discrimination model, segment each standard dialogue sentence of the first corpus using the jieba word segmentation system and the vocabulary of the domain of the dialogue to be analyzed, to obtain all candidate words of each standard dialogue sentence; and
A stop-word removal unit, configured to remove stop words from all candidate words of each standard dialogue sentence, to obtain all words of each standard dialogue sentence.
Segmenting with the jieba system and the vocabulary of the domain of the dialogue to be analyzed yields the candidate words of each standard dialogue sentence quickly and accurately. Removing stop words then keeps words that contribute almost nothing to text analysis out of the data processing. This reduces data volume while preserving the valid data, and improves the accuracy and efficiency of dialogue role discrimination.
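The jieba segmenter builds a word graph from its dictionary and picks the most probable path. A domain vocabulary can be registered with `jieba.add_word` or `jieba.load_userdict` before calling `jieba.lcut`. The sketch below stands in for jieba without any dependency, using forward maximum matching, a simpler greedy technique. It shows how a domain vocabulary keeps multi-character domain terms intact as single tokens. The vocabulary and the example term 宽带套餐 ("broadband plan") are hypothetical.

```python
from typing import List, Optional, Set

def fmm_segment(text: str, vocab: Set[str], max_len: Optional[int] = None) -> List[str]:
    """Forward maximum matching (a stand-in for jieba, not jieba's algorithm).

    At each position, take the longest vocabulary word that matches;
    characters not covered by any vocabulary word become single tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in vocab), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical domain vocabulary: the longer domain term wins over its parts.
domain_vocab = {"办理", "宽带", "套餐", "宽带套餐"}
candidates = fmm_segment("我想办理宽带套餐", domain_vocab)
```

Here `candidates` becomes `["我", "想", "办理", "宽带套餐"]`. These are the candidate words that the stop-word removal unit then filters.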
In an optional embodiment, the stop-word removal unit includes:
A stop-word removal subunit, configured to remove stop words from all candidate words of each standard dialogue sentence according to a stop-word list, to obtain all words of each standard dialogue sentence.
Using a stop-word list allows stop words to be removed quickly, which improves the efficiency of dialogue role discrimination.
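One way to make list-based removal fast is to load the stop-word list into a hash set. Each membership test then takes constant time on average, instead of a scan of the whole list. The minimal sketch below assumes a plain-text list with one stop word per line. The function names and sample stop words are illustrative.

```python
from typing import Iterable, List, Set

def load_stopwords(lines: Iterable[str]) -> Set[str]:
    # One stop word per line; blank lines and surrounding whitespace are ignored.
    return {line.strip() for line in lines if line.strip()}

def remove_stopwords(candidates: List[str], stopwords: Set[str]) -> List[str]:
    # Preserve word order; set membership is O(1) on average.
    return [w for w in candidates if w not in stopwords]

# In practice `lines` could be a file, e.g. open(path, encoding="utf-8").
stopwords = load_stopwords(["我\n", "想\n", "的\n", "\n"])
words = remove_stopwords(["我", "想", "办理", "宽带套餐"], stopwords)
```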
A person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are preferred embodiments of the present invention. A person of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention. Such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A dialogue role discrimination method, characterized in that the method comprises:
obtaining all words of each utterance of a dialogue to be analyzed;
converting all words of each utterance of the dialogue to be analyzed into word vectors according to a pre-established word vector model, to obtain all word vectors of each utterance of the dialogue to be analyzed;
obtaining the dialogue role label corresponding to each utterance of the dialogue to be analyzed, according to all word vectors of that utterance and a dialogue role discrimination model established in advance from a first corpus; wherein the first corpus includes dialogue corpora of the domain of the dialogue to be analyzed, and the dialogue corpora of the first corpus include a plurality of standard dialogue sentences and a dialogue role label corresponding to each standard dialogue sentence; and
distinguishing the dialogue roles in the dialogue to be analyzed according to the recognized dialogue role labels corresponding to all utterances of the dialogue to be analyzed.
2. The dialogue role discrimination method according to claim 1, characterized in that the method further comprises:
in response to an instruction to build the dialogue role discrimination model, preprocessing all standard dialogue sentences of the first corpus, to obtain all words of each standard dialogue sentence;
converting all words of each standard dialogue sentence into word vectors according to the word vector model, to obtain all word vectors of each standard dialogue sentence;
building a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm; and
training the deep recurrent neural network model according to all word vectors of each standard dialogue sentence and the dialogue role label corresponding to each standard dialogue sentence, to obtain the dialogue role discrimination model of the first corpus.
3. The dialogue role discrimination method according to claim 2, characterized in that the method further comprises:
in response to an instruction to build the word vector model, obtaining a second corpus, wherein the second corpus includes corpora from multiple domains;
performing word segmentation, stop-word removal and noise-item removal on all texts of the second corpus, to obtain all words of the second corpus; and
building the word vector model from all words of the second corpus based on the word2vec algorithm.
4. The dialogue role discrimination method according to claim 2, characterized in that the step of preprocessing all standard dialogue sentences of the first corpus in response to the instruction to build the dialogue role discrimination model, to obtain all words of each standard dialogue sentence, comprises:
in response to the instruction to build the dialogue role discrimination model, segmenting each standard dialogue sentence of the first corpus according to the jieba word segmentation system and the vocabulary of the domain of the dialogue to be analyzed, to obtain all candidate words of each standard dialogue sentence; and
removing stop words from all candidate words of each standard dialogue sentence, to obtain all words of each standard dialogue sentence.
5. The dialogue role discrimination method according to claim 4, characterized in that the step of removing stop words from all candidate words of each standard dialogue sentence, to obtain all words of each standard dialogue sentence, comprises:
removing stop words from all candidate words of each standard dialogue sentence according to a stop-word list, to obtain all words of each standard dialogue sentence.
6. A dialogue role discrimination system, characterized in that the system comprises:
a first acquisition module, configured to obtain all words of each utterance of a dialogue to be analyzed;
a feature matching module, configured to convert all words of each utterance of the dialogue to be analyzed into word vectors according to a pre-established word vector model, to obtain all word vectors of each utterance of the dialogue to be analyzed;
a label acquisition module, configured to obtain the dialogue role label corresponding to each utterance of the dialogue to be analyzed, according to all word vectors of that utterance and a dialogue role discrimination model established in advance from a first corpus; wherein the first corpus includes dialogue corpora of the domain of the dialogue to be analyzed, and the dialogue corpora of the first corpus include a plurality of standard dialogue sentences and a dialogue role label corresponding to each standard dialogue sentence; and
a role discrimination module, configured to distinguish the dialogue roles in the dialogue to be analyzed according to the recognized dialogue role labels corresponding to all utterances of the dialogue to be analyzed.
7. The dialogue role discrimination system according to claim 6, characterized in that the system further comprises:
a first preprocessing module, configured to, in response to an instruction to build the dialogue role discrimination model, preprocess all standard dialogue sentences of the first corpus, to obtain all words of each standard dialogue sentence;
a first word vector module, configured to convert all words of each standard dialogue sentence into word vectors according to the word vector model, to obtain all word vectors of each standard dialogue sentence;
a model construction module, configured to build a deep recurrent neural network model based on the long short-term memory (LSTM) network algorithm; and
a model training module, configured to train the deep recurrent neural network model according to all word vectors of each standard dialogue sentence and the dialogue role label corresponding to each standard dialogue sentence, to obtain the dialogue role discrimination model of the first corpus.
8. The dialogue role discrimination system according to claim 7, characterized in that the system further comprises:
a second corpus acquisition module, configured to obtain a second corpus in response to an instruction to build the word vector model, wherein the second corpus includes corpora from multiple domains;
a second preprocessing module, configured to perform word segmentation, stop-word removal and noise-item removal on all texts of the second corpus, to obtain all words of the second corpus; and
a word vector model building module, configured to build the word vector model from all words of the second corpus based on the word2vec algorithm.
9. The dialogue role discrimination system according to claim 7, characterized in that the first preprocessing module comprises:
a word segmentation unit, configured to, in response to the instruction to build the dialogue role discrimination model, segment each standard dialogue sentence of the first corpus according to the jieba word segmentation system and the vocabulary of the domain of the dialogue to be analyzed, to obtain all candidate words of each standard dialogue sentence; and
a stop-word removal unit, configured to remove stop words from all candidate words of each standard dialogue sentence, to obtain all words of each standard dialogue sentence.
10. The dialogue role discrimination system according to claim 8, characterized in that the stop-word removal unit comprises:
a stop-word removal subunit, configured to remove stop words from all candidate words of each standard dialogue sentence according to a stop-word list, to obtain all words of each standard dialogue sentence.
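The inference path of claim 1 runs from words to word vectors, from word vectors to a per-utterance role label, and from labels to distinguished roles. It can be sketched as follows. The trained discrimination model is represented by a hypothetical `predict` callable. Unknown words are mapped to zero vectors as an assumption, since the claims do not say how out-of-vocabulary words are handled. The toy word vectors, role names and rule-based `toy_predict` are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

def to_vectors(words: Sequence[str], wv: Dict[str, Vector], dim: int) -> List[Vector]:
    # Out-of-vocabulary words fall back to a zero vector (an assumption).
    return [wv.get(w, [0.0] * dim) for w in words]

def distinguish_roles(
    utterances: Sequence[Sequence[str]],
    wv: Dict[str, Vector],
    dim: int,
    predict: Callable[[List[Vector]], str],
) -> Tuple[List[str], Dict[str, List[int]]]:
    # Label each utterance, then group utterance indices by predicted role.
    labels = [predict(to_vectors(u, wv, dim)) for u in utterances]
    roles: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        roles[label].append(idx)
    return labels, dict(roles)

# Toy usage: 2-dimensional vectors and a rule-based stand-in for the trained model.
wv = {"hello": [1.0, 0.0], "help": [0.5, 0.0], "want": [-1.0, 0.0], "cancel": [-0.2, 0.0]}

def toy_predict(vectors: List[Vector]) -> str:
    # Hypothetical stand-in: a real system would call the trained LSTM model here.
    return "agent" if sum(v[0] for v in vectors) > 0 else "customer"

utterances = [["hello", "help"], ["want", "cancel"], ["hello", "unknown"]]
labels, roles = distinguish_roles(utterances, wv, 2, toy_predict)
```

Keeping `predict` as a parameter mirrors the patent's separation of the word vector model from the discrimination model: either can be swapped without changing the grouping step.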
CN201711088425.2A 2017-11-06 2017-11-06 Conversational character differentiating method and system Pending CN107766565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711088425.2A CN107766565A (en) 2017-11-06 2017-11-06 Conversational character differentiating method and system

Publications (1)

Publication Number Publication Date
CN107766565A true CN107766565A (en) 2018-03-06

Family

ID=61272829

Country Status (1)

Country Link
CN (1) CN107766565A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180306