CN108597521A

CN108597521A - Interactive system, method, terminal and medium for dialogue audio role segmentation and text recognition

Info

Publication number: CN108597521A
Application number: CN201810421520.8A
Authority: CN
Inventors: 徐涌
Original assignee: 徐涌
Current assignee: Guangzhou xinyuxinban Internet Information Service Co., Ltd
Priority date: 2018-05-04
Filing date: 2018-05-04
Publication date: 2018-09-28

Abstract

The invention discloses an interactive system for dialogue audio role segmentation and text recognition, comprising a server and a user terminal. The server includes a speech processing module, a speech-to-text recognition module and an output module. The speech processing module is configured to play the dialogue audio data stream to be recognized; obtain the user terminal's role-assignment operation on the speech and identify the resulting speech role assignment; label the audio data stream by role; and segment out the audio data stream corresponding to each role according to the role labels. The speech-to-text recognition module is configured to recognize the audio data stream of each role as text information, and the output module is configured to output the text information. The server labels and segments the audio according to the role distinctions made on the user terminal, then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that dialogue audio role segmentation and text recognition are achieved quickly, efficiently and accurately.

Description

Interactive system, method, terminal and medium for dialogue audio role segmentation and text recognition

Technical field

The present invention relates to the technical field of audio recognition, and in particular to an interactive system, method, terminal and medium for dialogue audio role segmentation and text recognition.

Background art

Existing techniques that automatically identify the roles (speakers) in a dialogue and then segment the speech and attribute it to those roles still suffer from limited accuracy, so recognition and cutting errors are unavoidable, and manual cutting of the speech and assignment of roles is still needed for fine adjustment. The existing interaction mode for manual audio segmentation mainly consists of setting start and end cut points within a piece of audio and then extracting the audio between them. It can neither automatically attribute the extracted dialogue to a role nor convert the speech into text at the same time. In other words, an interaction mode that combines speech segmentation, assignment of the speech to a role, and speech-to-text conversion has not yet been integrated, and operation is inefficient.

Summary of the invention

In view of the defects in the prior art, one object of the present invention is to provide an interactive system for dialogue audio role segmentation and text recognition that automatically segments the dialogue audio of different roles and converts it to text, achieving dialogue audio role segmentation and text recognition quickly, efficiently and accurately.

In a first aspect, an embodiment of the present invention provides an interactive system for dialogue audio role segmentation and text recognition, comprising a server and a user terminal. The server receives the dialogue audio data stream to be recognized sent by the user terminal. The server includes a speech processing module, a speech-to-text recognition module and an output module. The speech processing module is configured to play the dialogue audio data stream to be recognized; obtain the user terminal's role-assignment operation on the speech and identify the speech role assignment; label the audio data stream by role; and segment out the audio data stream corresponding to each role according to the role labels. The speech-to-text recognition module is configured to recognize the audio data stream of each role as text information. The output module is configured to output the text information.

Optionally, the speech processing module includes a voice playing module configured to play the dialogue audio data stream to be recognized.

Optionally, the speech processing module further includes a role labeling module configured to apply role labels to the played audio data stream according to the speech role assignment information, and to record the time point of the audio data stream corresponding to each role label.

Optionally, the speech processing module further includes a voice segmentation module configured to split the audio data stream wherever the audio at adjacent time points is labeled with different roles, and to leave unsplit any adjacent audio that is labeled with the same role, thereby segmenting out the audio data stream corresponding to each role.
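The segmentation rule of the voice segmentation module can be illustrated with a minimal sketch. The Python below is not taken from the patent: the `RoleMark` record, its field names and the function `split_by_role` are hypothetical, and the sketch assumes that each role label carries the start and end time points, in seconds, of the audio it covers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoleMark:
    """One role label on the played audio (hypothetical record layout)."""
    role: str     # role label, for example "A" or "B"
    start: float  # start time point of the labeled audio, in seconds
    end: float    # end time point of the labeled audio, in seconds


def split_by_role(marks: List[RoleMark]) -> List[RoleMark]:
    """Cut where adjacent labels differ; keep adjacent same-role audio unsplit."""
    segments: List[RoleMark] = []
    for mark in marks:
        if segments and segments[-1].role == mark.role:
            # Same role as the previous label: extend the segment, no cut.
            previous = segments[-1]
            segments[-1] = RoleMark(previous.role, previous.start, mark.end)
        else:
            # Role changes (or first label): start a new segment here.
            segments.append(RoleMark(mark.role, mark.start, mark.end))
    return segments
```

Under these assumptions, two consecutive labels for role A followed by one for role B yield two segments, one per role, rather than three.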

In a second aspect, an embodiment of the present invention provides an interaction method for audio role segmentation and text recognition, comprising the following steps:

The server receives and obtains the dialogue audio data stream to be recognized sent by the user terminal;

The server obtains the user terminal's request to edit the dialogue audio data stream to be recognized;

The server plays the dialogue audio data stream to be recognized;

The server obtains the user terminal's role-assignment operation on the speech and identifies the speech role assignment, applies role labels to the dialogue audio data stream according to that role assignment, and records the time point of the audio data stream corresponding to each role label;

The server segments out the audio data stream corresponding to each role according to the role labels;

The server recognizes the audio data stream corresponding to each role and converts it to text information;

The server outputs the text information.

Optionally, the server segments out the audio data stream corresponding to each role according to the role labels as follows: the audio data stream is split wherever the audio at adjacent time points is labeled with different roles, and adjacent audio labeled with the same role is not split.

In a third aspect, an embodiment of the present invention provides a mobile terminal comprising a processor, an input device, an output device and a memory, which are connected to one another. The memory is used to store a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the above method.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to execute the above method.

Beneficial effects of the present invention:

The interactive system, method, terminal and medium for dialogue audio role segmentation and text recognition provided by the embodiments of the present invention obtain the user's distinction between roles by capturing the user's interaction gestures on the user terminal. The server applies role labels to the dialogue audio data stream and segments it according to the role distinctions made on the user terminal, then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that dialogue audio role segmentation and text recognition are achieved quickly, efficiently and accurately.

Brief description of the drawings

In order to illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. In all the drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, elements or parts are not necessarily drawn to actual scale.

Fig. 1 shows a functional block diagram of a first embodiment of the interactive system for dialogue audio role segmentation and text recognition provided by the present invention;

Fig. 2 shows a functional block diagram of a second embodiment of the interactive system for dialogue audio role segmentation and text recognition provided by the present invention;

Fig. 3 shows a flow chart of a first embodiment of the interaction method for dialogue audio role segmentation and text recognition provided by the present invention;

Fig. 4 shows a structural schematic diagram of a first embodiment of the mobile terminal provided by the present invention.

Detailed description

Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution of the present invention more clearly. They are therefore only examples and cannot be used to limit the scope of protection of the present invention.

It should be noted that, unless otherwise indicated, technical or scientific terms used in this application have the ordinary meaning understood by those of ordinary skill in the art to which the present invention belongs.

Fig. 1 shows a functional block diagram of the first embodiment of the interactive system for dialogue audio role segmentation and text recognition provided by the present invention. The system includes a server 1 and a user terminal 2. The server 1 receives the dialogue audio data stream to be recognized sent by the user terminal 2. The server 1 includes a speech processing module 11, a speech-to-text recognition module 12 and an output module 13. The speech processing module 11 is configured to play the dialogue audio data stream to be recognized; obtain the role-assignment operation performed on the speech at the user terminal 2 and identify the speech role assignment; label the audio data stream by role; and segment out the audio data stream corresponding to each role according to the role labels. The speech-to-text recognition module 12 is configured to recognize the audio data stream of each role as text information. The output module 13 is configured to output the text information.

The user terminal sends the dialogue audio data stream to be recognized to the server, and the server receives and obtains it. In this example the dialogue audio is a dialogue speech clip between two roles, A and B. The user sends a request to edit the dialogue audio to be recognized through the user terminal, and the server returns a dialogue audio editing page to the user terminal. The speech processing module of the server plays the dialogue audio data stream to be recognized, and the user judges which role is speaking. After listening to a sentence and judging that it was spoken by A, the user presses the role A control key on the voice editing page of the user terminal, and the speech processing module labels that section of the audio data stream as role A. The user continues playback, listens to the next sentence, judges that it was spoken by B, and presses the role B control key on the editing page, and the speech processing module labels that section of the audio data stream as role B. Playback then continues and roles are labeled in the same way. After the dialogue audio has finished playing, the speech processing module splits the audio data labeled with different roles. The user then presses the speech-to-text control key, the speech-to-text recognition module converts the segmented audio data streams to text and recognizes the text information corresponding to the speech, and the output module outputs the recognized text information.

The interactive system for dialogue audio role segmentation and text recognition of this embodiment obtains the user's distinction between roles by capturing the user's interaction gestures on the user terminal. The server labels and segments the audio according to the role distinctions made on the user terminal, then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that dialogue audio role segmentation and text recognition are achieved quickly, efficiently and accurately.

Fig. 2 shows a functional block diagram of the second embodiment of the interactive system for dialogue audio role segmentation and text recognition provided by the present invention. It differs from the first embodiment in that the speech processing module 11 includes a voice playing module 111, a role labeling module 112 and a voice segmentation module 113. The voice playing module 111 is configured to play the dialogue audio data stream to be recognized. The role labeling module 112 is configured to apply role labels to the played audio data stream according to the speech role assignment information and to record the time point of the audio data stream corresponding to each role label. The voice segmentation module 113 is configured to split the audio data stream wherever the audio at adjacent time points is labeled with different roles, and to leave unsplit any adjacent audio labeled with the same role, thereby segmenting out the audio data stream corresponding to each role.

The user terminal sends the dialogue audio data stream to be recognized to the server, and the server receives and obtains it. In this example the dialogue audio is a dialogue speech clip between two roles, A and B. The user sends a request to edit the dialogue audio to be recognized through the user terminal, and the server returns a dialogue audio editing page to the user terminal. The voice playing module plays the dialogue audio data stream to be recognized, and the user judges which role each part of the dialogue belongs to. After listening to a sentence and judging that it was spoken by A, the user presses the role A control key on the voice editing page of the user terminal; the voice playing module pauses playback, and the role labeling module labels that section of the audio data stream as role A and records the time point at which the user pressed the role A control key. The user then continues playback, listens to the next sentence, judges that it was spoken by B, and presses the role B control key on the editing page; the voice playing module pauses playback, and the role labeling module labels that section of the audio data stream as role B and records the time point at which the user pressed the role B control key. The voice segmentation module in the server splits the audio data stream wherever the audio at adjacent time points is labeled with different roles, and leaves unsplit any adjacent audio labeled with the same role, thereby segmenting out the audio data stream corresponding to each role. The speech-to-text recognition module in the server recognizes the audio data stream of each role as text information. The output module assigns the text information corresponding to each role's audio data stream to the corresponding dialogue role and outputs the text information.
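The interaction between the voice playing module and the role labeling module in this walk-through can be sketched as follows. This is only an illustration under stated assumptions: the class `RoleMarkingSession` and its method names are hypothetical, and the sketch assumes that each press of a role control key labels the audio played since the previously recorded time point (or since the start of the stream for the first press).

```python
from typing import List, Tuple


class RoleMarkingSession:
    """Records role labels and time points as the user presses role keys."""

    def __init__(self) -> None:
        # Each entry is (role, start time point, end time point) in seconds.
        self.marks: List[Tuple[str, float, float]] = []
        self.paused = False
        self._last_time_point = 0.0

    def press_role_key(self, role: str, position: float) -> None:
        """Handle a role key press at the current playback position."""
        if position <= self._last_time_point:
            raise ValueError("playback position must advance between presses")
        self.paused = True  # playback pauses when a role key is pressed
        self.marks.append((role, self._last_time_point, position))
        self._last_time_point = position  # time point recorded for this label

    def resume(self) -> None:
        """The user continues playing the dialogue audio data stream."""
        self.paused = False


session = RoleMarkingSession()
session.press_role_key("A", 3.2)   # the user judges the first sentence to be A's
session.resume()
session.press_role_key("B", 7.5)   # the user judges the next sentence to be B's
```

After these two presses, `session.marks` holds one label for role A covering 0.0 to 3.2 seconds and one for role B covering 3.2 to 7.5 seconds. The times are made-up illustration values.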

Fig. 3 shows a flow chart of the first embodiment of the interaction method for dialogue audio role segmentation and text recognition provided by the present invention. The method is applicable to the interactive system for audio role segmentation and text recognition of the above embodiments and comprises the following steps:

S1: The user terminal sends the dialogue audio data stream to be recognized to the server.

S2: The server receives and obtains the dialogue audio data stream to be recognized sent by the user terminal. The dialogue audio to be recognized is a dialogue audio data stream containing different roles.

S3: The server obtains the user terminal's request to edit the dialogue audio data stream to be recognized.

S4: The server plays the dialogue audio data stream to be recognized.

S5: The server obtains the user terminal's role-assignment operation on the speech and identifies the speech role assignment, applies role labels to the dialogue audio data stream according to that role assignment, and records the time point of the audio data stream corresponding to each role label.

S6: The server segments out the audio data stream corresponding to each role according to the role labels.

Specifically, the audio data stream is split wherever the audio at adjacent time points is labeled with different roles, and adjacent audio labeled with the same role is not split.

S7: The server recognizes the audio data stream corresponding to each role and converts it to text information.

S8: The server outputs the text information.
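Steps S7 and S8 can be sketched as below, assuming the segmentation of step S6 has already produced per-role segments with start and end time points. The patent does not name a speech recognition engine, so the recognizer is passed in as a callable. The function `transcribe_segments`, the tuple layout and the stand-in `stub_recognize` are all hypothetical, and the audio is assumed to be a plain sequence of samples at a known sample rate.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, float, float]  # (role, start seconds, end seconds)


def transcribe_segments(
    samples: Sequence[int],
    sample_rate: int,
    segments: List[Segment],
    recognize: Callable[[Sequence[int]], str],
) -> List[Tuple[str, str]]:
    """Cut each role segment out of the audio and convert it to text."""
    results: List[Tuple[str, str]] = []
    for role, start, end in segments:
        # Convert the recorded time points to sample indices.
        first = round(start * sample_rate)
        last = round(end * sample_rate)
        chunk = samples[first:last]
        # S7: recognize the segment. S8: pair the text with its role for output.
        results.append((role, recognize(chunk)))
    return results


def stub_recognize(chunk: Sequence[int]) -> str:
    """Stand-in recognizer: reports the chunk length instead of real text."""
    return f"<{len(chunk)} samples>"
```

A real deployment would replace `stub_recognize` with a call into whatever speech recognition service the server uses. Keeping the recognizer as a parameter leaves that choice open.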

The implementation of the method is described in detail below, taking dialogue audio that contains a dialogue speech clip between roles A and B as an example:

The user terminal sends the dialogue audio data stream to be recognized to the server, and the server receives and obtains it. The user sends a request to edit the dialogue audio to be recognized through the user terminal, and the server returns a dialogue audio editing page to the user terminal. The voice playing module plays the dialogue audio data stream to be recognized, and the user judges which role each part of the dialogue belongs to. After listening to a sentence and judging that it was spoken by A, the user presses the role A control key on the voice editing page of the user terminal; the voice playing module pauses playback, and the role labeling module labels that section of the audio data stream as role A and records the time point at which the user pressed the role A control key. The user then continues playback, listens to the next sentence, judges that it was spoken by B, and presses the role B control key on the editing page; the voice playing module pauses playback, and the role labeling module labels that section of the audio data stream as role B and records the time point at which the user pressed the role B control key. The voice segmentation module in the server splits the audio data stream wherever the audio at adjacent time points is labeled with different roles, and leaves unsplit any adjacent audio labeled with the same role, thereby segmenting out the audio data stream corresponding to each role. The speech-to-text recognition module in the server recognizes the audio data stream of each role as text information. The output module assigns the text information corresponding to each role's audio data stream to the corresponding dialogue role and outputs the text information.

The interaction method for dialogue audio role segmentation and text recognition of this embodiment obtains the user's distinction between roles by capturing the user's interaction gestures on the user terminal. The server labels and segments the audio according to the role distinctions made on the user terminal, then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that dialogue audio role segmentation and text recognition are achieved quickly, efficiently and accurately.

Fig. 4 shows a structural schematic diagram of the first embodiment of the mobile terminal provided by the present invention. The mobile terminal includes a processor 31, an input device 32, an output device 33 and a memory 34, which are connected to one another. The memory 34 is used to store a computer program comprising program instructions, and the processor 31 is configured to call the program instructions to execute the method described in the above embodiments.

The mobile terminal provided by this embodiment obtains the user's distinction between roles by capturing the user's interaction gestures on the user terminal. The server labels and segments the audio according to the role distinctions made on the user terminal, then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that dialogue audio role segmentation and text recognition are achieved quickly, efficiently and accurately.

An embodiment of the present invention also provides a computer-readable storage medium. The computer storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to execute the method described in the above embodiments.

The computer-readable storage medium may be an internal storage unit of the terminal described in the foregoing embodiments, such as a hard disk or memory of the terminal. It may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) fitted to the terminal. Further, the computer-readable storage medium may include both an internal storage unit of the terminal and an external storage device. The computer-readable storage medium is used to store the computer program and the other programs and data needed by the terminal. It may also be used to temporarily store data that has been output or is about to be output.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the terminal and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution, and all such modifications and replacements shall be covered by the claims of the present invention.

Claims

1. An interactive system for dialogue audio role segmentation and text recognition, characterized by comprising a server and a user terminal, wherein the server receives the dialogue audio data stream to be recognized sent by the user terminal; the server includes a speech processing module, a speech-to-text recognition module and an output module; the speech processing module is configured to play the dialogue audio data stream to be recognized, obtain the user terminal's role-assignment operation on the speech and identify the speech role assignment, label the audio data stream by role, and segment out the audio data stream corresponding to each role according to the role labels; the speech-to-text recognition module is configured to recognize the audio data stream of each role as text information; and the output module is configured to output the text information.

2. The interactive system for dialogue audio role segmentation and text recognition according to claim 1, characterized in that the speech processing module includes a voice playing module configured to play the dialogue audio data stream to be recognized.

3. The interactive system for dialogue audio role segmentation and text recognition according to claim 1, characterized in that the speech processing module includes a role assignment identification module configured to obtain the user terminal's role-assignment operation on the speech and identify the speech role assignment information.

4. The interactive system for dialogue audio role segmentation and text recognition according to claim 3, characterized in that the speech processing module further includes a role labeling module configured to apply role labels to the played audio data stream according to the speech role assignment information, and to record the time point of the audio data stream corresponding to each role label.

5. The interactive system for dialogue audio role segmentation and text recognition according to claim 4, characterized in that the speech processing module further includes a voice segmentation module configured to split the audio data stream wherever the audio at adjacent time points is labeled with different roles, and to leave unsplit any adjacent audio labeled with the same role, thereby segmenting out the audio data stream corresponding to each role.

6. An interaction method for dialogue audio role segmentation and text recognition, characterized by comprising the following steps:

the server plays the dialogue audio data stream to be recognized;

the server obtains the user terminal's role-assignment operation on the speech and identifies the speech role assignment, applies role labels to the dialogue audio data stream according to that role assignment, and records the time point of the audio data stream corresponding to each role label;

the server outputs the text information.

7. The interaction method for audio role segmentation and text recognition according to claim 6, characterized in that the server segments out the audio data stream corresponding to each role according to the role labels as follows: the audio data stream is split wherever the audio at adjacent time points is labeled with different roles, and adjacent audio labeled with the same role is not split.

8. A mobile terminal comprising a processor, an input device, an output device and a memory, the processor, input device, output device and memory being connected to one another, and the memory being used to store a computer program comprising program instructions, characterized in that the processor is configured to call the program instructions to execute the method according to claim 6 or 7.

9. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to execute the method according to claim 6 or 7.