CN108597521A - Interactive system, method, terminal and medium for dialogue audio role segmentation and text recognition - Google Patents

Interactive system, method, terminal and medium for dialogue audio role segmentation and text recognition

Info

Publication number
CN108597521A
Authority
CN
China
Prior art keywords
role
data stream
voice data
module
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810421520.8A
Other languages
Chinese (zh)
Inventor
徐涌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou xinyuxinban Internet Information Service Co., Ltd
Original Assignee
徐涌
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 徐涌
Priority to CN201810421520.8A
Publication of CN108597521A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses an interactive system for dialogue audio role segmentation and text recognition, comprising a server and a user terminal. The server comprises a speech processing module, a speech-to-text recognition module and an output module. The speech processing module is configured to: play the dialogue audio data stream to be recognized; obtain the role-assignment operations performed on the user terminal and identify the resulting speech role assignments; mark the audio data stream by role; and segment out the audio data stream corresponding to each role according to the role marks. The speech-to-text recognition module is configured to recognize the audio data streams of the different roles as text information, and the output module is configured to output the text information. The server marks and segments the audio according to the role distinctions made on the user terminal and then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that role segmentation and text recognition are achieved quickly, efficiently and accurately.

Description

Interactive system, method, terminal and medium for dialogue audio role segmentation and text recognition
Technical field
The present invention relates to the technical field of audio recognition, and in particular to an interactive system, method, terminal and medium for dialogue audio role segmentation and text recognition.
Background art
Existing techniques that automatically identify the roles (speakers) in a dialogue and perform speech segmentation and role attribution still suffer from limited precision. Inaccurate recognition and cutting inevitably occur, so manual cutting of the speech and manual assignment of roles are still needed for fine adjustment. The existing interaction for manual audio segmentation mainly consists of setting start and end split points within a piece of audio and then cutting the audio out; it cannot automatically attribute the split dialogue to a role and convert the speech into text at the same time. In other words, an interaction that combines splitting the speech, assigning the speech to a role and converting the speech into text has not yet been integrated, and operating efficiency is low.
Summary of the invention
In view of the defects in the prior art, one object of the present invention is to provide an interactive system for dialogue audio role segmentation and text recognition that automatically segments the dialogue audio of different roles and converts it to text, so that role segmentation and text recognition are achieved quickly, efficiently and accurately.
In a first aspect, an embodiment of the present invention provides an interactive system for dialogue audio role segmentation and text recognition, comprising a server and a user terminal. The server receives the dialogue audio data stream to be recognized sent by the user terminal. The server comprises a speech processing module, a speech-to-text recognition module and an output module. The speech processing module is configured to: play the dialogue audio data stream to be recognized; obtain the role-assignment operations performed on the user terminal and identify the speech role assignments; mark the audio data stream by role; and segment out the audio data stream corresponding to each role according to the role marks. The speech-to-text recognition module is configured to recognize the audio data streams of the different roles as text information. The output module is configured to output the text information.
Optionally, the speech processing module comprises a voice playing module configured to play the dialogue audio data stream to be recognized.
Optionally, the speech processing module further comprises a role marking module configured to mark the played audio data stream with roles according to the speech role assignment information, and to record the time point of the audio data stream corresponding to each role mark.
Optionally, the speech processing module further comprises a voice segmentation module configured to split the audio data stream wherever the audio data streams at adjacent time points are marked as different roles, and not to split adjacent audio data streams at adjacent time points that are marked as the same role, thereby segmenting out the audio data stream corresponding to each role.
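The segmentation rule of the voice segmentation module can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the implementation disclosed in the patent: the `Segment` type, its fields and the `merge_adjacent` function are hypothetical names, and role marks are assumed to be stored as (role, start, end) spans in seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A span of audio marked with one role (hypothetical structure)."""
    role: str
    start: float  # seconds from the beginning of the audio
    end: float


def merge_adjacent(marked: List[Segment]) -> List[Segment]:
    """Keep a split only where the role changes between adjacent spans.

    Adjacent spans marked with the same role are joined into one span,
    so no split is made between them.
    """
    merged: List[Segment] = []
    for seg in marked:
        if merged and merged[-1].role == seg.role:
            # Same role as the previous span: extend it instead of splitting.
            merged[-1] = Segment(seg.role, merged[-1].start, seg.end)
        else:
            # Role changed (or first span): start a new segment.
            merged.append(Segment(seg.role, seg.start, seg.end))
    return merged


marked = [
    Segment("A", 0.0, 2.5),
    Segment("A", 2.5, 4.0),
    Segment("B", 4.0, 7.2),
    Segment("A", 7.2, 9.0),
]
result = merge_adjacent(marked)
```

In this example the two consecutive A spans become a single segment from 0.0 to 4.0, while the A/B and B/A boundaries at 4.0 and 7.2 remain split points.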
In a second aspect, an embodiment of the present invention provides an interaction method for dialogue audio role segmentation and text recognition, which specifically comprises the following steps:
The server receives the dialogue audio data stream to be recognized sent by the user terminal;
The server obtains the user terminal's request to edit the dialogue audio data stream to be recognized;
The server plays the dialogue audio data stream to be recognized;
The server obtains the role-assignment operations performed on the user terminal and identifies the speech role assignments, marks the dialogue audio data stream with roles according to the role assignments, and records the time point of the audio data stream corresponding to each role mark;
The server segments out the audio data stream corresponding to each role according to the role marks;
The server recognizes the audio data streams corresponding to the different roles and converts them into text information;
The server outputs the text information.
Optionally, the specific method by which the server segments out the audio data stream corresponding to each role according to the role marks comprises: splitting the audio data stream wherever the audio data streams at adjacent time points are marked as different roles, and not splitting adjacent audio data streams at adjacent time points that are marked as the same role.
In a third aspect, an embodiment of the present invention provides a mobile terminal comprising a processor, an input device, an output device and a memory. The processor, input device, output device and memory are connected to one another. The memory is used to store a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the above method.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium storing a computer program. The computer program comprises program instructions which, when executed by a processor, cause the processor to execute the above method.
Beneficial effects of the present invention:
The interactive system, method, terminal and medium for dialogue audio role segmentation and text recognition provided by the embodiments of the present invention obtain the user's role distinctions from the user's interactive operations on the user terminal. The server marks and segments the dialogue audio data stream by role according to these distinctions and then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that role segmentation and text recognition are achieved quickly, efficiently and accurately.
Description of the drawings
In order to illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Throughout the drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, elements or parts are not necessarily drawn to scale.
Fig. 1 shows a functional block diagram of a first embodiment of the interactive system for dialogue audio role segmentation and text recognition provided by the present invention;
Fig. 2 shows a functional block diagram of a second embodiment of the interactive system for dialogue audio role segmentation and text recognition provided by the present invention;
Fig. 3 shows a flow chart of a first embodiment of the interaction method for dialogue audio role segmentation and text recognition provided by the present invention;
Fig. 4 shows a structural schematic diagram of a first embodiment of the mobile terminal provided by the present invention.
Detailed description of the embodiments
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments are only intended to illustrate the technical solution of the present invention more clearly; they are therefore examples only and cannot be used to limit the scope of protection of the present invention.
It should be noted that, unless otherwise indicated, the technical or scientific terms used in this application have the ordinary meaning understood by those of ordinary skill in the art to which the present invention belongs.
Fig. 1 shows a functional block diagram of a first embodiment of the interactive system for dialogue audio role segmentation and text recognition provided by the present invention. The system comprises a server 1 and a user terminal 2. The server 1 receives the dialogue audio data stream to be recognized sent by the user terminal 2. The server 1 comprises a speech processing module 11, a speech-to-text recognition module 12 and an output module 13. The speech processing module 11 is configured to: play the dialogue audio data stream to be recognized; obtain the role-assignment operations performed on the user terminal 2 and identify the speech role assignments; mark the audio data stream by role; and segment out the audio data stream corresponding to each role according to the role marks. The speech-to-text recognition module 12 is configured to recognize the audio data streams of the different roles as text information. The output module 13 is configured to output the text information.
The user terminal sends the dialogue audio data stream to be recognized to the server, and the server receives it. In this example the dialogue audio is a speech segment of a dialogue between two roles, A and B. The user sends a request to edit the dialogue audio through the user terminal, and the server returns a dialogue audio editing page to the user terminal. The speech processing module of the server plays the dialogue audio data stream, and the user judges which role is speaking. After hearing a sentence and judging that A said it, the user presses the A role control key on the voice editing page of the user terminal, and the speech processing module marks that section of the audio data stream as role A. The user continues playback; after hearing the next sentence and judging that B said it, the user presses the B role control key on the editing page, and the speech processing module marks that section of the audio data stream as role B. Playback then continues and roles are marked in the same way. When the dialogue audio has finished playing, the speech processing module segments the audio data marked with different roles. The user then presses the speech-to-text control key, the speech-to-text recognition module converts the segmented audio data streams into text and recognizes the text information corresponding to the speech, and the output module outputs the recognized text information.
The interactive system for dialogue audio role segmentation and text recognition of this embodiment obtains the user's role distinctions from the user's interactive operations on the user terminal. The server marks and segments the audio according to these distinctions and then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that role segmentation and text recognition are achieved quickly, efficiently and accurately.
Fig. 2 shows a functional block diagram of a second embodiment of the interactive system for dialogue audio role segmentation and text recognition provided by the present invention. It differs from the first embodiment in that the speech processing module 11 comprises a voice playing module 111, a role marking module 112 and a voice segmentation module 113. The voice playing module 111 is configured to play the dialogue audio data stream to be recognized. The role marking module 112 is configured to mark the played audio data stream with roles according to the speech role assignment information and to record the time point of the audio data stream corresponding to each role mark. The voice segmentation module 113 is configured to split the audio data stream wherever the audio data streams at adjacent time points are marked as different roles, and not to split adjacent audio data streams at adjacent time points that are marked as the same role, thereby segmenting out the audio data stream corresponding to each role.
The user terminal sends the dialogue audio data stream to be recognized to the server, and the server receives it. In this example the dialogue audio is a speech segment of a dialogue between two roles, A and B. The user sends a request to edit the dialogue audio through the user terminal, and the server returns a dialogue audio editing page to the user terminal. The voice playing module plays the dialogue audio data stream, and the user judges which role each part of the dialogue belongs to. After hearing a sentence and judging that A said it, the user presses the A role control key on the voice editing page of the user terminal. The voice playing module pauses playback, and the role marking module marks that section of the audio data stream as role A and records the time point at which the user pressed the A role control key. The user continues playback; after hearing the next sentence and judging that B said it, the user presses the B role control key on the editing page. The voice playing module pauses playback, and the role marking module marks that section of the audio data stream as role B and records the time point at which the user pressed the B role control key. The voice segmentation module in the server splits the audio data stream wherever the audio data streams at adjacent time points are marked as different roles, does not split adjacent audio data streams that are marked as the same role, and thereby segments out the audio data stream corresponding to each role. The speech-to-text recognition module in the server recognizes the audio data streams of the different roles as text information. The output module assigns the text information corresponding to each role's audio data stream to that role and outputs the text information.
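The role marking step in this embodiment can be sketched in Python as follows. This is a hedged illustration rather than the disclosed implementation: the function name `mark_roles` and the (time, role) tuple layout are hypothetical, and the sketch assumes that each key press labels the audio played between the previous key press (or the start of the audio) and the recorded time point of the current press, which the source text implies but does not state explicitly.

```python
from typing import Dict, List, Tuple, Union

Mark = Dict[str, Union[str, float]]


def mark_roles(key_presses: List[Tuple[float, str]], start: float = 0.0) -> List[Mark]:
    """Turn recorded key-press time points into role-marked spans.

    key_presses: (playback_time_in_seconds, role) pairs in playback order.
    Assumption: a press at time t marks the audio from the previous press
    (or `start`) up to t with the pressed role.
    """
    marks: List[Mark] = []
    previous = start
    for pressed_at, role in key_presses:
        if pressed_at <= previous:
            raise ValueError("key presses must be strictly increasing in time")
        marks.append({"role": role, "start": previous, "end": pressed_at})
        previous = pressed_at
    return marks


# The user presses A after the first sentence, then B twice, then A again.
presses = [(3.2, "A"), (6.8, "B"), (9.1, "B"), (12.0, "A")]
marks = mark_roles(presses)
```

Each entry in `marks` carries the role and the recorded time points, which is the information the voice segmentation module needs in order to decide where the role changes.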
The interactive system for dialogue audio role segmentation and text recognition of this embodiment obtains the user's role distinctions from the user's interactive operations on the user terminal. The server marks and segments the audio according to these distinctions and then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that role segmentation and text recognition are achieved quickly, efficiently and accurately.
Fig. 3 shows a flow chart of a first embodiment of the interaction method for dialogue audio role segmentation and text recognition provided by the present invention. The method is applicable to the interactive system for dialogue audio role segmentation and text recognition of the above embodiments, and specifically comprises the following steps:
S1: The user terminal sends the dialogue audio data stream to be recognized to the server.
S2: The server receives the dialogue audio data stream to be recognized sent by the user terminal. The dialogue audio to be recognized is a dialogue audio data stream containing different roles.
S3: The server obtains the user terminal's request to edit the dialogue audio data stream to be recognized.
S4: The server plays the dialogue audio data stream to be recognized.
S5: The server obtains the role-assignment operations performed on the user terminal and identifies the speech role assignments, marks the dialogue audio data stream with roles according to the role assignments, and records the time point of the audio data stream corresponding to each role mark.
S6: The server segments out the audio data stream corresponding to each role according to the role marks.
Specifically, the audio data stream is split wherever the audio data streams at adjacent time points are marked as different roles, and adjacent audio data streams at adjacent time points that are marked as the same role are not split.
S7: The server recognizes the audio data streams corresponding to the different roles and converts them into text information.
S8: The server outputs the text information.
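Steps S6 to S8 can be sketched end to end in Python. The sketch below is an assumption-laden illustration, not the patented implementation: `transcribe_dialogue` and `fake_recognize` are hypothetical names, the audio is assumed to be an in-memory sequence of samples, the segments are assumed to be already split by role, and the recognizer is a stub standing in for whatever speech-to-text engine a real server would call.

```python
from typing import Callable, List, Sequence, Tuple

RoleSegment = Tuple[str, float, float]  # (role, start_seconds, end_seconds)


def transcribe_dialogue(
    samples: Sequence[int],
    sample_rate: int,
    segments: List[RoleSegment],
    recognize: Callable[[Sequence[int]], str],
) -> List[Tuple[str, str]]:
    """Cut the audio at the segment boundaries and recognize each piece.

    Returns (role, text) pairs in dialogue order.
    """
    results: List[Tuple[str, str]] = []
    for role, start, end in segments:
        # Convert the recorded time points to sample indices.
        first = int(start * sample_rate)
        last = int(end * sample_rate)
        results.append((role, recognize(samples[first:last])))
    return results


def fake_recognize(chunk: Sequence[int]) -> str:
    """Stub recognizer: reports the chunk length instead of real text."""
    return f"<{len(chunk)} samples>"


samples = [0] * 80  # 10 seconds of silence at a toy rate of 8 samples/second
segments = [("A", 0.0, 5.0), ("B", 5.0, 10.0)]
result = transcribe_dialogue(samples, 8, segments, fake_recognize)
```

Passing the recognizer in as a callable keeps the segmentation logic independent of any particular speech recognition service, which the source text leaves unspecified.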
The implementation of the method is described in detail below, taking dialogue audio that contains speech segments of roles A and B as an example:
The user terminal sends the dialogue audio data stream to be recognized to the server, and the server receives it. The user sends a request to edit the dialogue audio through the user terminal, and the server returns a dialogue audio editing page to the user terminal. The voice playing module plays the dialogue audio data stream, and the user judges which role each part of the dialogue belongs to. After hearing a sentence and judging that A said it, the user presses the A role control key on the voice editing page of the user terminal. The voice playing module pauses playback, and the role marking module marks that section of the audio data stream as role A and records the time point at which the user pressed the A role control key. The user continues playback; after hearing the next sentence and judging that B said it, the user presses the B role control key on the editing page. The voice playing module pauses playback, and the role marking module marks that section of the audio data stream as role B and records the time point at which the user pressed the B role control key. The voice segmentation module in the server splits the audio data stream wherever the audio data streams at adjacent time points are marked as different roles, does not split adjacent audio data streams that are marked as the same role, and thereby segments out the audio data stream corresponding to each role. The speech-to-text recognition module in the server recognizes the audio data streams of the different roles as text information. The output module assigns the text information corresponding to each role's audio data stream to that role and outputs the text information.
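The final output step, in which recognized text is assigned to its role, could look like the following Python sketch. The source does not specify an output format, so the "role: text" line layout and the function names `format_transcript` and `group_by_role` are purely illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (role, recognized_text)


def format_transcript(turns: List[Turn]) -> str:
    """Render the turns in dialogue order, one 'role: text' line per turn."""
    return "\n".join(f"{role}: {text}" for role, text in turns)


def group_by_role(turns: List[Turn]) -> Dict[str, List[str]]:
    """Collect every recognized sentence under the role that spoke it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for role, text in turns:
        grouped[role].append(text)
    return dict(grouped)


turns = [
    ("A", "Is the report ready?"),
    ("B", "Almost."),
    ("A", "Send it today."),
]
transcript = format_transcript(turns)
by_role = group_by_role(turns)
```

The first function preserves the order of the dialogue, while the second gives a per-role view of everything each role said.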
The interaction method for dialogue audio role segmentation and text recognition of this embodiment obtains the user's role distinctions from the user's interactive operations on the user terminal. The server marks and segments the audio according to these distinctions and then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that role segmentation and text recognition are achieved quickly, efficiently and accurately.
Fig. 4 shows a structural schematic diagram of a first embodiment of the mobile terminal provided by the present invention. The mobile terminal comprises a processor 31, an input device 32, an output device 33 and a memory 34, which are connected to one another. The memory 34 is used to store a computer program comprising program instructions, and the processor 31 is configured to call the program instructions to execute the method described in the above embodiments.
The mobile terminal provided by this embodiment obtains the user's role distinctions from the user's interactive operations on the user terminal. The server marks and segments the audio according to these distinctions and then converts the segmented audio data streams into the corresponding text information for output. Dialogue audio of different roles is thereby segmented and converted to text automatically, so that role segmentation and text recognition are achieved quickly, efficiently and accurately.
An embodiment of the present invention also provides a computer readable storage medium. The computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to execute the method described in the above embodiments.
The computer readable storage medium may be an internal storage unit of the terminal described in the foregoing embodiments, such as a hard disk or memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card (Flash Card) fitted to the terminal. Further, the computer readable storage medium may include both an internal storage unit of the terminal and an external storage device. The computer readable storage medium is used to store the computer program and the other programs and data needed by the terminal. The computer readable storage medium may also be used to temporarily store data that has been output or is about to be output.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the terminal and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from its purpose and scope, and all such modifications and replacements shall be covered by the scope of the claims of the present invention.

Claims (9)

1. An interactive system for dialogue audio role segmentation and text recognition, characterized by comprising a server and a user terminal, wherein the server receives the dialogue audio data stream to be recognized sent by the user terminal; the server comprises a speech processing module, a speech-to-text recognition module and an output module; the speech processing module is configured to play the dialogue audio data stream to be recognized, obtain the role-assignment operations performed on the user terminal and identify the speech role assignments, mark the audio data stream by role, and segment out the audio data stream corresponding to each role according to the role marks; the speech-to-text recognition module is configured to recognize the audio data streams of the different roles as text information; and the output module is configured to output the text information.
2. The interactive system for dialogue audio role segmentation and text recognition according to claim 1, characterized in that the speech processing module comprises a voice playing module configured to play the dialogue audio data stream to be recognized.
3. The interactive system for dialogue audio role segmentation and text recognition according to claim 1, characterized in that the speech processing module comprises a role assignment identification module configured to obtain the role-assignment operations performed on the user terminal and identify the speech role assignment information.
4. The interactive system for dialogue audio role segmentation and text recognition according to claim 3, characterized in that the speech processing module further comprises a role marking module configured to mark the played audio data stream with roles according to the speech role assignment information and to record the time point of the audio data stream corresponding to each role mark.
5. The interactive system for dialogue audio role segmentation and text recognition according to claim 4, characterized in that the speech processing module further comprises a voice segmentation module configured to split the audio data stream wherever the audio data streams at adjacent time points are marked as different roles, and not to split adjacent audio data streams at adjacent time points that are marked as the same role, thereby segmenting out the audio data stream corresponding to each role.
6. An interaction method for dialogue audio role segmentation and text recognition, characterized by specifically comprising the following steps:
the server receives the dialogue audio data stream to be recognized sent by the user terminal;
the server obtains the user terminal's request to edit the dialogue audio data stream to be recognized;
the server plays the dialogue audio data stream to be recognized;
the server obtains the role-assignment operations performed on the user terminal and identifies the speech role assignments, marks the dialogue audio data stream with roles according to the role assignments, and records the time point of the audio data stream corresponding to each role mark;
the server segments out the audio data stream corresponding to each role according to the role marks;
the server recognizes the audio data streams corresponding to the different roles and converts them into text information;
the server outputs the text information.
7. The interaction method for dialogue audio role segmentation and text recognition according to claim 6, characterized in that the specific method by which the server segments out the audio data stream corresponding to each role according to the role marks comprises: splitting the audio data stream wherever the audio data streams at adjacent time points are marked as different roles, and not splitting adjacent audio data streams at adjacent time points that are marked as the same role.
8. A mobile terminal, comprising a processor, an input device, an output device and a memory, which are connected to one another, wherein the memory is used to store a computer program comprising program instructions, characterized in that the processor is configured to invoke the program instructions to execute the method according to claim 6 or 7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to execute the method according to claim 6 or 7.
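
The segmentation rule recited in claims 5 and 7 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the names RoleMark, Segment, segment_by_role and transcribe_segments are hypothetical, time points are assumed to be seconds from the start of playback, and the recognizer is a caller-supplied stand-in for whatever speech-to-text engine the server would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RoleMark:
    """A role mark placed by the user at a playback time point (assumed: seconds)."""
    time: float
    role: str


@dataclass
class Segment:
    """A contiguous stretch of the audio stream attributed to one role."""
    start: float
    end: float
    role: str


def segment_by_role(marks: List[RoleMark], total_duration: float) -> List[Segment]:
    """Split where adjacent marks name different roles; do not split
    where adjacent marks name the same role."""
    segments: List[Segment] = []
    for mark in sorted(marks, key=lambda m: m.time):
        if segments and segments[-1].role == mark.role:
            continue  # same role as the previous mark: no split here
        if segments:
            segments[-1].end = mark.time  # role changed: close the open segment
        segments.append(Segment(start=mark.time, end=total_duration, role=mark.role))
    return segments


def transcribe_segments(
    segments: List[Segment],
    recognize: Callable[[float, float], str],
) -> List[str]:
    """Convert each role's segment to text; `recognize` is a hypothetical
    callback that takes (start, end) and returns the recognized text."""
    return [f"{seg.role}: {recognize(seg.start, seg.end)}" for seg in segments]


if __name__ == "__main__":
    demo_marks = [
        RoleMark(0.0, "A"),
        RoleMark(4.0, "A"),   # same role as before, so no split
        RoleMark(7.5, "B"),   # role change, so a split at 7.5 s
        RoleMark(12.0, "A"),  # role change, so a split at 12.0 s
    ]
    for segment in segment_by_role(demo_marks, total_duration=15.0):
        print(segment)
```

In this sketch the two consecutive marks for role A collapse into one segment, so the recognizer is called once per speaker turn rather than once per mark. How the real system cuts the underlying audio bytes at those time points is not specified in the claims and is left out here.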
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810421520.8A CN108597521A (en) 2018-05-04 2018-05-04 Audio role divides interactive system, method, terminal and the medium with identification word

Publications (1)

Publication Number Publication Date
CN108597521A true CN108597521A (en) 2018-09-28

Family

ID=63620824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810421520.8A Pending CN108597521A (en) 2018-05-04 2018-05-04 Audio role divides interactive system, method, terminal and the medium with identification word

Country Status (1)

Country Link
CN (1) CN108597521A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1770262A (en) * 2004-11-01 2006-05-10 英业达股份有限公司 Speech display system and method
CN102543080A (en) * 2010-12-24 2012-07-04 索尼公司 Audio editing system and audio editing method
CN102543063A (en) * 2011-12-07 2012-07-04 华南理工大学 Method for estimating speech speed of multiple speakers based on segmentation and clustering of speakers
US20160217793A1 (en) * 2015-01-26 2016-07-28 Verint Systems Ltd. Acoustic signature building for a speaker from multiple sessions
CN105405439A (en) * 2015-11-04 2016-03-16 科大讯飞股份有限公司 Voice playing method and device
CN106683661A (en) * 2015-11-05 2017-05-17 阿里巴巴集团控股有限公司 Role separation method and device based on voice
CN105845129A (en) * 2016-03-25 2016-08-10 乐视控股(北京)有限公司 Method and system for dividing sentences in audio and automatic caption generation method and system for video files
CN105957531A (en) * 2016-04-25 2016-09-21 上海交通大学 Speech content extracting method and speech content extracting device based on cloud platform
CN106782507A (en) * 2016-12-19 2017-05-31 平安科技(深圳)有限公司 The method and device of voice segmentation
CN106851407A (en) * 2017-01-24 2017-06-13 维沃移动通信有限公司 A kind of control method and terminal of video playback progress
CN107358945A (en) * 2017-07-26 2017-11-17 谢兵 A kind of more people's conversation audio recognition methods and system based on machine learning
CN107808659A (en) * 2017-12-02 2018-03-16 宫文峰 Intelligent sound signal type recognition system device

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
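余小清 et al.: "An improved BIC speaker change detection algorithm", 《上海大学学报(自然科学版)》 *
曹洪林, 李敬阳: "On the forms of expression of voiceprint identification opinions", 《证据科学》 *
曹洪林 et al.: "On the forms of expression of voiceprint identification opinions", 《证据科学》 *
梁晓轩: "Cracking the voice code", 《检察风云》 *
檀蕊莲 et al.: "Research progress in speaker recognition technology", 《科技资讯》 *
郑铁然 et al.: "A speaker segmentation method based on pre-segmentation", 《通信学报》 *
马勇 et al.: "Research progress in speaker segmentation and clustering", 《信号处理》 *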

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110166818A (en) * 2018-11-30 2019-08-23 腾讯科技(深圳)有限公司 Wait match generation method, computer equipment and the storage medium of audio-video
CN112382288A (en) * 2020-11-11 2021-02-19 湖南常德牌水表制造有限公司 Method and system for debugging equipment by voice, computer equipment and storage medium
CN112382288B (en) * 2020-11-11 2024-04-02 湖南常德牌水表制造有限公司 Method, system, computer device and storage medium for voice debugging device
CN113192516A (en) * 2021-04-22 2021-07-30 平安科技(深圳)有限公司 Voice role segmentation method and device, computer equipment and storage medium
CN113192516B (en) * 2021-04-22 2024-05-07 平安科技(深圳)有限公司 Voice character segmentation method, device, computer equipment and storage medium
CN114339423A (en) * 2021-12-24 2022-04-12 咪咕文化科技有限公司 Short video generation method and device, computing equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200121

Address after: 510000 3-25-2, No. 309, Huangpu Avenue middle, Tianhe District, Guangzhou City, Guangdong Province

Applicant after: Guangzhou xinyuxinban Internet Information Service Co., Ltd

Address before: 511442 Panyu District Town, Guangzhou, Guangdong, 7 times 2001

Applicant before: Xu Yong

RJ01 Rejection of invention patent application after publication

Application publication date: 20180928