CN114220421A - Method and device for generating timestamp at word level, electronic equipment and storage medium - Google Patents

Method and device for generating timestamp at word level, electronic equipment and storage medium

Info

Publication number
CN114220421A
Authority
CN
China
Prior art keywords
word
time corresponding
determining
probability
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111547980.3A
Other languages
Chinese (zh)
Inventor
范红亮
李轶杰
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202111547980.3A priority Critical patent/CN114220421A/en
Publication of CN114220421A publication Critical patent/CN114220421A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/04: Segmentation; Word boundary detection
    • G10L 15/05: Word boundary detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/047: Probabilistic or stochastic networks
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/16: Speech classification or search using artificial neural networks


Abstract

The application relates to a method and an apparatus for generating word-level timestamps, an electronic device, and a storage medium. The method comprises the following steps: determining a probability peak for each word during a frame-by-frame decoding process; determining the time corresponding to the tail end point of each word according to the probability peak value of that word; determining the time corresponding to the head end point of each word according to the time corresponding to its tail end point; and generating a word-level timestamp according to the time corresponding to the head end point of each word and the time corresponding to the tail end point of each word. The method determines the probability peak of each word from the scores output by the deep neural network and from how those scores change as each word is emitted during decoding, and derives the times corresponding to the head end point and the tail end point of each word from that peak. It thus provides a way to obtain word-level timestamps: accurate word-level timestamp information can be output, high-precision boundary information is obtained, and user experience is improved.

Description

Method and device for generating timestamp at word level, electronic equipment and storage medium
Technical Field
The present application relates to the field of timestamp technology, and in particular, to a method and an apparatus for generating a word-level timestamp, an electronic device, and a storage medium.
Background
The conventional Kaldi-based speech recognition system can obtain the boundary information of each word from a lattice. Although the end-to-end speech recognition systems now popular in the industry exceed the traditional systems in recognition rate, many of them provide no timestamp information, or only rough timestamps, for example by judging word boundaries directly from the neural network scores; at present there is no relatively mature algorithm for obtaining the timestamp information of each word.
Disclosure of Invention
Based on the problem that no relatively mature algorithm currently exists for obtaining the timestamp information of each word, the present application provides a method and an apparatus for generating word-level timestamps, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a method for generating a word-level timestamp, including:
determining a probability peak for each word during a frame-by-frame decoding process;
determining the time corresponding to the tail end point of each word according to the probability peak value of each word;
determining the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word;
and generating a word-level timestamp according to the time corresponding to the head end point of each word and the time corresponding to the tail end point of each word.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the tail end point of each word according to the probability peak value of each word includes:
comparing the probability peak value of each word with the current probability value of that word; and
if the difference between the probability peak value and the current probability value is greater than or equal to a preset threshold value, determining the time corresponding to the current probability value as the time corresponding to the tail end point.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the tail end point of each word according to the probability peak value of each word includes:
if the current word ends and is followed by a silence segment, and the difference between the probability peak value and the current probability value is smaller than the preset threshold value, delaying the time corresponding to the probability peak value of each word by a first preset time to determine the time corresponding to the tail end point of each word.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word includes:
moving the time corresponding to the tail end point of each word earlier by a second preset time to determine the time corresponding to the head end point of each word.
Further, the method for generating a word-level timestamp further includes:
and determining the time corresponding to the head point of each word according to the probability peak value of each word.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the head end point of each word according to the probability peak value of each word includes:
moving the time corresponding to the probability peak value of each word earlier by a first preset time to determine the time corresponding to the head end point of each word.
Further, in the above method for generating a word-level timestamp, the probability peak is a log probability.
In a second aspect, an embodiment of the present application provides an apparatus for generating a timestamp at a word level, including:
a first determination module, configured to determine the probability peak value of each word during the frame-by-frame decoding process;
a second determination module, configured to determine the time corresponding to the tail end point of each word according to the probability peak value of each word;
a third determination module, configured to determine the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word;
and a fourth determination module, configured to generate a word-level timestamp according to the time corresponding to the head end point of each word and the time corresponding to the tail end point of each word.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor and a memory;
the processor is configured to execute the above method for generating a word-level timestamp by calling a program or instructions stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a program or instructions that cause a computer to perform the above method for generating a word-level timestamp.
The embodiments of the application have the following advantages. The application relates to a method and an apparatus for generating word-level timestamps, an electronic device, and a storage medium, the method comprising: determining a probability peak for each word during a frame-by-frame decoding process; determining the time corresponding to the tail end point of each word according to the probability peak value of that word; determining the time corresponding to the head end point of each word according to the time corresponding to its tail end point; and generating a word-level timestamp according to the time corresponding to the head end point of each word and the time corresponding to the tail end point of each word. The method determines the probability peak of each word from the scores output by the deep neural network and from how those scores change as each word is emitted during decoding, derives the times corresponding to the head end point and the tail end point of each word from that peak, and thereby provides a way to obtain word-level timestamps: accurate word-level timestamp information can be output, high-precision boundary information is obtained, and user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the conventional technology, the drawings needed in the description of the embodiments or the conventional technology are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first schematic diagram of a method for generating a word-level timestamp according to an embodiment of the present application;
Fig. 2 is a second schematic diagram of a method for generating a word-level timestamp according to an embodiment of the present application;
Fig. 3 is a third schematic diagram of a method for generating a word-level timestamp according to an embodiment of the present application;
Fig. 4 is a schematic diagram of an apparatus for generating a word-level timestamp according to an embodiment of the present application;
Fig. 5 is a schematic block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments of the present application are described in detail below with reference to the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The application can, however, be implemented in many forms other than those described herein, and those skilled in the art can make similar modifications without departing from its spirit; the application is therefore not limited to the specific embodiments disclosed below.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The technical background of the present application is first described below:
the neural network model of the end-to-end speech recognition engine typically outputs a matrix of T x M. Where T represents the number of frames of the audio and M represents the size of the dictionary. The element ixj in the matrix represents the probability of the model output j at time i, typically using log probability. Subsequently, on the basis of the matrix, a Decoding algorithm (such as CTC Prefix Beam Search, Time Sync Decoding, Align length Sync Decoding and the like) is used to obtain a final recognition result, and the boundary information timestamp of each word is obtained in the Decoding process.
In frame-by-frame decoding, each frame has an optimal path, and the score of the optimal path is the sum of the log probabilities at every frame the path passes through. Normally a word covers several frames, and from its beginning to its end the probability along the path follows a rough pattern: it grows from small to large, then either stays flat or jumps to the next word. At first little information is available, so the probability of matching the word is not very large; as decoding advances, the frames look more and more "like" the word, i.e. the probability increases. Afterwards the path either transitions smoothly or jumps to the next word.
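As a concrete illustration of this background, the following minimal Python sketch shows what such a T × M log-probability matrix looks like and how the per-frame score of a single word can be tracked to find its probability peak. The toy random scores and the names `log_probs` and `track_word_score` are assumptions made for illustration and are not taken from the patent.

```python
import numpy as np

# Toy stand-in for the encoder output: T frames x M dictionary entries.
# Real systems obtain this matrix from the neural network; here it is random.
T, M = 200, 5000
rng = np.random.default_rng(0)
logits = rng.normal(size=(T, M))
# Convert raw scores to log probabilities (log-softmax over the dictionary axis).
log_probs = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)

def track_word_score(log_probs, word_id, start_frame, end_frame):
    """Return the per-frame log probability of `word_id` over a frame span,
    plus the frame where it peaks.  During decoding this score typically
    rises while the word is being spoken; its maximum is the word's
    "probability peak" used by the method described above."""
    span = log_probs[start_frame:end_frame, word_id]
    peak_frame = start_frame + int(np.argmax(span))
    return span, peak_frame

scores, peak_frame = track_word_score(log_probs, word_id=42, start_frame=30, end_frame=60)
print("peak frame:", peak_frame, "peak log prob:", float(scores.max()))
```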
Fig. 1 is a first schematic diagram illustrating a method for generating a word-level timestamp according to an embodiment of the present application.
In a first aspect, an embodiment of the present application provides a method for generating a word-level timestamp, which, with reference to fig. 1, includes four steps S101 to S104:
s101: in the frame-by-frame decoding process, the probability peak for each word is determined.
Specifically, in the embodiment of the present application, in the frame-by-frame decoding process, the probability peak of each word is determined by determining the maximum log probability score when each word appears as the latest word.
S102: and determining the corresponding time of the tail end point of each word according to the probability peak value of each word.
Specifically, in the embodiment of the present application, after the maximum log probability score of each word is determined, the time corresponding to the tail end point of each word is determined according to the maximum log probability score of each word, and the time corresponding to the tail end point of each word is introduced in combination with specific steps below.
S103: and determining the time corresponding to the head point of each word according to the time corresponding to the tail point of each word.
Specifically, in the embodiment of the present application, after the time corresponding to the tail end point of each word is determined, the time corresponding to the tail end point of each word may be shifted forward by approximately one word time, so that the time corresponding to the head end point of each word may be determined, which is described below with reference to a specific example.
S104: and generating a time stamp of the word level according to the time corresponding to the head point of each word and the time corresponding to the tail point of each word.
Specifically, in the embodiment of the present application, the time corresponding to the head point of each word and the time corresponding to the tail point of each word are determined, and the time stamp of each word can be determined according to the time between the head point and the tail point.
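To make step S104 concrete, the short sketch below shows how a word-level timestamp could be assembled once the head and tail end points are known as frame indices. The 40 ms frame shift and the helper name `make_timestamp` are assumptions for illustration, not values fixed by the application.

```python
FRAME_SHIFT_MS = 40  # assumed output frame shift of the acoustic model

def make_timestamp(word, head_frame, tail_frame, frame_shift_ms=FRAME_SHIFT_MS):
    """Map the head/tail end points of a word (frame indices) to a
    word-level timestamp in milliseconds."""
    return {
        "word": word,
        "start_ms": head_frame * frame_shift_ms,
        "end_ms": tail_frame * frame_shift_ms,
    }

# Example: a word whose head end point is frame 30 and tail end point is frame 36.
print(make_timestamp("hello", head_frame=30, tail_frame=36))
# {'word': 'hello', 'start_ms': 1200, 'end_ms': 1440}
```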
Fig. 2 is a second schematic diagram of a method for generating a word-level timestamp according to an embodiment of the present application.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the tail end point of each word according to the probability peak value of each word, with reference to Fig. 2, includes three steps S201 to S203:
S201: comparing the probability peak value of each word with the current probability value of that word;
S202: judging whether the difference between the probability peak value of each word and the current probability value of that word is greater than or equal to a preset threshold value;
S203: if so, determining the time corresponding to the current probability value as the time corresponding to the tail end point.
Specifically, in the embodiment of the present application, this handles the case where the current word lasts for a period of time and decoding then jumps immediately to the next word. The jump point of the current word needs to be found: a relative threshold, for example 0.1%, is set, and the current probability value of the word is compared with its probability peak. If the difference between the probability peak value and the current probability value is greater than or equal to the preset threshold (e.g. 0.1%), the time corresponding to the current probability value is determined to be the time corresponding to the tail end point.
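A minimal sketch of this jump-point test follows. The way the relative threshold is applied (against the magnitude of the peak log probability) and the helper name `tail_point_by_drop` are assumptions, since the description only states that a relative threshold such as 0.1% is compared against.

```python
def tail_point_by_drop(frame_scores, peak_score, rel_threshold=0.001):
    """Scan (frame, log_prob) pairs after the peak and return the first frame
    whose score has fallen from the peak by at least the relative threshold
    (e.g. 0.1%); that frame's time is taken as the word's tail end point.
    Returns None if no such drop is found (the silence case handled below)."""
    for frame, score in frame_scores:
        if peak_score - score >= abs(peak_score) * rel_threshold:
            return frame
    return None
```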
Fig. 3 is a third schematic diagram of a method for generating a word-level timestamp according to an embodiment of the present application.
Further, in the above method for generating a timestamp at a word level, determining a time corresponding to a tail point of each word according to a probability peak of each word, with reference to fig. 3, includes two steps S301 to S302:
S301: if the current word ends and is followed by a silence segment, and the difference between the probability peak value of the word and its current probability value remains smaller than the preset threshold value;
S302: delaying the time corresponding to the probability peak value of each word by a first preset time to determine the time corresponding to the tail end point of each word.
Specifically, in the embodiment of the present application, this handles the case where the current word ends and is followed by a silence segment. Here the score drops from the probability peak by less than the preset threshold (e.g. 0.1%) and stays that way for a long time, so the time of the probability peak is delayed by a first preset time, such as 120 ms (about half the duration of a word), to obtain the time corresponding to the tail end point of each word.
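The silence case reduces to a simple offset rule, sketched below; the 120 ms value comes from the description, while the function name is an assumption.

```python
FIRST_PRESET_MS = 120  # about half the duration of a word, per the description

def tail_point_after_silence(peak_time_ms, first_preset_ms=FIRST_PRESET_MS):
    """When the word ends into silence and the score never drops past the
    threshold, delay the peak time by the first preset time to get the tail
    end point."""
    return peak_time_ms + first_preset_ms
```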
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word includes:
moving the time corresponding to the tail end point of each word earlier by a second preset time to determine the time corresponding to the head end point of each word.
Specifically, in this embodiment of the application, after the time corresponding to the tail end point of each word is determined, it can be moved earlier by a second preset time, such as 240 ms (approximately the duration of one word), to obtain the time corresponding to the head end point of each word.
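A matching sketch for deriving the head end point from the tail end point follows; the 240 ms value comes from the description, and the function name is an assumption.

```python
SECOND_PRESET_MS = 240  # roughly the duration of one word, per the description

def head_point_from_tail(tail_time_ms, second_preset_ms=SECOND_PRESET_MS):
    """Move the tail end point earlier by the second preset time to obtain
    the head end point, clamped at the start of the audio."""
    return max(0, tail_time_ms - second_preset_ms)
```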
Further, the method for generating a word-level timestamp further includes:
and determining the time corresponding to the head point of each word according to the probability peak value of each word.
Specifically, in the embodiment of the present application, in addition to the above-described comparison between the probability peak value of each word and the current probability value of each word, the time corresponding to the head end point of each word may also be determined by the time corresponding to the probability peak value of each word.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the head end point of each word according to the probability peak value of each word includes:
moving the time corresponding to the probability peak value of each word earlier by a first preset time to determine the time corresponding to the head end point of each word.
Specifically, in the embodiment of the present application, the time corresponding to the probability peak value of each word is moved earlier by the first preset time, such as 120 ms (about half the duration of a word), to obtain the time corresponding to the head end point of each word.
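The alternative route from the probability peak can be sketched the same way; again, 120 ms is the value given in the description and the function name is an assumption.

```python
FIRST_PRESET_MS = 120  # about half the duration of a word, per the description

def head_point_from_peak(peak_time_ms, first_preset_ms=FIRST_PRESET_MS):
    """Move the probability-peak time earlier by the first preset time to
    obtain the head end point, clamped at the start of the audio."""
    return max(0, peak_time_ms - first_preset_ms)
```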
Further, in the above method for generating a word-level timestamp, the probability peak is a log probability.
Specifically, in the embodiment of the present application, the neural network model of the end-to-end speech recognition engine typically outputs a T × M matrix, where T is the number of audio frames and M is the dictionary size. Element (i, j) of the matrix represents the probability that the model outputs entry j at time i, and a log probability is typically used, so the probability peak is a log probability.
Fig. 4 is a schematic diagram of an apparatus for generating a word-level timestamp according to an embodiment of the present application.
In a second aspect, an embodiment of the present application provides an apparatus for generating a word-level timestamp, which, in conjunction with fig. 4, includes:
the first determination module 401: for use in a frame decoding process, the probability peak for each word is determined.
Specifically, in the embodiment of the present application, in the frame-by-frame decoding process, the first determining module 401 determines the probability peak of each word, which is the maximum log probability score when each word appears as the latest word.
The second determination module 402: for determining the time corresponding to the tail point of each word from the probability peak of each word.
Specifically, in the embodiment of the present application, after determining the maximum log probability score of each word, the second determining module 402 determines the time corresponding to the tail end point of each word according to the maximum log probability score of each word, and the time corresponding to the tail end point of each word is introduced in combination with the above specific steps.
The third determination module 403: and the time corresponding to the head point of each word is determined according to the time corresponding to the tail point of each word.
Specifically, in the embodiment of the present application, after the time corresponding to the tail end point of each word is determined, the time corresponding to the tail end point of each word may be shifted forward by approximately one word time, and the third determining module 403 may determine the time corresponding to the head end point of each word, which is described above with reference to the specific example.
The fourth determination module 404: and the time stamp is used for generating the time stamp of the word level according to the time corresponding to the head end point of each word and the time corresponding to the tail end point of each word.
Specifically, in the embodiment of the present application, the time corresponding to the head point of each word and the time corresponding to the tail point of each word are determined, and the fourth determining module 404 may determine the timestamp of each word according to the time between the head point and the tail point.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor and a memory;
the processor is configured to execute the above method for generating a word-level timestamp by calling a program or instructions stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a program or instructions that cause a computer to perform the above method for generating a word-level timestamp.
Fig. 5 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure.
As shown in Fig. 5, the electronic device includes: at least one processor 501, at least one memory 502, and at least one communication interface 503. The components of the electronic device are coupled together by a bus system 504, and the communication interface 503 is used for information transmission with external devices. It is understood that the bus system 504 is used to enable communication among these components; in addition to a data bus, it includes a power bus, a control bus, and a status signal bus, but for clarity all of these buses are labeled as bus system 504 in Fig. 5.
It will be appreciated that the memory 502 in this embodiment can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
In some embodiments, memory 502 stores elements, executable units or data structures, or a subset thereof, or an expanded set thereof as follows: an operating system and an application program.
The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application programs, including various application programs such as a Media Player (Media Player), a Browser (Browser), etc., are used to implement various application services. A program for implementing any one of the word-level timestamp generation methods provided by the embodiments of the present application may be included in an application program.
In this embodiment of the present application, the processor 501 executes the steps of the method for generating a word-level timestamp provided herein by calling a program or instructions stored in the memory 502, specifically a program or instructions stored in an application program, namely:
determining a probability peak for each word during a frame-by-frame decoding process;
determining the time corresponding to the tail end point of each word according to the probability peak value of each word;
determining the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word;
and generating a word-level timestamp according to the time corresponding to the head end point of each word and the time corresponding to the tail end point of each word.
Any of the methods for generating a word-level timestamp provided in the embodiments of the present application may be applied to the processor 501, or implemented by the processor 501. The processor 501 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 501. The processor 501 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The steps of any of the methods for generating a word-level timestamp provided in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software units in a decoding processor. The software units may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or a register. The storage medium is located in the memory 502, and the processor 501 reads the information in the memory 502 and, in combination with its hardware, performs the steps of the method for generating a word-level timestamp.
Those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments rather than others, combinations of features of different embodiments are within the scope of the application and form further embodiments.
Those skilled in the art will appreciate that the description of each embodiment has a respective emphasis, and reference may be made to the related description of other embodiments for those parts of an embodiment that are not described in detail.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for generating a word-level timestamp, comprising:
determining a probability peak for each word during a frame-by-frame decoding process;
determining the time corresponding to the tail end point of each word according to the probability peak value of each word;
determining the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word; and
generating a word-level timestamp according to the time corresponding to the head end point of each word and the time corresponding to the tail end point of each word.
2. The method of claim 1, wherein said determining the time corresponding to the tail end point of each word according to the probability peak of each word comprises:
comparing the probability peak value of each word with the current probability value of that word; and
if the difference between the probability peak value of each word and the current probability value of that word is greater than or equal to a preset threshold value, determining the time corresponding to the current probability value as the time corresponding to the tail end point.
3. The method of claim 1, wherein said determining the time corresponding to the tail end point of each word according to the probability peak of each word comprises:
if the current word ends and is followed by a silence segment, and the difference between the probability peak value of each word and the current probability value of that word is smaller than the preset threshold value, delaying the time corresponding to the probability peak value of each word by a first preset time to determine the time corresponding to the tail end point of each word.
4. The method of claim 1, wherein determining the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word comprises:
moving the time corresponding to the tail end point of each word earlier by a second preset time to determine the time corresponding to the head end point of each word.
5. The method of generating a word-level timestamp as claimed in claim 1, further comprising:
and determining the time corresponding to the head point of each word according to the probability peak value of each word.
6. The method of claim 5, wherein said determining the time corresponding to the head end point of each word according to said probability peak of each word comprises:
moving the time corresponding to the probability peak value of each word earlier by a first preset time to determine the time corresponding to the head end point of each word.
7. The method of claim 1, wherein the probability peak is a log probability.
8. An apparatus for generating a word-level time stamp, comprising:
a first determination module, configured to determine the probability peak value of each word during the frame-by-frame decoding process;
a second determination module, configured to determine the time corresponding to the tail end point of each word according to the probability peak value of each word;
a third determination module, configured to determine the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word; and
a fourth determination module, configured to generate a word-level timestamp according to the time corresponding to the head end point of each word and the time corresponding to the tail end point of each word.
9. An electronic device, comprising: a processor and a memory;
the processor is used for executing a method for generating a word-level time stamp according to any one of claims 1 to 7 by calling a program or instructions stored in the memory.
10. A computer-readable storage medium storing a program or instructions for causing a computer to execute a method of generating a word-level time stamp according to any one of claims 1 to 7.
CN202111547980.3A 2021-12-16 2021-12-16 Method and device for generating timestamp at word level, electronic equipment and storage medium Pending CN114220421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111547980.3A CN114220421A (en) 2021-12-16 2021-12-16 Method and device for generating timestamp at word level, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114220421A true CN114220421A (en) 2022-03-22

Family

ID=80703487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111547980.3A Pending CN114220421A (en) 2021-12-16 2021-12-16 Method and device for generating timestamp at word level, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114220421A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482809A (en) * 2022-09-19 2022-12-16 北京百度网讯科技有限公司 Keyword search method, keyword search device, electronic equipment and storage medium
CN115482809B (en) * 2022-09-19 2023-08-11 北京百度网讯科技有限公司 Keyword retrieval method, keyword retrieval device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination