CN114220421A - Method and device for generating timestamp at word level, electronic equipment and storage medium - Google Patents
- Publication number: CN114220421A (application CN202111547980.3A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L15/05 — Speech recognition; segmentation; word boundary detection
- G06N3/047 — Neural networks; probabilistic or stochastic networks
- G10L15/16 — Speech classification or search using artificial neural networks
Abstract
The application relates to a method and device for generating word-level timestamps, an electronic device, and a storage medium. The method comprises the following steps: determining a probability peak for each word during frame-by-frame decoding; determining the time corresponding to the tail end point of each word according to its probability peak; determining the time corresponding to the head end point of each word according to the time corresponding to its tail end point; and generating a word-level timestamp from the head-end and tail-end times of each word. Based on the output scores of a deep neural network and the way each word's score evolves during decoding, the method determines the probability peak of each word and, from it, the times corresponding to the head end point and the tail end point of the word. This provides a way to obtain word-level timestamps, outputs accurate word-level timestamp information with high-precision boundary information, and improves user experience.
Description
Technical Field
The present application relates to the field of timestamp technology, and in particular, to a method and an apparatus for generating a word-level timestamp, an electronic device, and a storage medium.
Background
A conventional Kaldi-based speech recognition system can obtain the boundary information of each word from its lattice. Although the end-to-end speech recognition systems now popular in industry surpass traditional systems in recognition rate, many of them provide no timestamp information, or only rough timestamps, for example by judging word boundaries directly from neural network scores; at present there is no relatively mature algorithm for obtaining the timestamp of each word in such systems.
Disclosure of Invention
To address the problem that no relatively mature algorithm currently exists for obtaining the timestamp of each word, the present application provides a word-level timestamp generation method and device, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a method for generating a word-level timestamp, including:
determining a probability peak for each word during a frame-by-frame decoding process;
determining the time corresponding to the tail end point of each word according to the probability peak value of each word;
determining the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word;
and generating a time stamp of the word level according to the time corresponding to the head point of each word and the time corresponding to the tail point of each word.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the tail end point of each word according to its probability peak includes:
comparing the probability peak of each word with its current probability value; and
if the difference between the probability peak and the current probability value is greater than or equal to a preset threshold, determining the time corresponding to the current probability value as the time corresponding to the tail end point.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the tail end point of each word according to its probability peak includes:
if the current word has ended and is followed by a silence segment, and the difference between the probability peak and the current probability value remains smaller than the preset threshold, delaying the time corresponding to the probability peak by a first preset time to determine the time corresponding to the tail end point of the word.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the head end point of each word according to the time corresponding to its tail end point includes:
moving the time corresponding to the tail end point of each word earlier by a second preset time to determine the time corresponding to the head end point of the word.
Further, the method for generating a word-level timestamp further includes:
determining the time corresponding to the head end point of each word according to its probability peak.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the head end point of each word according to its probability peak includes:
shifting the time corresponding to the probability peak of each word earlier by a first preset time to determine the time corresponding to the head end point of the word.
Further, in the above method for generating a word-level timestamp, the probability peak is a log probability.
In a second aspect, an embodiment of the present application provides an apparatus for generating a timestamp at a word level, including:
a first determination module, for determining the probability peak of each word during frame-by-frame decoding;
a second determination module, for determining the time corresponding to the tail end point of each word according to its probability peak;
a third determination module, for determining the time corresponding to the head end point of each word according to the time corresponding to its tail end point; and
a fourth determination module, for generating a word-level timestamp according to the times corresponding to the head end point and the tail end point of each word.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor and a memory;
the processor is used for executing the generation method of the timestamp at the word level by calling the program or the instruction stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a program or instructions, and the program or instructions cause a computer to perform the method for generating a timestamp at a word level.
The embodiments of the application have the following advantages: based on the output scores of a deep neural network and the way each word's score evolves during decoding, the probability peak of each word is determined, and from it the times corresponding to the head end point and the tail end point of each word are derived. This provides a way to obtain word-level timestamps, outputs accurate word-level timestamp information with high-precision boundary information, and improves user experience.
Drawings
To illustrate the technical solutions in the embodiments of the present application or in the conventional technologies more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first schematic diagram illustrating a method for generating a word-level timestamp according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating a method for generating a word-level timestamp according to an embodiment of the present application;
fig. 3 is a schematic diagram three illustrating a method for generating a word-level timestamp according to an embodiment of the present application;
fig. 4 is a schematic diagram of an apparatus for generating a word-level timestamp according to an embodiment of the present application;
fig. 5 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
To make the aforementioned objects, features, and advantages of the present application more comprehensible, embodiments of the present application are described in detail below with reference to the accompanying drawings. In the following description, numerous specific details are set forth to provide a thorough understanding of the present application. The application can, however, be embodied in many forms other than those described herein, and those skilled in the art can make similar modifications without departing from its spirit; the application is therefore not limited to the specific embodiments disclosed below.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The technical background of the present application is first described below:
the neural network model of the end-to-end speech recognition engine typically outputs a matrix of T x M. Where T represents the number of frames of the audio and M represents the size of the dictionary. The element ixj in the matrix represents the probability of the model output j at time i, typically using log probability. Subsequently, on the basis of the matrix, a Decoding algorithm (such as CTC Prefix Beam Search, Time Sync Decoding, Align length Sync Decoding and the like) is used to obtain a final recognition result, and the boundary information timestamp of each word is obtained in the Decoding process.
During frame-by-frame decoding, each frame has a best path, and the score of the best path is the sum of the log probabilities at every frame the path passes through. A word normally spans several frames, and from its beginning to its end the path probability follows a rough pattern: it rises from small to large, then either stays flat or jumps to the next word. Early on there is little acoustic evidence, so the probability of matching the word is not large; as decoding advances, the audio becomes more and more "like" the word, i.e., the probability increases. Afterwards the score either transitions smoothly into silence or jumps to the next word.
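The rising-then-flat-or-jumping pattern above can be sketched as follows; the score values are invented for illustration, and tracking the running maximum is one simple way to realize the peak determination of step S101:

```python
def track_peak(log_probs):
    """Return (peak_value, peak_frame): the maximum of a word's
    frame-wise log-probability scores and the frame where it occurs."""
    peak_value, peak_frame = log_probs[0], 0
    for frame, score in enumerate(log_probs):
        if score > peak_value:
            peak_value, peak_frame = score, frame
    return peak_value, peak_frame

# A word's score rises from small to large, then drops when decoding
# jumps to the next word (values are illustrative only).
word_scores = [-8.0, -5.5, -3.2, -1.4, -0.9, -0.8, -6.0]
peak, peak_frame = track_peak(word_scores)
print(peak, peak_frame)  # → -0.8 5
```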
Fig. 1 is a first schematic diagram illustrating a method for generating a word-level timestamp according to an embodiment of the present application.
In a first aspect, an embodiment of the present application provides a method for generating a word-level timestamp, which, with reference to fig. 1, includes four steps S101 to S104:
s101: in the frame-by-frame decoding process, the probability peak for each word is determined.
Specifically, in the embodiment of the present application, in the frame-by-frame decoding process, the probability peak of each word is determined by determining the maximum log probability score when each word appears as the latest word.
S102: and determining the corresponding time of the tail end point of each word according to the probability peak value of each word.
Specifically, in the embodiment of the present application, after the maximum log-probability score of each word is determined, the time corresponding to the tail end point of the word is determined from that score; the specific steps are introduced below.
S103: and determining the time corresponding to the head point of each word according to the time corresponding to the tail point of each word.
Specifically, in the embodiment of the present application, after the time corresponding to the tail end point of each word is determined, it can be moved earlier by approximately the duration of one word to obtain the time corresponding to the head end point, as described below with a specific example.
S104: and generating a time stamp of the word level according to the time corresponding to the head point of each word and the time corresponding to the tail point of each word.
Specifically, in the embodiment of the present application, once the times corresponding to the head end point and the tail end point of each word are determined, the timestamp of each word is the interval between them.
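Once head-end and tail-end times are available, assembling the word-level timestamps of step S104 reduces to a simple pairing; the helper below and its field names are assumptions for illustration, with times expressed in milliseconds:

```python
def build_timestamps(words, head_times_ms, tail_times_ms):
    """Pair each word with its (start, end) interval in milliseconds."""
    return [
        {"word": w, "start_ms": h, "end_ms": t}
        for w, h, t in zip(words, head_times_ms, tail_times_ms)
    ]

stamps = build_timestamps(["hello", "world"], [0, 240], [240, 480])
print(stamps[0])  # → {'word': 'hello', 'start_ms': 0, 'end_ms': 240}
```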
Fig. 2 is a schematic diagram illustrating a method for generating a word-level timestamp according to an embodiment of the present application.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the tail end point of each word according to its probability peak, with reference to fig. 2, includes three steps S201 to S203:
S201: comparing the probability peak of each word with its current probability value;
S202: checking whether the difference between the probability peak and the current probability value is greater than or equal to a preset threshold;
S203: if so, determining the time corresponding to the current probability value as the time corresponding to the tail end point.
Specifically, in the embodiment of the present application, the current word may last for a period of time and then decoding jumps immediately to the next word. In this case the jump point of the current word must be found: a relative threshold of 0.1% is set, and the current probability value of the word is compared with its probability peak frame by frame. If the difference between the peak and the current value is greater than or equal to the preset threshold, e.g., 0.1%, the time corresponding to the current probability value is determined as the time corresponding to the tail end point.
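The jump-point rule above can be sketched as follows. The 10 ms frame shift is an assumption for the example, and applying the 0.1% comparison to the magnitude of the log score is one plausible reading of the "relative threshold" in the text:

```python
FRAME_MS = 10          # assumed frame shift per decoding frame
REL_THRESHOLD = 0.001  # relative 0.1% threshold from the description

def tail_time_on_jump(scores_after_peak, peak):
    """Return the time (ms) of the first frame whose score has fallen by
    at least 0.1% relative to the peak, or None if the score never drops
    that far (the silence case, handled separately)."""
    for frame, score in enumerate(scores_after_peak):
        if abs(peak - score) >= abs(peak) * REL_THRESHOLD:
            return frame * FRAME_MS
    return None

# The score holds near the peak for two frames, then drops past the
# threshold at the third frame (values are illustrative only).
print(tail_time_on_jump([-2.000, -2.001, -2.010], peak=-2.000))  # → 20
```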
Fig. 3 is a third schematic diagram of a method for generating a word-level timestamp according to an embodiment of the present application.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the tail end point of each word according to its probability peak, with reference to fig. 3, includes two steps S301 and S302:
S301: if the current word has ended and is followed by a silence segment, and the difference between the probability peak of the word and its current probability value remains smaller than the preset threshold;
S302: delaying the time corresponding to the probability peak of the word by a first preset time to determine the time corresponding to its tail end point.
Specifically, in the embodiment of the present application, a silence segment follows the end of the current word. In this case the score drops by less than the preset threshold (e.g., 0.1%) relative to the probability peak and stays there for a long time; the time of the probability peak is therefore delayed backwards by a first preset time, such as 120 ms (about half the duration of a word), to determine the time corresponding to the tail end point of the word.
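The silence-case rule then reduces to a fixed delay from the peak time; the 120 ms value comes from the description, while expressing times in milliseconds is an assumption for the example:

```python
FIRST_PRESET_MS = 120  # about half a word's duration, per the description

def tail_time_on_silence(peak_time_ms):
    """When silence follows the word and the score never drops past the
    threshold, delay the peak time by the first preset time."""
    return peak_time_ms + FIRST_PRESET_MS

print(tail_time_on_silence(480))  # → 600
```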
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the head end point of each word according to the time corresponding to its tail end point includes:
moving the time corresponding to the tail end point of each word earlier by a second preset time to determine the time corresponding to the head end point of the word.
Specifically, in this embodiment of the application, after the time corresponding to the tail end point of each word is determined, it is moved earlier by a second preset time, such as 240 ms (approximately the duration of one word), to determine the time corresponding to the head end point of the word.
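The head-end rule above is likewise a fixed shift; the clamp at zero is an added assumption so the first word of an utterance cannot start at a negative time:

```python
SECOND_PRESET_MS = 240  # about one word's duration, per the description

def head_time_from_tail(tail_time_ms):
    """Move the tail-end time earlier by the second preset time."""
    return max(0, tail_time_ms - SECOND_PRESET_MS)

print(head_time_from_tail(600))  # → 360
```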
Further, the method for generating a word-level timestamp further includes:
and determining the time corresponding to the head point of each word according to the probability peak value of each word.
Specifically, in the embodiment of the present application, besides the comparison between each word's probability peak and its current probability value described above, the time corresponding to the head end point of each word can also be determined from the time corresponding to its probability peak.
Further, in the above method for generating a word-level timestamp, determining the time corresponding to the head end point of each word according to its probability peak includes:
shifting the time corresponding to the probability peak of each word earlier by the first preset time to determine the time corresponding to the head end point of the word.
Specifically, in the embodiment of the present application, the time corresponding to the probability peak of each word is moved earlier by the first preset time of 120 ms, i.e., about half the duration of a word, to determine the time corresponding to the head end point.
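This alternative head-end rule, starting from the peak rather than the tail point, can be sketched the same way; the zero clamp is again an added assumption:

```python
FIRST_PRESET_MS = 120  # about half a word's duration, per the description

def head_time_from_peak(peak_time_ms):
    """Move the probability-peak time earlier by the first preset time."""
    return max(0, peak_time_ms - FIRST_PRESET_MS)

print(head_time_from_peak(480))  # → 360
```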
Further, in the above method for generating a word-level timestamp, the probability peak is a log probability.
Specifically, in the embodiment of the present application, the neural network model of an end-to-end speech recognition engine generally outputs a T × M matrix, where T is the number of audio frames and M is the size of the dictionary. The element (i, j) represents the probability that the model outputs token j at frame i, and log probabilities are typically used; the probability peak is therefore a log probability.
Fig. 4 is a schematic diagram of an apparatus for generating a word-level timestamp according to an embodiment of the present application.
In a second aspect, an embodiment of the present application provides an apparatus for generating a word-level timestamp, which, in conjunction with fig. 4, includes:
the first determination module 401: for use in a frame decoding process, the probability peak for each word is determined.
Specifically, in the embodiment of the present application, in the frame-by-frame decoding process, the first determining module 401 determines the probability peak of each word, which is the maximum log probability score when each word appears as the latest word.
The second determination module 402: for determining the time corresponding to the tail point of each word from the probability peak of each word.
Specifically, in the embodiment of the present application, after the maximum log-probability score of each word is determined, the second determining module 402 determines the time corresponding to the tail end point of the word from that score, as introduced in the specific steps above.
The third determination module 403: and the time corresponding to the head point of each word is determined according to the time corresponding to the tail point of each word.
Specifically, in the embodiment of the present application, after the time corresponding to the tail end point of each word is determined, the third determining module 403 moves it earlier by approximately the duration of one word to determine the time corresponding to the head end point, as described in the example above.
The fourth determination module 404: configured to generate the word-level timestamp according to the time corresponding to the head end point and the time corresponding to the tail end point of each word.
Specifically, in the embodiment of the present application, the time corresponding to the head point of each word and the time corresponding to the tail point of each word are determined, and the fourth determining module 404 may determine the timestamp of each word according to the time between the head point and the tail point.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor and a memory;
the processor is used for executing the generation method of the timestamp at the word level by calling the program or the instruction stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a program or instructions, and the program or instructions cause a computer to perform the method for generating a timestamp at a word level.
Fig. 5 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure.
As shown in fig. 5, the electronic device includes: at least one processor 501, at least one memory 502, and at least one communication interface 503. The various components in the electronic device are coupled together by a bus system 504. The communication interface 503 is used for information transmission with external devices. It is understood that the bus system 504 is used to enable communication among the components. In addition to a data bus, the bus system 504 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, the various buses are labeled as bus system 504 in fig. 5.
It will be appreciated that the memory 502 in this embodiment can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
In some embodiments, memory 502 stores elements, executable units or data structures, or a subset thereof, or an expanded set thereof as follows: an operating system and an application program.
The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application programs, including various application programs such as a Media Player (Media Player), a Browser (Browser), etc., are used to implement various application services. A program for implementing any one of the word-level timestamp generation methods provided by the embodiments of the present application may be included in an application program.
In this embodiment of the present application, the processor 501 is configured to execute the steps of the embodiments of the method for generating a timestamp at a word level provided in this embodiment of the present application by calling a program or an instruction stored in the memory 502, which may be specifically a program or an instruction stored in an application program.
Determining a probability peak for each word during a frame-by-frame decoding process;
determining the time corresponding to the tail end point of each word according to the probability peak value of each word;
determining the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word;
and generating a time stamp of the word level according to the time corresponding to the head point of each word and the time corresponding to the tail point of each word.
Any one of the methods for generating a word-level timestamp provided in the embodiments of the present application may be applied to the processor 501, or implemented by the processor 501. The processor 501 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 501. The processor 501 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The steps of any one of the methods for generating a word-level timestamp provided in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software units in a decoding processor. The software units may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 502; the processor 501 reads the information in the memory 502 and, in combination with its hardware, performs the steps of the method for generating a word-level timestamp.
Those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments instead of others, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments.
Those skilled in the art will appreciate that the description of each embodiment has a respective emphasis, and reference may be made to the related description of other embodiments for those parts of an embodiment that are not described in detail.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A method for generating a word-level timestamp, comprising:
determining a probability peak for each word during a frame-by-frame decoding process;
determining the time corresponding to the tail end point of each word according to the probability peak value of each word;
determining the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word;
and generating a time stamp of the word level according to the time corresponding to the head point of each word and the time corresponding to the tail point of each word.
2. The method of claim 1, wherein said determining the time corresponding to the end point of each word according to the probability peak of each word comprises:
comparing the probability peak of each word with the current probability value of each word;
and if the difference between the probability peak of each word and the current probability value of each word is greater than or equal to a preset threshold,
determining the time corresponding to the current probability value as the time corresponding to the tail end point.
3. The method of claim 1, wherein said determining the time corresponding to the end point of each word according to the probability peak of each word comprises:
if a continuous silence segment follows the end of the current word, and the difference between the probability peak of each word and the current probability value of each word is smaller than the preset threshold,
delaying the time corresponding to the probability peak of each word by a first preset time to determine the time corresponding to the tail end point of each word.
4. The method of claim 1, wherein determining the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word comprises:
delaying the time corresponding to the tail end point of each word by a second preset time to determine the time corresponding to the head end point of each word.
5. The method of generating a word-level timestamp as claimed in claim 1, further comprising:
and determining the time corresponding to the head end point of each word according to the probability peak of each word.
6. The method of claim 5, wherein determining the time corresponding to the head end point of each word from the probability peak of each word comprises:
delaying the time corresponding to the probability peak of each word by a first preset time to determine the time corresponding to the head end point of each word.
7. The method of claim 1, wherein the probability peak is a log probability.
8. An apparatus for generating a word-level time stamp, comprising:
a first determination module, configured to determine the probability peak of each word during a frame-by-frame decoding process;
a second determination module, configured to determine the time corresponding to the tail end point of each word according to the probability peak of each word;
a third determination module, configured to determine the time corresponding to the head end point of each word according to the time corresponding to the tail end point of each word;
and a fourth determination module, configured to generate a word-level timestamp according to the time corresponding to the head end point of each word and the time corresponding to the tail end point of each word.
9. An electronic device, comprising: a processor and a memory;
the processor is used for executing a method for generating a word-level time stamp according to any one of claims 1 to 7 by calling a program or instructions stored in the memory.
10. A computer-readable storage medium storing a program or instructions for causing a computer to execute a method of generating a word-level time stamp according to any one of claims 1 to 7.
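The procedure of claims 1 to 4 can be sketched in code. The following is a minimal illustration, not the patented implementation: the frame shift, the drop threshold, and the two preset delays (`frame_ms`, `drop_thresh`, `tail_delay_ms`, `head_delay_ms`) are assumed values, and the input format, a per-frame list of `(word, probability)` pairs with `None` marking silence frames, is likewise an assumption made for demonstration.

```python
def word_timestamps(frames, frame_ms=40, drop_thresh=0.5,
                    tail_delay_ms=40, head_delay_ms=40):
    """Sketch of the claimed word-level timestamp method.

    frames: list of (label, probability) per decoded frame; label is a
    word, or None for silence.  All parameter values are assumptions.
    Returns a list of (word, head_ms, tail_ms) tuples.
    """
    stamps = []
    prev_tail = None
    i, n = 0, len(frames)
    while i < n:
        label = frames[i][0]
        if label is None:          # skip silence frames between words
            i += 1
            continue
        # Claim 1: track the probability peak over this word's frames.
        j, peak, peak_t, tail_t = i, -1.0, i, None
        while j < n and frames[j][0] == label:
            p = frames[j][1]
            if p > peak:
                peak, peak_t = p, j
            elif tail_t is None and peak - p >= drop_thresh:
                # Claim 2: the drop from the peak reaches the preset
                # threshold -> this frame's time is the tail end point.
                tail_t = j
            j += 1
        if tail_t is not None:
            tail_ms = tail_t * frame_ms
        else:
            # Claim 3: the word ended (e.g. into silence) without a clear
            # drop -> delay the peak time by the first preset time.
            tail_ms = peak_t * frame_ms + tail_delay_ms
        # Claim 4: head end point = preceding tail end point delayed by
        # the second preset time (0 for the first word, by assumption).
        head_ms = 0 if prev_tail is None else prev_tail + head_delay_ms
        stamps.append((label, head_ms, tail_ms))
        prev_tail = tail_ms
        i = j
    return stamps
```

For example, with a 40 ms frame shift, a word whose probability peaks at frame 1 and drops below the threshold at frame 2 gets its tail end point at 80 ms, and a following word that ends into silence without a sharp drop gets its tail from the delayed peak instead.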
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111547980.3A CN114220421A (en) | 2021-12-16 | 2021-12-16 | Method and device for generating timestamp at word level, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114220421A true CN114220421A (en) | 2022-03-22 |
Family
ID=80703487
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111547980.3A Pending CN114220421A (en) | 2021-12-16 | 2021-12-16 | Method and device for generating timestamp at word level, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114220421A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115482809A (en) * | 2022-09-19 | 2022-12-16 | 北京百度网讯科技有限公司 | Keyword search method, keyword search device, electronic equipment and storage medium |
CN115482809B (en) * | 2022-09-19 | 2023-08-11 | 北京百度网讯科技有限公司 | Keyword retrieval method, keyword retrieval device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110164435A (en) | Audio recognition method, device, equipment and computer readable storage medium | |
CN108959388B (en) | Information generation method and device | |
CN111435592B (en) | Voice recognition method and device and terminal equipment | |
CN103514882A (en) | Voice identification method and system | |
WO2023151424A1 (en) | Method and apparatus for adjusting playback rate of audio picture of video | |
CN112861548A (en) | Natural language generation and model training method, device, equipment and storage medium | |
CN114220421A (en) | Method and device for generating timestamp at word level, electronic equipment and storage medium | |
US20170140751A1 (en) | Method and device of speech recognition | |
CN110312161B (en) | Video dubbing method and device and terminal equipment | |
CN115346517A (en) | Streaming voice recognition method, device, equipment and storage medium | |
CN111681644B (en) | Speaker segmentation method, device, equipment and storage medium | |
CN115174285A (en) | Conference record generation method and device and electronic equipment | |
WO2003005343A1 (en) | Fast search in speech recognition | |
CN110110294B (en) | Dynamic reverse decoding method, device and readable storage medium | |
CN112397053A (en) | Voice recognition method and device, electronic equipment and readable storage medium | |
CN111968616A (en) | Training method and device of speech synthesis model, electronic equipment and storage medium | |
CN113268973B (en) | Man-machine multi-turn conversation method and device | |
WO2024012040A1 (en) | Method for speech generation and related device | |
CN113377917A (en) | Multi-mode matching method and device, electronic equipment and storage medium | |
JP2003005787A (en) | Voice recognition device and voice recognition program | |
CN114171003A (en) | Re-scoring method and device for voice recognition system, electronic equipment and storage medium | |
CN118016070A (en) | Speech recognition method, device, electronic equipment and storage medium | |
CN114155874A (en) | Feature extraction method and device, electronic equipment and storage medium | |
CN114464173A (en) | Acoustic model training method and device, electronic equipment and storage medium | |
CN118098219A (en) | CTC-based end-to-end voice recognition model, decoding method and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||