CN109300472A - Speech recognition method, apparatus, device, and medium - Google Patents
Speech recognition method, apparatus, device, and medium
- Publication number
- CN109300472A (application CN201811572238.6A)
- Authority
- CN
- China
- Prior art keywords
- metadata
- speech recognition
- real time
- trained
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the invention disclose a speech recognition method, apparatus, device, and medium. The method includes: obtaining a voice request; and performing speech recognition on the voice request with a pre-trained speech recognition system to obtain the intent information corresponding to the voice request, where the pre-trained speech recognition system is obtained by real-time training on training metadata obtained in real time. The method can improve the accuracy of speech recognition.
Description
Technical field
Embodiments of the present invention relate to the technical field of speech recognition, and in particular to a speech recognition method, apparatus, device, and medium.
Background
With the development of Internet of Things technology, intelligent control has become the direction of future development, and voice control is the most important aspect of intelligent control. With continued research and development of the related technologies, voice control has been applied to various electronic devices with initial success.
However, the recognition accuracy of current speech recognition technology is still not high.
Summary of the invention
Embodiments of the present invention provide a speech recognition method, apparatus, device, and medium that can improve the accuracy of speech recognition.
In a first aspect, an embodiment of the invention provides a speech recognition method, the method comprising:
Obtaining a voice request;
Performing speech recognition on the voice request based on a pre-trained speech recognition system to obtain the intent information corresponding to the voice request;
Wherein the pre-trained speech recognition system is obtained by real-time training on training metadata obtained in real time.
In a second aspect, an embodiment of the invention also provides a speech recognition apparatus, the apparatus comprising:
An obtaining module, configured to obtain a voice request;
A recognition module, configured to perform speech recognition on the voice request based on a pre-trained speech recognition system to obtain the intent information corresponding to the voice request;
Wherein the pre-trained speech recognition system is obtained by real-time training on training metadata obtained in real time.
In a third aspect, an embodiment of the invention also provides an electronic device, the electronic device comprising:
One or more processors;
A storage device for storing multiple programs;
When at least one of the programs is executed by the one or more processors, the one or more processors implement the speech recognition method provided in the first aspect.
In a fourth aspect, an embodiment of the invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the speech recognition method of the first aspect.
The speech recognition method provided by embodiments of the invention trains the speech recognition system on training metadata obtained in real time to obtain a pre-trained speech recognition system. Because a system obtained this way learns each newly emerging item of metadata promptly, it has higher recognition accuracy. When a voice request is obtained, speech recognition is performed on it with the pre-trained speech recognition system to obtain the corresponding intent information, which improves the accuracy of speech recognition.
Brief description of the drawings
Fig. 1 is a flowchart of a speech recognition method provided by Embodiment one of the present invention;
Fig. 2 is a schematic diagram of a training metadata collection process provided by Embodiment one of the present invention;
Fig. 3 is a structural diagram of a speech recognition apparatus provided by Embodiment two of the present invention;
Fig. 4 is a hardware structural diagram of an electronic device provided by Embodiment three of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, specific embodiments are described in further detail below with reference to the accompanying drawings. The specific embodiments described here are only intended to explain the invention, not to limit it.
It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire content. Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as sequential, many of them can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. A process may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram, and so on.
Embodiment one
Fig. 1 is a flowchart of a speech recognition method provided by Embodiment one of the present invention. The method applies to controlling smart devices by voice, for example smart speakers, smart TVs, smartphones, or in-vehicle smart devices. The method is executed by a speech recognition apparatus, which is implemented in software and/or hardware and is generally integrated in a terminal such as a smart speaker, smart TV, smartphone, or in-vehicle smart device. As shown in Fig. 1, the speech recognition method includes the following steps:
Step 110: obtain a voice request.
Specifically, the voice request can be obtained through a voice pickup input device. For example, a voice remote control captures the voice request a user issues to a TV, or the voice-receiving microphone of a smart speaker captures the voice request a user issues to the speaker.
The voice request varies with the controlled device. When the controlled device is a smart TV, the request may target TV functions, such as "please turn the volume up/down", "please switch to the sports channel", or "please play the spy films starring a certain movie star". When the controlled device is a smart speaker, the request may target speaker functions such as on-demand music, video, or encyclopedia content, for example "please play the Happy Birthday song".
Step 120: perform speech recognition on the voice request based on a pre-trained speech recognition system to obtain the intent information corresponding to the voice request.
The pre-trained speech recognition system is obtained by real-time training on training metadata obtained in real time. As time goes on, many loanwords and buzzwords characteristic of the era (such as "workplace newbie" and "rookie") come into wide use, and as film and television content changes rapidly, a large amount of new video data is generated every day. To improve recognition accuracy, the speech recognition system therefore needs to learn new content such as loanwords, buzzwords, and new video data first. Obtaining rich training metadata promptly thus becomes the key to improving the recognition accuracy of the speech recognition system.
Further, the method also includes obtaining training metadata in real time, which specifically includes:
Building a training metadata collection platform on a target network infrastructure based on a web service framework;
Accessing OTT websites in real time by calling an application programming interface (API) through the training metadata collection platform, to obtain the training metadata.
The target network infrastructure includes network infrastructure provided by, for example, Amazon or Alibaba. The web service framework may be, for example, Spring Cloud. OTT websites provide data services such as the latest videos, buzzwords, and loanwords, so building the training metadata collection platform on the target network infrastructure enables timely, targeted collection of the latest training metadata and thus collection of rich training metadata. The fields of the training metadata include: program title (for example "Home with Kids", "Story of Yanxi Palace", or "Happy Camp"), plot description, genre (such as variety show, romance, or costume drama), performers, crew, studio information, poster image, user rating information, version information, and release date. The training metadata covers one or more languages and multiple dialects within the same language.
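The field list above can be pictured as a simple record type. The following is a minimal sketch; the class name, field names, types, and defaults are assumptions made for illustration, since the patent names the fields but does not define a schema.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class TrainingMetadata:
    """One collected metadata record (hypothetical schema)."""
    title: str                                        # program title
    plot: str = ""                                    # plot description
    genres: List[str] = field(default_factory=list)   # e.g. variety show, romance
    performers: List[str] = field(default_factory=list)
    crew: List[str] = field(default_factory=list)
    studio: str = ""                                  # studio information
    poster_url: str = ""                              # poster image
    user_rating: Optional[float] = None               # user rating information
    version: str = ""                                 # version information
    release_date: str = ""                            # release date, e.g. "2018-12-21"
    language: str = "zh"                              # language of this record
    dialect: Optional[str] = None                     # dialect within the language, if any
```

`asdict(record)` turns a record into a plain dictionary, which is convenient when the records are later written out as JSON or XML files.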
Further, accessing OTT websites in real time by calling the API through the training metadata collection platform, to obtain the training metadata, includes:
Calling the API through the training metadata collection platform to access OTT websites in real time, and obtaining video tag information based on preset rules;
Parsing the metadata of the video according to the video tag information.
The video service providers on OTT websites emphasize different types of video: some focus on films, TV series, and variety shows, while others focus on original content, news, and documentaries. The video tag information includes basic information such as the video name, video type, and video link.
Further, obtaining the video tag information based on preset rules includes:
Obtaining a set quantity of video tag information;
Or obtaining video tag information according to the update time of the videos, for example obtaining each time the tag information of videos updated within one day of the current time.
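The two preset rules can be sketched as two small selection functions. The function names and the `updated_at` key are hypothetical; the patent does not specify how tag information is represented.

```python
from datetime import datetime, timedelta


def select_by_count(tags, limit):
    """Rule 1: take a set quantity of video tag entries."""
    return tags[:limit]


def select_recent(tags, now, window=timedelta(days=1)):
    """Rule 2: keep entries whose video was updated within `window` of `now`."""
    return [t for t in tags if now - t["updated_at"] <= window]


# Example tag entries with the basic fields named in the text (name, type, link).
tags = [
    {"name": "Film A", "type": "film", "link": "https://example.com/a",
     "updated_at": datetime(2018, 12, 21, 8, 0)},
    {"name": "Show B", "type": "variety", "link": "https://example.com/b",
     "updated_at": datetime(2018, 12, 19, 8, 0)},
]
now = datetime(2018, 12, 21, 12, 0)
recent = select_recent(tags, now)      # only "Film A" was updated within one day
first_one = select_by_count(tags, 1)   # the first entry, regardless of update time
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the function, keeps the rule easy to test.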
Further, the method also includes:
Generating target files from the training metadata obtained in real time based on different languages; or generating target files from the training metadata obtained in real time based on regional dialects; and uploading the target files to a speech recognition platform in real time, so that the speech recognition system is trained in real time on the training metadata obtained in real time.
The target files include files based on at least two languages, for example one target file generated in Chinese and another generated in English. The target files may also include files generated for local dialects. A target file may specifically be an XML file or a JSON file.
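Generating one target file per language might look like the sketch below, which uses JSON. The file naming scheme and the top-level keys (`version`, `language`, `items`) are assumptions; the patent only states that target files are XML or JSON and carry version information.

```python
import json
from collections import defaultdict


def build_target_files(records, version):
    """Group records by language and serialize each group as one JSON document.

    Returns a dict mapping a hypothetical file name to its JSON text.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["language"]].append(rec)
    return {
        f"metadata_{lang}.json": json.dumps(
            {"version": version, "language": lang, "items": items},
            ensure_ascii=False,  # keep non-ASCII titles readable in the file
            sort_keys=True,
        )
        for lang, items in grouped.items()
    }


files = build_target_files(
    [
        {"title": "Film A", "language": "en"},
        {"title": "Film A (Chinese title)", "language": "zh"},
    ],
    version=1,
)
```

The same grouping could key on a dialect field instead of the language to produce the per-dialect files the text mentions.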
Uploading the target file to the speech recognition platform in real time may specifically be:
Uploading the full version of the target file to the speech recognition platform in real time;
Or uploading to the speech recognition platform only the metadata that differs between the current version of the target file and the previous version, which avoids re-uploading duplicate metadata, saves upload traffic, and increases upload speed. The target file includes file version information.
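The differential upload can be sketched as a comparison of two versions keyed by title. The function name and the `upsert`/`remove` payload layout are assumptions for illustration; the patent does not describe the diff format.

```python
def diff_metadata(previous, current):
    """Return only what changed between two versions of the metadata.

    Both arguments map a record key (here the title) to the record itself.
    """
    upsert = {k: v for k, v in current.items() if previous.get(k) != v}
    remove = [k for k in previous if k not in current]
    return {"upsert": upsert, "remove": remove}


v1 = {"Film A": {"rating": 8.0}, "Film B": {"rating": 7.0}}
v2 = {"Film B": {"rating": 7.5}, "Film C": {"rating": 6.0}}
delta = diff_metadata(v1, v2)
# Film B changed, Film C is new, Film A was dropped; unchanged records are not sent.
```

If nothing changed between versions, both parts of the result are empty and the upload can be skipped entirely.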
Further, when the speech recognition platform receives the target file, it can verify the validity of the metadata in the file and generate a report indicating whether verification passed or failed, to improve the quality of the metadata. The verification may specifically use a CRC algorithm. The speech recognition platform can further build a knowledge graph from the uploaded metadata and tie the knowledge graph into the training of the speech recognition system. The knowledge graph introduces more contextual relationships into the speech recognition process: for example, the director and lead actors of a film can be inferred from the film title, and other films featuring those lead actors can be inferred in turn. For example, when a user requests "play the spy films of Liu Dehua (Andy Lau)", the system proactively recommends films such as "Infernal Affairs" and "Switch" based on the metadata knowledge graph.
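The kind of lookup the knowledge graph enables can be sketched with a small in-memory structure. This is an illustration only: the class, its methods, and the placeholder names ("Film A", "Actor X") are hypothetical, and a production knowledge graph would hold many more relation types.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny graph linking films, lead actors, and genres (illustrative only)."""

    def __init__(self):
        self.actor_to_films = defaultdict(set)
        self.film_to_actors = defaultdict(set)
        self.film_to_genres = defaultdict(set)

    def add_film(self, title, actors, genres=()):
        for actor in actors:
            self.actor_to_films[actor].add(title)
            self.film_to_actors[title].add(actor)
        self.film_to_genres[title].update(genres)

    def films_by_actor(self, actor, genre=None):
        """Films featuring `actor`, optionally restricted to one genre."""
        films = self.actor_to_films.get(actor, set())
        return sorted(
            f for f in films
            if genre is None or genre in self.film_to_genres.get(f, set())
        )

    def related_films(self, title):
        """Other films that share at least one lead actor with `title`."""
        related = set()
        for actor in self.film_to_actors.get(title, set()):
            related |= self.actor_to_films[actor]
        related.discard(title)
        return sorted(related)


kg = KnowledgeGraph()
kg.add_film("Film A", ["Actor X", "Actor Y"], ["spy"])
kg.add_film("Film B", ["Actor X"], ["comedy"])
kg.add_film("Film C", ["Actor Y"], ["spy"])
```

A request like "play the spy films of Actor X" maps to `kg.films_by_actor("Actor X", genre="spy")`, while `kg.related_films("Film A")` follows the film-to-actor-to-film path described in the text.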
By collecting training metadata such as loanwords and buzzwords, and converting the collected metadata for each regional dialect and each language, the recognition capability of the speech recognition system can keep pace with the times and fully understand users' voice requests, reflecting a high degree of intelligence.
In smart device scenarios there are often program names or input-source names that cannot be expressed in ordinary language, such as the input sources HDMI1, YPbPr, Component, AV1, or Composite1; program names such as SBS1 and Channel7; and website names such as www.sohu.com, www.zaobao.com, and Wealth net www.18.com.cn. Aggregating this metadata on the data collection platform and uploading it to the speech recognition platform for intelligent understanding and machine training improves the recognition accuracy of the speech recognition system. Through automatic speech recognition, natural language understanding, and intent output, this helps improve voice control of TV functions and achieves "direct voice access to functions".
The speech recognition method provided by this embodiment trains the speech recognition system on training metadata obtained in real time to obtain a pre-trained speech recognition system. Because a system obtained this way learns each newly emerging item of metadata promptly, it has higher recognition accuracy. When a voice request is obtained, speech recognition is performed on it with the pre-trained speech recognition system to obtain the corresponding intent information, which improves the accuracy of speech recognition.
Further, on the basis of the above embodiment, refer to the schematic diagram of a training metadata collection process shown in Fig. 2. The metadata collection platform 210 collects video metadata from multiple film and television content service providers 200; collects loanword metadata (such as "mini" and "taxi") through the loanword collection program 201; collects buzzword metadata (such as "workplace newbie") through the buzzword collection program 202; and collects smart device term metadata (such as the input sources HDMI1, YPbPr, Component, AV1, or Composite1, and program names such as SBS1 and Channel7) through the smart device term collection program 203. The collected metadata is exported in a set format to generate metadata files 220 (including multilingual metadata files and multi-dialect metadata files). The metadata files are then uploaded to the speech recognition platform 300 and stored in the designated storage space 230 of the speech recognition platform 300. The speech recognition platform 300 builds a knowledge graph 240 from the stored metadata and trains the speech recognition system 250 in real time in combination with the knowledge graph 240. When a user's voice request 260 is received, the trained speech recognition system 250 recognizes the voice request and obtains the corresponding intent information 270.
By building the metadata collection platform, the video metadata updated by video service providers is collected in real time, and metadata files are output per region (that is, per dialect) and per language for the speech recognition platform to use in natural language recognition and natural language understanding. The speech recognition system can thus train on new metadata promptly and use the metadata knowledge graph for understanding, so it identifies the intent of a user's voice request more accurately and makes it easier for users to reach programs and video resources in their local language. This raises the intelligence of the speech recognition system and improves user experience. Adding collection of loanwords, buzzwords, websites, and smart-device local control metadata increases the kinds of training metadata available to the speech recognition system and broadens its range of application.
Embodiment two
Fig. 3 is a structural diagram of a speech recognition apparatus provided by Embodiment two of the present invention. As shown in Fig. 3, the apparatus includes an obtaining module 310 and a recognition module 320.
The obtaining module 310 is configured to obtain a voice request. The recognition module 320 is configured to perform speech recognition on the voice request based on a pre-trained speech recognition system to obtain the intent information corresponding to the voice request. The pre-trained speech recognition system is obtained by real-time training on training metadata obtained in real time.
Further, the apparatus also includes:
A metadata obtaining module, configured to obtain training metadata in real time;
Wherein the training metadata includes at least one of video metadata, buzzword metadata, loanword metadata, and smart-device local control metadata.
Further, the metadata obtaining module includes:
A building unit, configured to build a training metadata collection platform on a target network infrastructure based on Spring Cloud;
An obtaining unit, configured to access OTT websites in real time by calling an application programming interface (API) through the training metadata collection platform, to obtain the training metadata.
Further, the obtaining unit includes:
An obtaining subunit, configured to call the API through the training metadata collection platform to access OTT websites in real time and obtain video tag information based on preset rules;
A parsing subunit, configured to parse the metadata of the video according to the video tag information.
Further, the obtaining subunit is specifically configured to:
Obtain a set quantity of video tag information;
Or obtain video tag information according to the update time of the videos.
Further, the apparatus also includes:
A generation module, configured to generate target files from the training metadata obtained in real time based on different languages, or to generate target files from the training metadata obtained in real time based on regional dialects;
An upload module, configured to upload the target files to a speech recognition platform in real time, so that the speech recognition system is trained in real time on the training metadata obtained in real time.
Further, the upload module is specifically configured to:
Upload the full version of the target file to the speech recognition platform in real time;
Or upload to the speech recognition platform only the metadata that differs between the current version of the target file and the previous version;
Wherein the target file includes file version information.
The speech recognition apparatus provided by this embodiment trains the speech recognition system on training metadata obtained in real time to obtain a pre-trained speech recognition system. Because a system obtained this way promptly learns each newly emerging item of metadata, as well as metadata in different languages and regional dialects, it has higher recognition accuracy. When a voice request is obtained, speech recognition is performed on it with the pre-trained speech recognition system to obtain the corresponding intent information, which improves the accuracy of speech recognition.
Embodiment three
Fig. 4 is a structural diagram of an electronic device provided by Embodiment three of the present invention. Fig. 4 shows a block diagram of an exemplary electronic device 12 suitable for implementing embodiments of the invention. The electronic device 12 shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the invention.
As shown in Fig. 4, the electronic device 12 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structure, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include but are not limited to the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The electronic device 12 typically includes a variety of computer-system-readable media. These can be any available media accessible by the electronic device 12, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 can be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 4, commonly called a "hard disk drive"). Although not shown in Fig. 4, a magnetic disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading and writing a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media) can be provided. In these cases, each drive can be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of program modules (for example, the obtaining module 310 and the recognition module 320 of the speech recognition apparatus) configured to perform the functions of the embodiments of the invention.
A program/utility 40 having a set of program modules 42 (for example, the obtaining module 310 and the recognition module 320 of the speech recognition apparatus) may be stored in, for example, the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the invention.
The electronic device 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device, or a display 24), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (such as a network card or modem) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication can take place through the input/output (I/O) interface 22. The electronic device 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 20. As shown, the network adapter 20 communicates with the other modules of the electronic device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used with the electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 runs the programs stored in the system memory 28 to execute various functional applications and data processing, for example to implement the speech recognition method provided by the embodiments of the invention, the method comprising:
Obtaining a voice request;
Performing speech recognition on the voice request based on a pre-trained speech recognition system to obtain the intent information corresponding to the voice request;
Wherein the pre-trained speech recognition system is obtained by real-time training on training metadata obtained in real time.
Of course, those skilled in the art will understand that the processor can also implement the technical solution of the speech recognition method provided by any embodiment of the invention.
Embodiment four
Embodiment four of the present invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the speech recognition method provided by the embodiments of the invention, the method comprising:
Obtaining a voice request;
Performing speech recognition on the voice request based on a pre-trained speech recognition system to obtain the intent information corresponding to the voice request;
Wherein the pre-trained speech recognition system is obtained by real-time training on training metadata obtained in real time.
Of course, the computer program stored on the computer-readable storage medium provided by the embodiments of the invention is not limited to the method operations described above; it can also perform the related operations of the speech recognition method provided by any embodiment of the invention.
The computer storage medium of the embodiments of the invention may use any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier
wave, in which computer-readable program code is carried. Such a propagated data signal may take various
forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable
combination of the above. A computer-readable signal medium may also be any computer-readable medium other
than a computer-readable storage medium, and that medium can send, propagate, or transmit a program for use
by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted using any suitable medium,
including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or
more programming languages or a combination thereof, including object-oriented programming languages such
as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C"
language or similar programming languages. The program code may execute entirely on the user's computer,
partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly
on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the
remote computer may be connected to the user's computer through any kind of network, including a local
area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example,
through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles
applied. Those skilled in the art will understand that the present invention is not limited to the specific
embodiments described herein, and that various obvious changes, readjustments, and substitutions can be
made by those skilled in the art without departing from the protection scope of the present invention.
Therefore, although the present invention has been described in further detail through the above
embodiments, it is not limited to those embodiments and may include other equivalent embodiments without
departing from the inventive concept; the scope of the present invention is determined by the scope of the
appended claims.
Claims (10)
1. A speech recognition method, characterized by comprising:
Obtaining a voice request;
Performing speech recognition on the voice request based on a pre-trained speech recognition system to
obtain intent information corresponding to the voice request;
Wherein the pre-trained speech recognition system is obtained by real-time training based on training
metadata acquired in real time.
2. The method according to claim 1, characterized in that the method further comprises:
Acquiring training metadata in real time;
Wherein the training metadata includes at least one of video metadata, popular-word metadata,
foreign-vocabulary metadata, and smart device local control metadata.
3. The method according to claim 2, characterized in that acquiring the training metadata in real time
comprises:
Building a training metadata collection platform on target network infrastructure based on a Web service
framework;
Calling, through the training metadata collection platform, an application programming interface (API) to
access an OTT website in real time, so as to acquire the training metadata.
4. The method according to claim 3, characterized in that calling, through the training metadata
collection platform, the application programming interface (API) to access the OTT website in real time,
so as to acquire the training metadata, comprises:
Calling the API through the training metadata collection platform to access the OTT website in real time,
and obtaining video tag information based on preset rules;
Parsing the metadata of the video according to the video tag information.
5. The method according to claim 4, characterized in that obtaining the video tag information based on
preset rules comprises:
Obtaining a set quantity of video tag information;
Or, obtaining video tag information according to the update time of the video.
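The two alternative preset rules in claim 5 (a set quantity of records, or records chosen by update time) can be illustrated with a small sketch. The function name `select_video_tags`, the record keys `"title"`, `"tags"`, and `"updated"`, and the in-memory list standing in for the OTT website's API response are all assumptions made for illustration; the claims do not specify any of them.

```python
from datetime import datetime


def select_video_tags(videos, max_count=None, updated_since=None):
    """Pick video tag records by exactly one of two preset rules.

    videos: list of dicts with hypothetical keys "title", "tags", "updated".
    max_count: rule 1, keep at most this many records.
    updated_since: rule 2, keep records updated at or after this time.
    """
    if (max_count is None) == (updated_since is None):
        raise ValueError("specify exactly one of max_count or updated_since")
    if max_count is not None:
        return videos[:max_count]
    return [v for v in videos if v["updated"] >= updated_since]


# Hypothetical records, as if returned by the collection platform.
videos = [
    {"title": "Example A", "tags": ["drama"], "updated": datetime(2018, 12, 1)},
    {"title": "Example B", "tags": ["comedy"], "updated": datetime(2018, 12, 20)},
    {"title": "Example C", "tags": ["news"], "updated": datetime(2018, 12, 21)},
]
first_two = select_video_tags(videos, max_count=2)
recent = select_video_tags(videos, updated_since=datetime(2018, 12, 15))
```

Selecting by update time keeps each collection run small once an initial batch has been fetched, which fits the real-time acquisition the claims describe.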
6. The method according to claim 2, characterized by further comprising:
Generating target files from the training metadata acquired in real time according to different languages;
or generating target files from the training metadata acquired in real time according to regional
dialects; and uploading the target files to a speech recognition platform in real time, so that the speech
recognition system is trained in real time based on the training metadata acquired in real time.
7. The method according to claim 6, characterized in that uploading the target file to the speech
recognition platform in real time comprises:
Uploading the full version of the target file to the speech recognition platform in real time;
Or, uploading to the speech recognition platform the differential metadata between the target file of the
current version and the target file of the previous version;
Wherein the target file includes file version information.
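The choice between a full upload and a differential upload in claim 7 can be sketched as below. The names `diff_metadata` and `build_upload_payload`, the `"version"` and `"entries"` fields, and the added/removed representation of the difference are assumptions for illustration; the claims only require that the target file carry version information and that the difference between versions can be uploaded.

```python
def diff_metadata(previous, current):
    """Compute the difference between two target-file versions.

    Both arguments are dicts with a "version" string and an "entries" list.
    """
    prev, curr = set(previous["entries"]), set(current["entries"])
    return {
        "base_version": previous["version"],
        "version": current["version"],
        "added": sorted(curr - prev),
        "removed": sorted(prev - curr),
    }


def build_upload_payload(current, previous=None):
    """Return a full payload when no previous version exists, else a diff."""
    if previous is None:
        return {"mode": "full", **current}
    return {"mode": "diff", **diff_metadata(previous, current)}


# Hypothetical target-file versions.
v1 = {"version": "1", "entries": ["title a", "title b"]}
v2 = {"version": "2", "entries": ["title b", "title c"]}
full_payload = build_upload_payload(v1)
diff_payload = build_upload_payload(v2, previous=v1)
```

Sending only the difference keeps each real-time upload small, while the version fields let the receiving platform confirm that the diff applies to the version it already holds.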
8. A speech recognition apparatus, characterized by comprising:
An obtaining module, configured to obtain a voice request;
A recognition module, configured to perform speech recognition on the voice request based on a pre-trained
speech recognition system to obtain intent information corresponding to the voice request;
Wherein the pre-trained speech recognition system is obtained by real-time training based on training
metadata acquired in real time.
9. An electronic device, characterized in that the electronic device comprises:
One or more processors;
A storage device, configured to store one or more programs;
When the one or more programs are executed by the one or more processors, the one or more processors
implement the speech recognition method according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the
program, when executed by a processor, implements the speech recognition method according to any one of
claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811572238.6A CN109300472A (en) | 2018-12-21 | 2018-12-21 | A kind of audio recognition method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109300472A true CN109300472A (en) | 2019-02-01 |
Family
ID=65142934
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811572238.6A Pending CN109300472A (en) | 2018-12-21 | 2018-12-21 | A kind of audio recognition method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109300472A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101382937A (en) * | 2008-07-01 | 2009-03-11 | 深圳先进技术研究院 | Multimedia resource processing method based on speech recognition and on-line teaching system thereof |
CN103916709A (en) * | 2013-01-07 | 2014-07-09 | 三星电子株式会社 | Server and method for controlling server |
CN105095195A (en) * | 2015-07-03 | 2015-11-25 | 北京京东尚科信息技术有限公司 | Method and system for human-machine questioning and answering based on knowledge graph |
CN105654945A (en) * | 2015-10-29 | 2016-06-08 | 乐视致新电子科技(天津)有限公司 | Training method of language model, apparatus and equipment thereof |
US20160351187A1 (en) * | 2015-06-01 | 2016-12-01 | Dell Software, Inc. | Method and Apparatus to Extrapolate Sarcasm and Irony Using Multi-Dimensional Machine Learning Based Linguistic Analysis |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113412516A (en) * | 2019-02-06 | 2021-09-17 | 谷歌有限责任公司 | Voice query QoS based on client-computed content metadata |
CN113412516B (en) * | 2019-02-06 | 2024-04-05 | 谷歌有限责任公司 | Method and system for processing automatic speech recognition ASR request |
CN111128183A (en) * | 2019-12-19 | 2020-05-08 | 北京搜狗科技发展有限公司 | Speech recognition method, apparatus and medium |
CN111128183B (en) * | 2019-12-19 | 2023-03-17 | 北京搜狗科技发展有限公司 | Speech recognition method, apparatus and medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190201 |