CN107704549A - Voice search method, device and computer equipment - Google Patents

Voice search method, device and computer equipment

Info

Publication number
CN107704549A
Authority
CN
China
Prior art keywords
type
voice
voice information
model
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710884466.6A
Other languages
Chinese (zh)
Inventor
袁胜龙
马啸空
蒋兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710884466.6A
Publication of CN107704549A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3343Query execution using phonetics
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/04Training, enrolment or model building
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention provides a voice search method, a voice search device and computer equipment. The method includes: obtaining voice information to be searched; performing feature extraction on the voice information to obtain feature information of the voice information; recognizing the feature information with each type model to determine the type of the voice information, the types including male voice, female voice and child voice, and the type models including a male-voice type model, a female-voice type model and a child-voice type model; recognizing the feature information with the recognition model corresponding to the type of the voice information to obtain text information corresponding to the voice information; and searching according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information. Because the type models first classify the feature information by type and the feature information is then recognized with the recognition model of that type, male, female and child voices can each be recognized in a targeted manner, which improves the accuracy of speech recognition.

Description

Voice search method, device and computer equipment
Technical field
The present invention relates to the field of communication technology, and in particular to a voice search method, a voice search device and computer equipment.
Background art
At present, voice search mainly uses a general-purpose speech recognition model to recognize the voice information, obtains the text information corresponding to the voice information, and searches with that text information to obtain a search result corresponding to the voice information. However, because male voices, female voices and child voices differ from one another, the recognition accuracy of such a general-purpose speech recognition model is low.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
Accordingly, a first object of the present invention is to provide a voice search method that addresses the low recognition accuracy of the prior art.
A second object of the present invention is to provide a voice search device.
A third object of the present invention is to provide a computer device.
A fourth object of the present invention is to provide a non-transitory computer-readable storage medium.
A fifth object of the present invention is to provide a computer program product.
To achieve the above objects, an embodiment of the first aspect of the present invention provides a voice search method, including:
obtaining voice information to be searched;
performing feature extraction on the voice information to obtain feature information of the voice information;
recognizing the feature information with each type model to determine the type of the voice information, the types including male voice, female voice and child voice, and the type models including a male-voice type model, a female-voice type model and a child-voice type model;
recognizing the feature information with the recognition model corresponding to the type of the voice information to obtain text information corresponding to the voice information, the recognition models including a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and
searching according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information.
Further, recognizing the feature information with each type model to determine the type of the voice information includes:
recognizing the feature information with each type model to obtain, for each type, a score that the voice information belongs to that type; and
determining the type of the voice information according to the scores.
Further, before obtaining the voice information to be searched, the method further includes:
obtaining first training data for each type, the first training data of a type including at least one piece of voice information corresponding to that type; and
training the corresponding type model according to the first training data of each type.
Further, before obtaining the voice information to be searched, the method further includes:
obtaining second training data for each type, the second training data of a type including at least one piece of voice information corresponding to that type and the text information corresponding to that voice information; and
training the corresponding recognition model according to the second training data of each type.
Further, obtaining the second training data of each type includes:
obtaining third training data, the third training data including at least one piece of voice information and the text information corresponding to the voice information;
performing feature extraction on each piece of voice information in the third training data to obtain the feature information of each piece of voice information;
recognizing the feature information of each piece of voice information with each type model to determine the type of each piece of voice information; and
labeling each piece of voice information in the third training data with its type to obtain the second training data of each type.
Further, before performing feature extraction on the voice information to obtain the feature information of the voice information, the method further includes:
performing voice activity detection on the voice information to remove invalid components from the voice information, the invalid components including silent components and background noise components.
Further, the feature information is a Mel-frequency cepstral coefficient feature.
In the voice search method of the embodiment of the present invention, voice information to be searched is obtained; feature extraction is performed on the voice information to obtain feature information of the voice information; the feature information is recognized with each type model to determine the type of the voice information, the types including male voice, female voice and child voice, and the type models including a male-voice type model, a female-voice type model and a child-voice type model; the feature information is recognized with the recognition model corresponding to the type of the voice information to obtain text information corresponding to the voice information, the recognition models including a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and a search is performed according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information. Because the type models first classify the feature information by type and the feature information is then recognized with the recognition model of that type, male, female and child voices can each be recognized in a targeted manner, which improves the accuracy of speech recognition.
To achieve the above objects, an embodiment of the second aspect of the present invention provides a voice search device, including:
an acquisition module, configured to obtain voice information to be searched;
an extraction module, configured to perform feature extraction on the voice information to obtain feature information of the voice information;
a recognition module, configured to recognize the feature information with each type model to determine the type of the voice information, the types including male voice, female voice and child voice, and the type models including a male-voice type model, a female-voice type model and a child-voice type model;
the recognition module being further configured to recognize the feature information with the recognition model corresponding to the type of the voice information to obtain text information corresponding to the voice information, the recognition models including a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and
a search module, configured to search according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information.
Further, the recognition module is specifically configured to:
recognize the feature information with each type model to obtain, for each type, a score that the voice information belongs to that type; and
determine the type of the voice information according to the scores.
Further, the device also includes a first training module;
the acquisition module is further configured to obtain first training data for each type, the first training data of a type including at least one piece of voice information corresponding to that type; and
the first training module is configured to train the corresponding type model according to the first training data of each type.
Further, the device also includes a second training module;
the acquisition module is further configured to obtain second training data for each type, the second training data of a type including at least one piece of voice information corresponding to that type and the text information corresponding to that voice information; and
the second training module is configured to train the corresponding recognition model according to the second training data of each type.
Further, the acquisition module is specifically configured to:
obtain third training data, the third training data including at least one piece of voice information and the text information corresponding to the voice information;
perform feature extraction on each piece of voice information in the third training data to obtain the feature information of each piece of voice information;
recognize the feature information of each piece of voice information with each type model to determine the type of each piece of voice information; and
label each piece of voice information in the third training data with its type to obtain the second training data of each type.
In the voice search device of the embodiment of the present invention, voice information to be searched is obtained; feature extraction is performed on the voice information to obtain feature information of the voice information; the feature information is recognized with each type model to determine the type of the voice information, the types including male voice, female voice and child voice, and the type models including a male-voice type model, a female-voice type model and a child-voice type model; the feature information is recognized with the recognition model corresponding to the type of the voice information to obtain text information corresponding to the voice information, the recognition models including a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and a search is performed according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information. Because the type models first classify the feature information by type and the feature information is then recognized with the recognition model of that type, male, female and child voices can each be recognized in a targeted manner, which improves the accuracy of speech recognition.
To achieve the above objects, an embodiment of the third aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method described above.
To achieve the above objects, an embodiment of the fourth aspect of the present invention provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described above.
To achieve the above objects, an embodiment of the fifth aspect of the present invention provides a computer program product, wherein when instructions in the computer program product are executed by a processor, the method described above is performed.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and will in part become apparent from the following description or be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a voice search method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another voice search method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a voice search device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another voice search device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another voice search device according to an embodiment of the present invention;
Fig. 6 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting the present invention.
The voice search method, device and computer equipment of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a voice search method according to an embodiment of the present invention. As shown in Fig. 1, the voice search method includes the following steps:
S101: obtain voice information to be searched.
The execution subject of the voice search method provided by the present invention is a voice search device, which may specifically be hardware or software on a hardware device such as a terminal device or a server. The voice search device may obtain the voice information to be searched by collecting it, for example with a microphone array, or by receiving voice information sent by another device.
S102: perform feature extraction on the voice information to obtain feature information of the voice information.
In this embodiment, the voice search device may extract Mel-frequency cepstral coefficient (MFCC) features from the voice information.
It should be noted that, before step S102, the voice search device may first perform voice activity detection (VAD) on the voice information to remove invalid components from it; the invalid components include silent components and background noise components.
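The patent names VAD and MFCC extraction but gives no implementation; the following is a minimal Python sketch of what these two preprocessing steps could look like, assuming 16 kHz audio, the librosa library, and librosa's energy-based silence splitting as a crude stand-in for the unspecified VAD algorithm.

```python
# Illustrative sketch only; the patent specifies MFCC features and VAD but not
# an implementation. effects.split is a crude energy-based stand-in for VAD.
import numpy as np
import librosa


def remove_invalid_components(y, top_db=40):
    """Drop silent / background-noise segments, keeping only voiced intervals."""
    intervals = librosa.effects.split(y, top_db=top_db)
    if len(intervals) == 0:
        return y
    return np.concatenate([y[start:end] for start, end in intervals])


def extract_mfcc(path, n_mfcc=13, sr=16000):
    """Load an utterance, remove invalid components, return MFCC features."""
    y, _ = librosa.load(path, sr=sr)
    y = remove_invalid_components(y)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
```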
S103: recognize the feature information with each type model to determine the type of the voice information; the types include male voice, female voice and child voice, and the type models include a male-voice type model, a female-voice type model and a child-voice type model.
In this embodiment, step S103 may specifically include: recognizing the feature information with each type model to obtain, for each type, a score that the voice information belongs to that type, and determining the type of the voice information according to these scores. Specifically, the voice search device may feed the feature information into the male-voice type model, the female-voice type model and the child-voice type model respectively, and obtain the score output by each model; the scores output by the male-voice, female-voice and child-voice type models are taken, in turn, as the scores that the voice information belongs to the male-voice type, the female-voice type and the child-voice type. The type with the highest score is determined as the type of the voice information, provided that the difference between the highest score and each of the other scores is greater than a preset value. The type models may be Gaussian mixture models.
It should also be noted that, if the difference between the highest score and another score is smaller than the preset value, a universal recognition model may be used instead: the feature information is recognized with the universal recognition model to obtain the text information corresponding to the voice information. The universal recognition model is a recognition model trained on at least one piece of voice information of every type together with the corresponding text information.
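As a concrete illustration of this scoring step, the sketch below assumes one scikit-learn GaussianMixture per type (the patent only says the type models may be Gaussian mixture models); the margin value and the use of None to signal "fall back to the universal recognition model" are assumptions made for the example.

```python
# Hedged sketch of S103: score the utterance against each type model and keep
# the winner only if it beats the runner-up by a preset margin; otherwise
# return None so the caller falls back to the universal recognition model.
def classify_type(mfcc, type_models, margin=1.0):
    frames = mfcc.T  # (frames, n_mfcc): GaussianMixture scores one row per frame
    scores = {name: gmm.score(frames) for name, gmm in type_models.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_type, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    return best_type if best_score - runner_up_score > margin else None
```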
Further, before step S101, the voice search device also needs to obtain the male-voice type model, the female-voice type model and the child-voice type model. Obtaining these type models mainly includes: obtaining first training data for each type, the first training data of a type including at least one piece of voice information corresponding to that type; and training the corresponding type model according to the first training data of each type.
In this embodiment, when the type of each piece of voice information in the first training data is labeled manually, the voice search device may train the type models on a subset of type-labeled voice information, and then use the trained type models to recognize the remaining voice information without type labels and obtain its type, thereby reducing the amount and cost of manual labeling.
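One possible realization of this training step is sketched below, assuming diagonal-covariance Gaussian mixtures fitted on pooled MFCC frames; the component count and the `extract_mfcc` helper from the earlier sketch are illustrative choices, not prescribed by the patent.

```python
# Hedged sketch: train one GMM per type from the first training data,
# given as a mapping of type name -> list of audio file paths.
from sklearn.mixture import GaussianMixture
import numpy as np


def train_type_models(first_training_data, n_components=32):
    models = {}
    for type_name, paths in first_training_data.items():
        frames = np.vstack([extract_mfcc(p).T for p in paths])  # (frames, n_mfcc)
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        models[type_name] = gmm.fit(frames)
    return models
```

For example, `train_type_models({"male": male_paths, "female": female_paths, "child": child_paths})` would fit the three type models from the manually labeled subset, after which `classify_type` can label the remaining clips automatically.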
S104: recognize the feature information with the recognition model corresponding to the type of the voice information to obtain the text information corresponding to the voice information; the recognition models include a male-voice recognition model, a female-voice recognition model and a child-voice recognition model.
The recognition models may be deep neural network (DNN) models or long short-term memory (LSTM) neural network models.
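The patent names DNN and LSTM models but fixes no architecture; a minimal PyTorch sketch of a per-type LSTM acoustic model might look as follows, where the layer sizes, output vocabulary and per-frame token logits (e.g. for CTC-style decoding) are all assumptions made for the example.

```python
# Illustrative per-type LSTM recognition model; one such model would be
# trained per type (male, female, child) on that type's second training data.
import torch
import torch.nn as nn


class TypeSpecificRecognizer(nn.Module):
    def __init__(self, n_mfcc=13, hidden=256, n_tokens=4000):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, n_tokens)

    def forward(self, feats):            # feats: (batch, frames, n_mfcc)
        out, _ = self.lstm(feats)
        return self.proj(out)            # per-frame token logits
```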
S105: search according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information.
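Putting S101-S105 together, a sketch of the overall flow could look like this, where `recognizers`, `universal` and `search` are placeholders for the per-type decoders, the universal recognition model and the search backend, none of which the patent specifies.

```python
# End-to-end sketch of the claimed flow, reusing the helpers sketched above.
def voice_search(audio_path, type_models, recognizers, universal, search):
    mfcc = extract_mfcc(audio_path)                 # S102 (after VAD)
    utt_type = classify_type(mfcc, type_models)     # S103
    decode = recognizers.get(utt_type, universal)   # S104, with universal fallback
    query_text = decode(mfcc)
    return search(query_text)                       # S105
```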
In the voice search method of the embodiment of the present invention, voice information to be searched is obtained; feature extraction is performed on the voice information to obtain feature information of the voice information; the feature information is recognized with each type model to determine the type of the voice information, the types including male voice, female voice and child voice, and the type models including a male-voice type model, a female-voice type model and a child-voice type model; the feature information is recognized with the recognition model corresponding to the type of the voice information to obtain text information corresponding to the voice information, the recognition models including a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and a search is performed according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information. Because the type models first classify the feature information by type and the feature information is then recognized with the recognition model of that type, male, female and child voices can each be recognized in a targeted manner, which improves the accuracy of speech recognition.
Fig. 2 is a schematic flowchart of another voice search method according to an embodiment of the present invention. As shown in Fig. 2, on the basis of the embodiment shown in Fig. 1, before step S101 the method may further include the following steps:
S106: obtain second training data for each type, the second training data of a type including at least one piece of voice information corresponding to that type and the text information corresponding to that voice information.
The process by which the voice search device performs step S106 may specifically be: obtaining third training data, the third training data including at least one piece of voice information and the text information corresponding to the voice information; performing feature extraction on each piece of voice information in the third training data to obtain the feature information of each piece of voice information; recognizing the feature information of each piece of voice information with each type model to determine the type of each piece of voice information; and labeling each piece of voice information in the third training data with its type to obtain the second training data of each type.
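A minimal sketch of this labeling step follows, assuming the third training data is a list of (audio path, transcript) pairs and reusing the `extract_mfcc` and `classify_type` helpers sketched earlier.

```python
# Hedged sketch: partition the third training data by predicted type so each
# type-specific recognition model gets its own second training data.
from collections import defaultdict


def build_second_training_data(third_training_data, type_models):
    per_type = defaultdict(list)
    for path, text in third_training_data:
        utt_type = classify_type(extract_mfcc(path), type_models)
        if utt_type is not None:          # ambiguous clips can be left out
            per_type[utt_type].append((path, text))
    return per_type
```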
S107: train the corresponding recognition model according to the second training data of each type.
In the voice search method of the embodiment of the present invention, second training data is first obtained for each type, the second training data of a type including at least one piece of voice information corresponding to that type and the text information corresponding to that voice information, and the corresponding recognition model is trained according to the second training data of each type. Then voice information to be searched is obtained; feature extraction is performed on the voice information to obtain its feature information; the feature information is recognized with each type model to determine the type of the voice information, the types including male voice, female voice and child voice, and the type models including a male-voice type model, a female-voice type model and a child-voice type model; the feature information is recognized with the recognition model corresponding to the type of the voice information to obtain the corresponding text information, the recognition models including a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and a search is performed according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information. Because the type models first classify the feature information by type and the feature information is then recognized with the recognition model of that type, male, female and child voices can each be recognized in a targeted manner, which improves the accuracy of speech recognition.
Fig. 3 is a schematic structural diagram of a voice search device according to an embodiment of the present invention. As shown in Fig. 3, the device includes:
an acquisition module 31, an extraction module 32, a recognition module 33 and a search module 34.
The acquisition module 31 is configured to obtain voice information to be searched;
the extraction module 32 is configured to perform feature extraction on the voice information to obtain feature information of the voice information;
the recognition module 33 is configured to recognize the feature information with each type model to determine the type of the voice information, the types including male voice, female voice and child voice, and the type models including a male-voice type model, a female-voice type model and a child-voice type model;
the recognition module 33 is further configured to recognize the feature information with the recognition model corresponding to the type of the voice information to obtain text information corresponding to the voice information, the recognition models including a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and
the search module 34 is configured to search according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information.
The voice search device provided by the present invention may specifically be hardware or software on a hardware device such as a terminal device or a server. The voice search device may obtain the voice information to be searched by collecting it, for example with a microphone array, or by receiving voice information sent by another device.
In this embodiment, the voice search device may extract Mel-frequency cepstral coefficient (MFCC) features from the voice information.
It should be noted that, before the extraction module 32 performs feature extraction on the voice information, the voice search device may first perform voice activity detection (VAD) on the voice information to remove invalid components from it; the invalid components include silent components and background noise components.
In this embodiment, the recognition module 33 is specifically configured to recognize the feature information with each type model to obtain, for each type, a score that the voice information belongs to that type, and to determine the type of the voice information according to these scores. Specifically, the recognition module 33 may feed the feature information into the male-voice type model, the female-voice type model and the child-voice type model respectively, and obtain the score output by each model; the scores output by the male-voice, female-voice and child-voice type models are taken, in turn, as the scores that the voice information belongs to the male-voice type, the female-voice type and the child-voice type. The type with the highest score is determined as the type of the voice information, provided that the difference between the highest score and each of the other scores is greater than a preset value.
Further, with reference to Fig. 4, on the basis of the embodiment shown in Fig. 3, the device also includes a first training module 35;
the acquisition module 31 is further configured to obtain first training data for each type, the first training data of a type including at least one piece of voice information corresponding to that type; and
the first training module 35 is configured to train the corresponding type model according to the first training data of each type.
In the voice search device of the embodiment of the present invention, voice information to be searched is obtained; feature extraction is performed on the voice information to obtain feature information of the voice information; the feature information is recognized with each type model to determine the type of the voice information, the types including male voice, female voice and child voice, and the type models including a male-voice type model, a female-voice type model and a child-voice type model; the feature information is recognized with the recognition model corresponding to the type of the voice information to obtain text information corresponding to the voice information, the recognition models including a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and a search is performed according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information. Because the type models first classify the feature information by type and the feature information is then recognized with the recognition model of that type, male, female and child voices can each be recognized in a targeted manner, which improves the accuracy of speech recognition.
Further, with reference to Fig. 5, on the basis of the embodiment shown in Fig. 3, the device also includes a second training module 36;
the acquisition module 31 is further configured to obtain second training data for each type, the second training data of a type including at least one piece of voice information corresponding to that type and the text information corresponding to that voice information; and
the second training module 36 is configured to train the corresponding recognition model according to the second training data of each type.
The acquisition module 31 is specifically configured to:
obtain third training data, the third training data including at least one piece of voice information and the text information corresponding to the voice information;
perform feature extraction on each piece of voice information in the third training data to obtain the feature information of each piece of voice information;
recognize the feature information of each piece of voice information with each type model to determine the type of each piece of voice information; and
label each piece of voice information in the third training data with its type to obtain the second training data of each type.
In the voice search device of the embodiment of the present invention, second training data is first obtained for each type, the second training data of a type including at least one piece of voice information corresponding to that type and the text information corresponding to that voice information, and the corresponding recognition model is trained according to the second training data of each type. Then voice information to be searched is obtained; feature extraction is performed on the voice information to obtain its feature information; the feature information is recognized with each type model to determine the type of the voice information, the types including male voice, female voice and child voice, and the type models including a male-voice type model, a female-voice type model and a child-voice type model; the feature information is recognized with the recognition model corresponding to the type of the voice information to obtain the corresponding text information, the recognition models including a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and a search is performed according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information. Because the type models first classify the feature information by type and the feature information is then recognized with the recognition model of that type, male, female and child voices can each be recognized in a targeted manner, which improves the accuracy of speech recognition.
It should be noted that the foregoing explanation of the voice search method embodiments also applies to the voice search device of this embodiment, and is not repeated here.
To implement the above embodiments, the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method described above.
To implement the above embodiments, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described above.
To implement the above embodiments, the present invention also provides a computer program product, wherein when instructions in the computer program product are executed by a processor, the method described above is performed.
Fig. 6 shows a block diagram of an exemplary computer device suitable for implementing the embodiments of the present application. The computer device 12 shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer device 12 is embodied in the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnection (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer device 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer system readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 52. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 54 may be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 6, commonly referred to as a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a compact disc read-only memory (CD-ROM), a digital versatile disc read-only memory (DVD-ROM) or other optical media), may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Moreover, the computer device 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example implementing the methods mentioned in the foregoing embodiments.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples" and the like means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of the different embodiments or examples, provided they do not contradict one another.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two, three, etc., unless otherwise specifically defined.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport the program for use by or in connection with an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that parts of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented with any one, or a combination, of the following techniques known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps carried in the methods of the above embodiments may be completed by instructing the relevant hardware through a program, and the program may be stored in a computer-readable storage medium; when executed, the program includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

Claims (15)

  1. A voice search method, characterized by comprising:
    obtaining voice information to be searched;
    performing feature extraction on the voice information to obtain feature information of the voice information;
    recognizing the feature information with each type model to determine the type of the voice information, the types comprising male voice, female voice and child voice, and the type models comprising a male-voice type model, a female-voice type model and a child-voice type model;
    recognizing the feature information with the recognition model corresponding to the type of the voice information to obtain text information corresponding to the voice information, the recognition models comprising a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and
    searching according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information.
  2. The method according to claim 1, characterized in that recognizing the feature information with each type model to determine the type of the voice information comprises:
    recognizing the feature information with each type model to obtain, for each type, a score that the voice information belongs to that type; and
    determining the type of the voice information according to the scores.
  3. The method according to claim 1, characterized in that, before obtaining the voice information to be searched, the method further comprises:
    obtaining first training data for each type, the first training data of a type comprising at least one piece of voice information corresponding to that type; and
    training the corresponding type model according to the first training data of each type.
  4. The method according to claim 1, characterized in that, before obtaining the voice information to be searched, the method further comprises:
    obtaining second training data for each type, the second training data of a type comprising at least one piece of voice information corresponding to that type and the text information corresponding to that voice information; and
    training the corresponding recognition model according to the second training data of each type.
  5. The method according to claim 4, characterized in that obtaining the second training data of each type comprises:
    obtaining third training data, the third training data comprising at least one piece of voice information and the text information corresponding to the voice information;
    performing feature extraction on each piece of voice information in the third training data to obtain the feature information of each piece of voice information;
    recognizing the feature information of each piece of voice information with each type model to determine the type of each piece of voice information; and
    labeling each piece of voice information in the third training data with its type to obtain the second training data of each type.
  6. The method according to claim 1, characterized in that, before performing feature extraction on the voice information to obtain the feature information of the voice information, the method further comprises:
    performing voice activity detection on the voice information to remove invalid components from the voice information, the invalid components comprising silent components and background noise components.
  7. The method according to claim 1, characterized in that the feature information is a Mel-frequency cepstral coefficient feature.
  8. A voice search device, characterized by comprising:
    an acquisition module, configured to obtain voice information to be searched;
    an extraction module, configured to perform feature extraction on the voice information to obtain feature information of the voice information;
    a recognition module, configured to recognize the feature information with each type model to determine the type of the voice information, the types comprising male voice, female voice and child voice, and the type models comprising a male-voice type model, a female-voice type model and a child-voice type model;
    the recognition module being further configured to recognize the feature information with the recognition model corresponding to the type of the voice information to obtain text information corresponding to the voice information, the recognition models comprising a male-voice recognition model, a female-voice recognition model and a child-voice recognition model; and
    a search module, configured to search according to the text information corresponding to the voice information to obtain a search result corresponding to the voice information.
  9. The device according to claim 8, characterized in that the recognition module is specifically configured to:
    recognize the feature information with each type model to obtain, for each type, a score that the voice information belongs to that type; and
    determine the type of the voice information according to the scores.
  10. The device according to claim 8, characterized by further comprising a first training module;
    the acquisition module is further configured to obtain first training data for each type, the first training data of a type comprising at least one piece of voice information corresponding to that type; and
    the first training module is configured to train the corresponding type model according to the first training data of each type.
  11. The device according to claim 8, characterized by further comprising a second training module;
    the acquisition module is further configured to obtain second training data for each type, the second training data of a type comprising at least one piece of voice information corresponding to that type and the text information corresponding to that voice information; and
    the second training module is configured to train the corresponding recognition model according to the second training data of each type.
  12. The device according to claim 11, characterized in that the acquisition module is specifically configured to:
    obtain third training data, the third training data comprising at least one piece of voice information and the text information corresponding to the voice information;
    perform feature extraction on each piece of voice information in the third training data to obtain the feature information of each piece of voice information;
    recognize the feature information of each piece of voice information with each type model to determine the type of each piece of voice information; and
    label each piece of voice information in the third training data with its type to obtain the second training data of each type.
  13. A computer device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1-7.
  14. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
  15. A computer program product, characterized in that, when instructions in the computer program product are executed by a processor, the method according to any one of claims 1-7 is performed.
CN201710884466.6A 2017-09-26 2017-09-26 Voice search method, device and computer equipment Pending CN107704549A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710884466.6A CN107704549A (en) 2017-09-26 2017-09-26 Voice search method, device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710884466.6A CN107704549A (en) 2017-09-26 2017-09-26 Voice search method, device and computer equipment

Publications (1)

Publication Number Publication Date
CN107704549A true CN107704549A (en) 2018-02-16

Family

ID=61174486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710884466.6A Pending CN107704549A (en) 2017-09-26 2017-09-26 Voice search method, device and computer equipment

Country Status (1)

Country Link
CN (1) CN107704549A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109171644A (en) * 2018-06-22 2019-01-11 平安科技(深圳)有限公司 Health control method, device, computer equipment and storage medium based on voice recognition
CN109410946A (en) * 2019-01-11 2019-03-01 百度在线网络技术(北京)有限公司 A kind of method, apparatus of recognition of speech signals, equipment and storage medium
CN111291168A (en) * 2018-12-07 2020-06-16 北大方正集团有限公司 Book retrieval method and device and readable storage medium
CN112998709A (en) * 2021-02-25 2021-06-22 西安交通大学 Depression degree detection method using audio data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262644A (en) * 2010-05-25 2011-11-30 索尼公司 Search Apparatus, Search Method, And Program
CN102280106A (en) * 2010-06-12 2011-12-14 三星电子株式会社 VWS method and apparatus used for mobile communication terminal
CN101923854A (en) * 2010-08-31 2010-12-22 中国科学院计算技术研究所 Interactive speech recognition system and method
CN103310788A (en) * 2013-05-23 2013-09-18 北京云知声信息技术有限公司 Voice information identification method and system
CN104239459A (en) * 2014-09-02 2014-12-24 百度在线网络技术(北京)有限公司 Voice search method, voice search device and voice search system
CN106548773A (en) * 2016-11-04 2017-03-29 百度在线网络技术(北京)有限公司 Child user searching method and device based on artificial intelligence

Similar Documents

Publication Publication Date Title
JP6799574B2 (en) Method and device for determining satisfaction with voice dialogue
CN109887497A (en) Modeling method, device and the equipment of speech recognition
CN108829894B (en) Spoken word recognition and semantic recognition method and device
CN108009293A (en) Video tab generation method, device, computer equipment and storage medium
CN107945792A (en) Method of speech processing and device
CN107678561A (en) Phonetic entry error correction method and device based on artificial intelligence
CN107767870A (en) Adding method, device and the computer equipment of punctuation mark
CN108986793A (en) translation processing method, device and equipment
CN110021308A (en) Voice mood recognition methods, device, computer equipment and storage medium
CN108280061A (en) Text handling method based on ambiguity entity word and device
CN107919130A (en) Method of speech processing and device based on high in the clouds
CN107704549A (en) Voice search method, device and computer equipment
CN108170773A (en) Media event method for digging, device, computer equipment and storage medium
CN110197658A (en) Method of speech processing, device and electronic equipment
CN108170792A (en) Question and answer bootstrap technique, device and computer equipment based on artificial intelligence
CN106653052A (en) Virtual human face animation generation method and device
CN107305541A (en) Speech recognition text segmentation method and device
CN111161739B (en) Speech recognition method and related product
CN110033760A (en) Modeling method, device and the equipment of speech recognition
CN107515862A (en) Voice translation method, device and server
CN109491902A (en) Interactive testing method, apparatus and system
CN108563655A (en) Text based event recognition method and device
CN107610702A (en) Terminal device standby wakeup method, apparatus and computer equipment
CN108182246A (en) Sensitive word detection filter method, device and computer equipment
CN108319720A (en) Man-machine interaction method, device based on artificial intelligence and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination