CN107331400A

CN107331400A - Voiceprint recognition performance improvement method, device, terminal and storage medium

Info

Publication number: CN107331400A
Application number: CN201710741564.4A
Authority: CN
Inventors: Gao Cong (高聪)
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2017-08-25
Filing date: 2017-08-25
Publication date: 2017-11-07

Abstract

The invention discloses a voiceprint recognition performance improvement method, device, terminal and storage medium. The method includes: obtaining a voice start command input by a user; determining whether the voice start command matches a preset guide text; if it matches, extracting the voiceprint feature corresponding to the voice start command; and matching the extracted voiceprint feature against a predetermined sample voiceprint feature and, if the match succeeds, performing a start operation, where the sample voiceprint feature is extracted in advance from speech information whose semantic content is the guide text. The invention obtains the user's personalized speech, extracts the user's sample voiceprint feature from that personalized speech, and performs the subsequent start operation according to the result of matching the user's voice start command against the sample voiceprint feature. As a result, it is no longer limited by the amount of speech samples, improves the fault-tolerance mechanism, and improves both the accuracy of voiceprint recognition and the user experience.

Description

Voiceprint recognition performance improvement method, device, terminal and storage medium

Technical field

The embodiments of the present invention relate to the field of voiceprint recognition technology, and in particular to a voiceprint recognition performance improvement method, device, terminal and storage medium.

Background

Voiceprint recognition is a type of biometric identification technology: it identifies a speaker from the speech parameters in the voice that reflect the speaker's physiological and behavioral characteristics. Because every person's vocal organs differ in size and shape, the voiceprint can serve as a means of distinguishing one speaker's identity from another's.

With the rapid development of speech recognition technology, more and more smart appliances use voiceprint recognition to enhance the user experience. A user can lock a personal account with voiceprint recognition and define private attributes for that account, and can then use voice to quickly enter the device system and access personal account information and functions. The accuracy of voiceprint recognition is therefore critical.

Summary of the invention

The embodiments of the invention provide a kind of Application on Voiceprint Recognition performance improvement method, device, terminal and storage medium, Neng Gouzeng Plus speech samples amount, the accuracy of Application on Voiceprint Recognition is improved, strengthens Consumer's Experience.

In a first aspect, an embodiment of the present invention provides a voiceprint recognition performance improvement method, comprising:

obtaining a voice start command input by a user;

determining whether the voice start command matches a preset guide text;

if it matches, extracting the voiceprint feature corresponding to the voice start command;

matching the extracted voiceprint feature against a predetermined sample voiceprint feature and, if the match succeeds, performing a start operation, where the sample voiceprint feature is extracted in advance from speech information whose semantic content is the guide text.

In a second aspect, an embodiment of the present invention provides a voiceprint recognition performance improvement device, comprising:

a voice instruction acquisition module, configured to obtain a voice start command input by a user;

a speech recognition module, configured to determine whether the voice start command matches a preset guide text;

a voiceprint feature extraction module, configured to extract the voiceprint feature corresponding to the voice start command when the voice start command matches the preset guide text;

a voiceprint feature matching module, configured to match the extracted voiceprint feature against a predetermined sample voiceprint feature and, if the match succeeds, perform a start operation, where the sample voiceprint feature is extracted in advance from speech information whose semantic content is the guide text.

In a third aspect, an embodiment of the present invention provides a terminal, comprising:

one or more processors;

a memory, configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voiceprint recognition performance improvement method described in any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the voiceprint recognition performance improvement method described in any embodiment of the present invention.

In the voiceprint recognition performance improvement method, device, terminal and storage medium provided by the embodiments of the present invention, personalized guide speech input by the user is obtained, the user's sample voiceprint feature is extracted from that personalized guide speech, and the voiceprint feature corresponding to the voice start command is matched against the sample voiceprint feature. Because the content of the guide text can be set by the user, the personalized guide speech improves the fault-tolerance mechanism and the accuracy of the sample voiceprint feature; correspondingly, it improves the accuracy of subsequent voiceprint feature matching and the user experience.

Brief description of the drawings

Fig. 1 is a flowchart of a voiceprint recognition performance improvement method provided in Embodiment one of the present invention;

Fig. 2 is a flowchart of a voiceprint recognition performance improvement method provided in Embodiment two of the present invention;

Fig. 3 is a schematic structural diagram of a voiceprint recognition performance improvement device provided in Embodiment three of the present invention;

Fig. 4 is a schematic structural diagram of a terminal provided in Embodiment four of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Embodiment one

Fig. 1 is a flowchart of a voiceprint recognition performance improvement method provided in Embodiment one of the present invention. This embodiment is applicable to controlling a smart device through voice instructions. The method can be performed by a voiceprint recognition performance improvement device, which can be implemented in software and/or hardware. Referring to Fig. 1, the method may specifically include the following steps:

S110: Obtain the voice start command input by the user.

The intelligent terminal can monitor its surroundings in real time; when the terminal, while in a dormant state, detects a voice instruction in its environment, it obtains the voice start command input by the user. The intelligent terminal is a smart device that supports voice interaction and has multimedia functions, for example supporting audio, video and data; it may be an intelligent robot, a smart speaker, and so on.

S120: Determine whether the voice start command matches the preset guide text. If it matches, proceed to S130; otherwise, jump to S160.

Here, the guide text is the text corresponding to a voice wake-up instruction preset by a legitimate user; the voice wake-up instruction switches the intelligent terminal from the dormant state to the running state. For example, when a legitimate user uses the intelligent terminal, such as the first time the terminal is used, the user is prompted to input a personalized voice wake-up instruction, and semantic analysis is performed on that instruction to obtain the personalized guide text.

Specifically, if the voice start command successfully matches the guide text, the current user may be a legitimate user of the intelligent terminal, and the subsequent operations continue; if the match fails, the current user is not a legitimate user, and the voice start command can simply be ignored.
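The matching rule for S120 is not specified. A minimal sketch, assuming an upstream speech recognizer has already produced a transcript of the voice start command and that "matching" means equality after normalizing Unicode width, punctuation, whitespace and letter case (the function names are hypothetical):

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Normalize a transcript or guide text before comparison (assumed rule)."""
    # NFKC folds full-width forms (e.g. 'Ａ', the U+3000 space) into ASCII equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Drop punctuation (P*) and separator (Z*) characters plus any other whitespace.
    kept = (
        ch
        for ch in text
        if not unicodedata.category(ch).startswith(("P", "Z")) and not ch.isspace()
    )
    return "".join(kept).casefold()


def matches_guide_text(transcript: str, guide_texts: list[str]) -> bool:
    """S120 sketch: True if the transcript equals any stored guide text after normalization."""
    target = normalize_text(transcript)
    return any(normalize_text(guide) == target for guide in guide_texts)


if __name__ == "__main__":
    guides = ["please start my smart speaker"]
    print(matches_guide_text("Please start my smart speaker!", guides))  # True
    print(matches_guide_text("Smart speaker, start quickly", guides))    # False
```

Exact matching after normalization is the strictest choice; a fuzzier comparison, such as tolerating a small edit distance to absorb recognition errors, would trade some security for robustness.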

S130: Extract the voiceprint feature corresponding to the voice start command.

S140: Match the extracted voiceprint feature against the predetermined sample voiceprint feature, where the sample voiceprint feature is extracted in advance from speech information whose semantic content is the guide text. If the match succeeds, proceed to S150; otherwise, jump to S160.
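The criterion for a successful match in S140 is left open. A minimal sketch, assuming each voiceprint feature is a fixed-length embedding vector and that a match means the cosine similarity reaches a threshold (the value 0.75 is a hypothetical placeholder that would need tuning on real data):

```python
import math
from typing import Sequence

MATCH_THRESHOLD = 0.75  # hypothetical value; tune on held-out enrollment data


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two voiceprint vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("voiceprint vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def voiceprint_matches(
    extracted: Sequence[float],
    sample: Sequence[float],
    threshold: float = MATCH_THRESHOLD,
) -> bool:
    """S140 sketch: the extracted feature matches the sample if similarity >= threshold."""
    return cosine_similarity(extracted, sample) >= threshold
```

For instance, `voiceprint_matches([1.0, 0.0, 0.0], [0.9, 0.1, 0.0])` is true, while two orthogonal vectors do not match. Cosine scoring is one common choice for speaker embeddings; a probabilistic back end such as PLDA would fill the same slot.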

The sample voiceprint feature may be determined as follows: during voiceprint registration, provide the user with a recording upload channel; display a prompt for personalized speech input; and analyze the personalized speech content input by the user to obtain the user's sample voiceprint feature.

It should be noted that during voiceprint registration neither the personalized speech content input by the user nor the guide text content is specifically restricted, so the user may use a personalized guide text. Likewise, the number and volume of personalized recordings are not limited: the user may record the guide speech repeatedly, any number of times, at several ordinary volume levels. Within a certain range, the more personalized recordings the user inputs during voiceprint registration, the more accurate the sample voiceprint feature obtained by analyzing them. In this embodiment the sample voiceprint feature is therefore not limited by the amount of speech samples, which improves the fault-tolerance mechanism and thus the accuracy of the sample voiceprint feature.

S150: Perform the start operation.

S160: Perform no operation.

It should further be noted that an intelligent terminal can have multiple legitimate users, each with their own sample voiceprint feature and guide text. The intelligent terminal therefore also stores the association between each guide text and its sample voiceprint feature, or stores the mapping between each legitimate user and that user's guide text, and between each legitimate user and that user's sample voiceprint feature.

Take the start-up of a smart speaker as an example. The guide text of user A is "please start my smart speaker", and user A's sample voiceprint feature has been extracted. The guide text of user B is "smart speaker, start quickly", and user B's sample voiceprint feature has been extracted. User C has not stored any start command, guide text or sample voiceprint feature on the smart speaker. While the smart speaker is in use, if user A says the voice start command "please start my smart speaker" to it, the command matches the guide text, and the current voiceprint feature of "please start my smart speaker" matches user A's sample voiceprint feature, so the smart speaker starts.

However, if user A says the voice start command "smart speaker, start quickly" to the smart speaker, the command matches user B's guide text, but the current voiceprint feature fails to match user B's sample voiceprint feature, so the smart speaker fails to start.
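The per-user association and the A/B example can be sketched as follows. This is an illustration under assumptions: voiceprints are toy three-dimensional vectors, the guide texts are the English renderings used above, the transcript is compared after trimming and lower-casing, cosine similarity with a hypothetical threshold of 0.75 decides the voiceprint match, and the names `Enrollment` and `handle_start_command` are invented for the example. What it shows is that the voiceprint is checked only against the sample feature associated with the guide text that was matched.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Enrollment:
    """One legitimate user's stored guide text and sample voiceprint feature."""
    guide_text: str
    sample_voiceprint: Sequence[float]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def handle_start_command(
    transcript: str,
    voiceprint: Sequence[float],
    enrollments: Dict[str, Enrollment],
    threshold: float = 0.75,  # hypothetical
) -> Optional[str]:
    """Return the user whose guide text AND voiceprint match, else None (S160: no operation)."""
    spoken = transcript.strip().casefold()
    for user, enrollment in enrollments.items():
        # S120: the command must match this user's guide text.
        if spoken != enrollment.guide_text.strip().casefold():
            continue
        # S130/S140: compare only against the sample feature tied to that guide text.
        if _cosine(voiceprint, enrollment.sample_voiceprint) >= threshold:
            return user  # S150: perform the start operation for this user
    return None


enrollments = {
    "A": Enrollment("please start my smart speaker", [1.0, 0.0, 0.0]),
    "B": Enrollment("smart speaker, start quickly", [0.0, 1.0, 0.0]),
}
voice_a = [0.95, 0.05, 0.0]  # toy voiceprint extracted from user A's speech

print(handle_start_command("Please start my smart speaker", voice_a, enrollments))  # A
print(handle_start_command("smart speaker, start quickly", voice_a, enrollments))   # None
```

The second call reproduces the failure case: user A's words match user B's guide text, but A's voiceprint is compared with B's sample feature and rejected.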

In the technical solution of this embodiment, personalized guide speech input by the user is obtained, the user's sample voiceprint feature is extracted from that personalized guide speech, and the voiceprint feature corresponding to the voice start command is matched against the sample voiceprint feature. Because the content of the guide text can be set by the user, the personalized guide speech improves the fault-tolerance mechanism and the accuracy of the sample voiceprint feature; correspondingly, it improves the accuracy of subsequent voiceprint feature matching and the user experience.

Embodiment two

On the basis of Embodiment one above, this embodiment provides a method for updating the sample voiceprint feature. Fig. 2 is a flowchart of a voiceprint recognition performance improvement method provided in Embodiment two of the present invention. As shown in Fig. 2, the method may specifically include the following steps:

S210: When a voiceprint update event is detected, obtain the current speech information input by the user.

A voiceprint update event is generated when it is detected that a preset voiceprint update button has been triggered, or that the length of time the sample voiceprint feature has existed exceeds a preset time-length threshold.
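The two trigger conditions can be expressed as a single predicate. A minimal sketch; the 180-day threshold and the function name are hypothetical, since the text only speaks of a preset time-length threshold:

```python
from datetime import datetime, timedelta

MAX_SAMPLE_AGE = timedelta(days=180)  # hypothetical preset time-length threshold


def should_generate_update_event(
    update_button_triggered: bool,
    sample_created_at: datetime,
    now: datetime,
    max_age: timedelta = MAX_SAMPLE_AGE,
) -> bool:
    """True if the update button was pressed or the sample voiceprint is older than max_age."""
    return update_button_triggered or (now - sample_created_at) > max_age
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the check deterministic and easy to test.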

S220: Recognize the current speech information and extract the current voiceprint feature.

S230: Obtain a new sample voiceprint feature from the current voiceprint feature and the predetermined sample voiceprint feature.

For example, S230 may include: determining whether the current voiceprint feature and the predetermined sample voiceprint feature belong to the same user and, if so, fusing the current voiceprint feature and the predetermined sample voiceprint feature using a predetermined coefficient to obtain the new sample voiceprint feature. The coefficient may be a preset empirical value.

For example, determining whether the current voiceprint feature and the predetermined sample voiceprint feature belong to the same user may include: determining the similarity between the two, and, if the similarity is greater than a preset similarity threshold, determining that they belong to the same user.
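S230 together with the same-user check can be sketched as below. The text says only that a predetermined coefficient is used to fuse the two features; the linear weighted average new = alpha * current + (1 - alpha) * sample, the coefficient `alpha = 0.2`, the similarity threshold 0.8 and the use of cosine similarity are all assumptions for illustration.

```python
import math
from typing import List, Optional, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def update_sample_voiceprint(
    current: Sequence[float],
    sample: Sequence[float],
    alpha: float = 0.2,                 # hypothetical predetermined coefficient
    similarity_threshold: float = 0.8,  # hypothetical preset similarity threshold
) -> Optional[List[float]]:
    """Return alpha*current + (1-alpha)*sample if both belong to the same user, else None."""
    if len(current) != len(sample):
        raise ValueError("voiceprint vectors must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Same-user check: the similarity must be strictly greater than the threshold.
    if _cosine(current, sample) <= similarity_threshold:
        return None  # different speaker: keep the existing sample unchanged
    return [alpha * c + (1.0 - alpha) * s for c, s in zip(current, sample)]
```

A small `alpha` lets the stored sample drift slowly toward the user's current voice, in line with the observation that a speaker's voice changes over time, while limiting the influence of any single noisy recording.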

S240: Obtain the voice start command input by the user.

S250: Determine whether the voice start command matches the preset guide text.

S260: If it matches, extract the voiceprint feature corresponding to the voice start command.

S270: Match the extracted voiceprint feature against the new sample voiceprint feature; if the match succeeds, perform the start operation.


Because each person's vocal organs differ in size and shape and also change over time, the sample voiceprint feature needs to be updated once it has existed longer than the preset time-length threshold; a voiceprint update event is therefore generated, to keep voiceprint recognition accurate.

In the technical solution of this embodiment, when a voiceprint update event is detected, speech recognition and voiceprint feature matching are used to judge whether the current user is the same as the device's existing user. If so, the current voiceprint feature and the predetermined sample voiceprint feature are fused using a predetermined coefficient to obtain a new sample voiceprint feature, completing the update. This ensures that the sample voiceprint feature in the smart device is regularly updated and improves the accuracy of voiceprint recognition.

Embodiment three

Fig. 3 is a schematic structural diagram of a voiceprint recognition performance improvement device provided in Embodiment three of the present invention. This embodiment is applicable to controlling a smart device through voice instructions and can perform the voiceprint recognition performance improvement method provided by any embodiment of the present invention. Referring to Fig. 3, the device is structured as follows:

Voice instruction acquisition module 310, configured to obtain the voice start command input by the user;

Speech recognition module 320, configured to determine whether the voice start command matches the preset guide text;

Voiceprint feature extraction module 330, configured to extract the voiceprint feature corresponding to the voice start command when the command matches the preset guide text;

Voiceprint feature matching module 340, configured to match the extracted voiceprint feature against the predetermined sample voiceprint feature and, if the match succeeds, perform the start operation, where the sample voiceprint feature is extracted in advance from speech information whose semantic content is the guide text.
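One way to picture how modules 310 to 340 cooperate is as a small class whose speech-recognition, feature-extraction and matching steps are injected as callables. This is a structural sketch only: the class name, the callable signatures and the use of raw audio bytes are assumptions, and real speech-recognition and speaker-embedding models would be supplied from elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interfaces for the pluggable components.
Transcriber = Callable[[bytes], str]                                  # used by module 320
FeatureExtractor = Callable[[bytes], Sequence[float]]                 # used by module 330
FeatureMatcher = Callable[[Sequence[float], Sequence[float]], bool]   # used by module 340


@dataclass
class VoiceprintStartDevice:
    guide_text: str
    sample_voiceprint: Sequence[float]
    transcribe: Transcriber
    extract_voiceprint: FeatureExtractor
    voiceprint_matches: FeatureMatcher
    started: bool = False

    def acquire_command(self, audio: bytes) -> bytes:
        """Module 310: stands in for capturing the user's voice start command."""
        return audio

    def on_audio(self, audio: bytes) -> bool:
        """Run modules 310-340 in order; return True only if this command starts the device."""
        command = self.acquire_command(audio)                            # module 310
        if self.transcribe(command) != self.guide_text:                  # module 320
            return False                                                 # no operation
        feature = self.extract_voiceprint(command)                       # module 330
        if not self.voiceprint_matches(feature, self.sample_voiceprint):  # module 340
            return False
        self.started = True                                              # start operation
        return True


if __name__ == "__main__":
    device = VoiceprintStartDevice(
        guide_text="start",
        sample_voiceprint=[1.0, 0.0],
        transcribe=lambda audio: audio.decode("utf-8"),       # stub recognizer
        extract_voiceprint=lambda audio: [1.0, 0.0],          # stub extractor
        voiceprint_matches=lambda a, b: list(a) == list(b),   # stub matcher
    )
    print(device.on_audio(b"start"))  # True
```

Injecting the components keeps the control flow, which is what the embodiment actually describes, separate from the choice of recognition and embedding models.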

Further, the device includes a sample voiceprint feature determination module 350, specifically configured to:

during voiceprint registration, provide the user with a recording upload channel;

display a prompt for personalized speech input;

analyze the personalized speech content input by the user to obtain the user's sample voiceprint feature.

Further, the device also includes a sample voiceprint update module 360, specifically configured to:

when a voiceprint update event is detected, obtain the current speech information input by the user;

recognize the current speech information and extract the current voiceprint feature;

obtain a new sample voiceprint feature from the current voiceprint feature and the predetermined sample voiceprint feature.

On the basis of the above solution, the sample voiceprint update module 360 is specifically configured to:

determine whether the current voiceprint feature and the predetermined sample voiceprint feature belong to the same user and, if so, fuse the current voiceprint feature and the predetermined sample voiceprint feature using a predetermined coefficient to obtain a new sample voiceprint feature.

Preferably, the module determines the similarity between the current voiceprint feature and the predetermined sample voiceprint feature, and if the similarity is greater than a preset similarity threshold, determines that the two belong to the same user.

Further, the device also includes a voiceprint update event generation module 370, specifically configured to:

generate a voiceprint update event when it is detected that the preset voiceprint update button has been triggered, or that the length of time the sample voiceprint feature has existed exceeds the preset time-length threshold.

Through the cooperation of its modules, the technical solution of this embodiment implements speech recognition, voiceprint matching, user identification, sample voiceprint determination and sample voiceprint updating, thereby improving the fault-tolerance mechanism, the accuracy of voiceprint recognition, and the user experience.

Embodiment four

Fig. 4 is a schematic structural diagram of a terminal provided in Embodiment four of the present invention, showing a block diagram of an exemplary terminal suitable for implementing embodiments of the present invention. The terminal 12 shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.


As shown in Fig. 4, terminal 12 takes the form of a general-purpose computing device. The components of terminal 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including system memory 28 and processing unit 16).

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Terminal 12 typically includes a variety of computer-system-readable media. These media may be any available media accessible by terminal 12, including volatile and non-volatile media, and removable and non-removable media.

System memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Terminal 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 4, commonly referred to as a "hard disk drive"). Although not shown in Fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM or other optical media) may be provided. In these cases, each drive may be connected to bus 18 through one or more data media interfaces. Memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.

Terminal 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device or a display 24), with one or more devices that enable a user to interact with terminal 12, and/or with any device (such as a network card or modem) that enables terminal 12 to communicate with one or more other computing devices. Such communication can take place via an input/output (I/O) interface 22. Terminal 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) via a network adapter 20. As shown, network adapter 20 communicates with the other modules of terminal 12 via bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with terminal 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

Processing unit 16 runs the programs stored in system memory 28 to perform various functional applications and data processing, for example implementing the voiceprint recognition performance improvement method provided by the embodiments of the present invention.

Embodiment five

Embodiment five of the present invention further provides a computer-readable storage medium on which a computer program (or computer-executable instructions) is stored. When executed by a processor, the program performs a voiceprint recognition performance improvement method comprising:

obtaining a voice start command input by a user;

determining whether the voice start command matches a preset guide text;

if it matches, extracting the voiceprint feature corresponding to the voice start command;

matching the extracted voiceprint feature against a predetermined sample voiceprint feature and, if the match succeeds, performing a start operation, where the sample voiceprint feature is extracted in advance from speech information whose semantic content is the guide text.

The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.

Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A voiceprint recognition performance improvement method, characterized by comprising:

obtaining a voice start command input by a user;

determining whether the voice start command matches a preset guide text;

if it matches, extracting a voiceprint feature corresponding to the voice start command;

matching the extracted voiceprint feature against a predetermined sample voiceprint feature and, if the match succeeds, performing a start operation, wherein the sample voiceprint feature is extracted in advance from speech information whose semantic content is the guide text.

2. The method according to claim 1, characterized in that determining the sample voiceprint feature comprises:

during voiceprint registration, providing the user with a recording upload channel;

displaying a prompt for personalized speech input;

analyzing the personalized speech content input by the user to obtain the sample voiceprint feature of the user.

3. The method according to claim 1, characterized by further comprising:

when a voiceprint update event is detected, obtaining current speech information input by the user;

recognizing the current speech information and extracting a current voiceprint feature;

obtaining a new sample voiceprint feature according to the current voiceprint feature and the predetermined sample voiceprint feature.

4. The method according to claim 3, characterized in that obtaining the new sample voiceprint feature according to the current voiceprint feature and the predetermined sample voiceprint feature comprises:

determining whether the current voiceprint feature and the predetermined sample voiceprint feature belong to the same user and, if so, fusing the current voiceprint feature and the predetermined sample voiceprint feature using a predetermined coefficient to obtain the new sample voiceprint feature.

5. The method according to claim 4, characterized in that determining whether the current voiceprint feature and the predetermined sample voiceprint feature belong to the same user comprises:

determining the similarity between the current voiceprint feature and the predetermined sample voiceprint feature and, if the similarity is greater than a preset similarity threshold, determining that the current voiceprint feature and the predetermined sample voiceprint feature belong to the same user.

6. The method according to claim 3, characterized in that:

the voiceprint update event is generated when it is detected that a preset voiceprint update button has been triggered, or that the length of time the sample voiceprint feature has existed exceeds a preset time-length threshold.

7. A voiceprint recognition performance improvement device, characterized by comprising:

a voice instruction acquisition module, configured to obtain a voice start command input by a user;

a speech recognition module, configured to determine whether the voice start command matches a preset guide text;

a voiceprint feature extraction module, configured to extract a voiceprint feature corresponding to the voice start command when the voice start command matches the preset guide text;

a voiceprint feature matching module, configured to match the extracted voiceprint feature against a predetermined sample voiceprint feature and, if the match succeeds, perform a start operation, wherein the sample voiceprint feature is extracted in advance from speech information whose semantic content is the guide text.

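8. The device according to claim 7, characterized by comprising a sample voiceprint feature determination module, the sample voiceprint feature determination module being specifically configured to:

during voiceprint registration, provide the user with a recording upload channel;

display a prompt for personalized speech input;

analyze the personalized speech content input by the user to obtain the sample voiceprint feature of the user.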

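9. The device according to claim 7, characterized by further comprising a sample voiceprint update module, the sample voiceprint update module being specifically configured to:

when a voiceprint update event is detected, obtain current speech information input by the user;

recognize the current speech information and extract a current voiceprint feature;

obtain a new sample voiceprint feature according to the current voiceprint feature and the predetermined sample voiceprint feature.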

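10. The device according to claim 9, characterized in that the sample voiceprint update module is specifically configured to:

determine whether the current voiceprint feature and the predetermined sample voiceprint feature belong to the same user and, if so, fuse the current voiceprint feature and the predetermined sample voiceprint feature using a predetermined coefficient to obtain the new sample voiceprint feature.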

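11. The device according to claim 10, characterized in that the sample voiceprint update module is specifically configured to:

determine the similarity between the current voiceprint feature and the predetermined sample voiceprint feature and, if the similarity is greater than a preset similarity threshold, determine that the current voiceprint feature and the predetermined sample voiceprint feature belong to the same user.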

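12. The device according to claim 9, characterized by further comprising a voiceprint update event generation module, the voiceprint update event generation module being specifically configured to:

generate the voiceprint update event when it is detected that a preset voiceprint update button has been triggered, or that the length of time the sample voiceprint feature has existed exceeds a preset time-length threshold.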

13. A terminal, characterized by comprising:

one or more processors;

a memory, configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voiceprint recognition performance improvement method according to any one of claims 1 to 6.

14. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the voiceprint recognition performance improvement method according to any one of claims 1 to 6.