CN111653284B - Interaction and identification method, device, terminal equipment and computer storage medium - Google Patents

Interaction and identification method, device, terminal equipment and computer storage medium

Info

Publication number
CN111653284B
CN111653284B (application CN201910119857.8A)
Authority
CN
China
Prior art keywords: voiceprint, sound, intelligent, users, library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910119857.8A
Other languages: Chinese (zh)
Other versions: CN111653284A (en)
Inventor
张平
肖兵兵
邢冬杰
秦京
孙尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910119857.8A
Publication of CN111653284A
Application granted
Publication of CN111653284B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the present application provide an interaction method, an identification method, an apparatus, a terminal device, and a computer storage medium. The interaction method includes: acquiring sound collected by an intelligent device and determining a voiceprint library associated with the intelligent device, the voiceprint library storing voiceprint templates corresponding to the users of that intelligent device; and identifying the voiceprint of the collected sound against the voiceprint templates stored in the voiceprint library, so as to determine, from among the users of the intelligent device, the user corresponding to the collected sound, so that the intelligent terminal provides service content corresponding to that user. Because the voiceprint library associated with the intelligent device stores only the voiceprint templates of that device's users, the number of voiceprint templates that must be searched during identification is reduced, identification efficiency is improved, the user corresponding to the collected sound is determined accurately from among the users of the intelligent device, and identification errors are avoided.

Description

Interaction and identification method, device, terminal equipment and computer storage medium
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to an interaction and identification method, an interaction and identification device, terminal equipment and a computer storage medium.
Background
In recent years, with the development of intelligent technology, smart home devices have become increasingly common in daily life. Because voice is the most direct way for people to interact with such devices, the sound box (i.e., the smart speaker) has become a main entry point of the modern smart home.
Before using a sound box, most users register a voiceprint template corresponding to themselves through the sound box; during later interaction, the sound box can identify the user corresponding to a voice through the voiceprint template, so as to provide customized service for that user. Typically, when a voiceprint template is registered through a sound box, it is stored in the server corresponding to the sound box, so that the server maintains one large database storing all the voiceprint templates of all sound boxes.
However, to identify the user corresponding to a sound, the sound collected by the sound box must be uploaded to the server and matched one by one against all the voiceprint templates in that large database; because the database stores a great many voiceprint templates, matching efficiency is poor.
In addition, because all voiceprint templates are stored in a single large database on the server, recognition errors can occur when several sound boxes are used close to one another. For example, if the voiceprint templates of two users A and B are both stored in the large database and the two users' sound boxes are used at a short distance from each other, user A's sound box may identify user B, and user B's sound box may identify user A. Such identification errors cause the sound box to provide the wrong customized service, resulting in a poor user experience.
Disclosure of Invention
In view of the above, embodiments of the present application provide an interaction method, an identification method, an apparatus, a terminal device, and a computer storage medium, so as to solve at least one of the above problems.
According to a first aspect of an embodiment of the present application, there is provided an interaction method, including: acquiring sound acquired by intelligent equipment, and determining a voiceprint library associated with the intelligent equipment, wherein the voiceprint library is used for storing voiceprint templates corresponding to users of the intelligent equipment; and identifying the voiceprint of the collected sound according to the voiceprint template stored in the voiceprint library so as to determine the user corresponding to the collected sound from the users of the intelligent equipment, so that the intelligent terminal provides service content corresponding to the user.
According to a second aspect of an embodiment of the present application, there is provided an identification method, including: acquiring sound acquired by intelligent equipment, and determining a voiceprint library associated with the intelligent equipment, wherein the voiceprint library is used for storing voiceprint templates corresponding to users of the intelligent equipment; and identifying the voiceprint of the collected sound according to the voiceprint template stored in the voiceprint library so as to determine the user corresponding to the collected sound from the users of the intelligent equipment.
According to a third aspect of the embodiments of the present application, there is provided an interaction apparatus, including: an acquisition module, configured to acquire sound collected by an intelligent device and determine a voiceprint library associated with the intelligent device, wherein the voiceprint library is used to store voiceprint templates corresponding to users of the intelligent device; and an interaction module, configured to identify the voiceprint of the collected sound according to the voiceprint templates stored in the voiceprint library, so as to determine the user corresponding to the collected sound from among the users of the intelligent device, so that the intelligent terminal provides service content corresponding to the user.
According to a fourth aspect of the embodiments of the present application, there is provided an identification apparatus, including: an acquisition module, configured to acquire sound collected by an intelligent device and determine a voiceprint library associated with the intelligent device, wherein the voiceprint library is used to store voiceprint templates corresponding to users of the intelligent device; and an identification module, configured to identify the voiceprint of the collected sound according to the voiceprint templates stored in the voiceprint library, so as to determine the user corresponding to the collected sound from among the users of the intelligent device.
According to a fifth aspect of an embodiment of the present application, there is provided a terminal device including: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to the interaction method as described above or operations corresponding to the identification method as described above.
According to a sixth aspect of embodiments of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements an interaction method as described above or an identification method as described above.
According to the solutions provided by the embodiments of the present application, the voiceprint library associated with an intelligent device stores the voiceprint templates corresponding to the users of that device, which reduces the number of voiceprint templates used for recognition, improves recognition efficiency, allows the user corresponding to the collected sound to be determined accurately from among the users of the intelligent device, and avoids recognition errors.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below show only some of the embodiments described in the present application, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a flowchart of the steps of an interaction method according to a first embodiment of the present application;
FIG. 2 is a flowchart of the steps of an interaction method according to a second embodiment of the present application;
FIG. 3 is a flowchart of the steps of registering a voiceprint template according to a third embodiment of the present application;
FIG. 4 is a flowchart of the steps of an interaction method according to a third embodiment of the present application;
FIG. 5 is a flowchart of the steps of an identification method according to a fourth embodiment of the present application;
FIG. 6 is a structural block diagram of an interaction apparatus according to a fifth embodiment of the present application;
FIG. 7 is a structural block diagram of an identification apparatus according to a sixth embodiment of the present application;
FIG. 8 is a schematic structural diagram of a terminal device according to a seventh embodiment of the present application.
Detailed Description
To enable a better understanding of the technical solutions in the embodiments of the present application, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments derived by a person skilled in the art from the embodiments of the present application shall fall within the scope of protection of the embodiments of the present application.
The implementation of the embodiments of the present application will be further described below with reference to the accompanying drawings.
Embodiment One
Referring to fig. 1, a flowchart of steps of an interaction method according to a first embodiment of the present application is shown.
The method comprises the following steps:
s102, acquiring sound collected by the intelligent equipment and determining a voiceprint library associated with the intelligent equipment.
In this embodiment, the intelligent device may be any smart device capable of collecting sound, for example a smart television, a mobile phone, a smart vehicle, a home appliance such as a voice-controlled washing machine or refrigerator, a camera, a light bulb, a vehicle-mounted voice-controlled navigator, a voice-controlled sound box, and the like, which is not limited in this embodiment.
In the following description, a sound box is taken as the intelligent device for purposes of illustration, but the application is not limited thereto; based on the description given for the sound box, those skilled in the art can directly apply the technical solution of the application to other intelligent devices, which also falls within the scope of protection of the application.
In this embodiment, the sound box may be an intelligent sound box that can connect to Wi-Fi and, through voice interaction with the user, can acquire and play songs, answer weather queries, handle question-and-answer requests, play news, and the like.
In a specific use scenario, the collected sound may be sound collected by the sound box while it is interacting with a user. One sound box may correspond to a plurality of users; when one of these users interacts with the sound box, that user speaks to the sound box, and after the sound box collects the sound, the present solution can identify the user corresponding to the collected sound and provide service content corresponding to that user.
In this embodiment, a voiceprint template is a template recorded when a user registers before using the sound box; that is, voiceprint templates correspond to users one to one and can serve as a user's identity.
Voiceprint templates are stored after being recorded so as to form a voiceprint library. When voiceprint templates are stored in voiceprint libraries, the templates matched with the same sound box can be stored in one voiceprint library, forming the voiceprint library associated with that sound box, that is, a voiceprint library storing the voiceprint templates corresponding to the users of that sound box.
For example, when registering, a voiceprint template may be recorded through a sound box; voiceprint templates recorded through the same sound box can then be stored in the same voiceprint library to form the voiceprint library associated with that sound box.
In practice there are many sound boxes, and in that case there are also many voiceprint libraries. For a given sound box, before the user corresponding to the collected sound is identified, the voiceprint library associated with that sound box must be determined, so that the voiceprint template matching the collected sound can be accurately found in that voiceprint library in step S104.
In addition, the order of "acquiring sound collected by the smart device" and "determining the voiceprint library associated with the smart device" in step S102 is not limited, as long as both are performed before step S104.
S104, identifying the voiceprint of the collected sound according to the voiceprint templates stored in the voiceprint library, so as to determine the user corresponding to the collected sound from among the users of the intelligent device, so that the intelligent terminal provides service content corresponding to the user.
In this embodiment, each sound box is associated with one voiceprint library, although one voiceprint library may also be shared by multiple sound boxes. After the voiceprint library associated with the sound box is determined, recognition can be carried out only among the voiceprint templates matched with that sound box, so that the user corresponding to the collected sound is determined. After the user corresponding to the collected sound has been determined, service content corresponding to that user can be provided according to the semantic content of the collected sound and the identified user, for example by playing a feedback voice corresponding to the user, thereby completing the interaction.
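For illustration only, and not as part of the patent disclosure, the per-device recognition flow of steps S102 and S104 can be sketched in Python as follows. All class and function names (VoiceprintLibrary, enroll, identify, handle_utterance) are hypothetical, and the cosine-similarity scoring over fixed-length voiceprint vectors is a placeholder for whatever voiceprint matching algorithm an implementation actually uses.

```python
# Minimal sketch of the per-device recognition flow described above.
# All names are hypothetical; the similarity scoring is a placeholder
# (cosine similarity over fixed-length voiceprint vectors).

import math
from dataclasses import dataclass, field

@dataclass
class VoiceprintLibrary:
    device_id: str
    templates: dict = field(default_factory=dict)  # user name -> voiceprint vector

    def enroll(self, user: str, voiceprint: list):
        """Store the voiceprint template recorded for one user of this device."""
        self.templates[user] = voiceprint

    def identify(self, voiceprint: list, threshold: float = 0.8):
        """Return the enrolled user whose template best matches, or None."""
        best_user, best_score = None, threshold
        for user, template in self.templates.items():
            score = _cosine(voiceprint, template)
            if score > best_score:
                best_user, best_score = user, score
        return best_user

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def handle_utterance(library: VoiceprintLibrary, voiceprint: list, semantics: str):
    """Identify the speaker within this device's library and pick the reply."""
    user = library.identify(voiceprint)
    if user is None:
        return "Sorry, I don't recognise you on this device."
    return f"[{user}] personalised response to: {semantics}"
```

A sound box associated with such a library would call handle_utterance with the voiceprint extracted from the collected sound and the parsed semantic content, and only the templates of its own users would ever be searched.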
In addition, because a voiceprint library stores only the voiceprint templates corresponding to the users of its sound box, the number of voiceprint templates stored in the sound-box-associated voiceprint library provided by this embodiment is far smaller than the number of voiceprint templates stored in the large database of the prior art, which improves the recognition speed for collected sound.
Moreover, as described above, each sound box has an associated voiceprint library that stores only the voiceprint templates matched with that sound box and no voiceprint templates unrelated to it. Therefore, during recognition, identification is performed only among the users associated with the sound box, so the user can be determined accurately, avoiding the misidentification that occurs in the prior art when all voiceprint templates are stored in one large database on the server and several sound boxes are used close to one another.
According to the solution provided by this embodiment, the sound collected by the intelligent device is acquired and the voiceprint library associated with the intelligent device is determined, the voiceprint library storing the voiceprint templates corresponding to the users of the intelligent device; the voiceprint of the collected sound is then identified according to the voiceprint templates stored in the voiceprint library, so as to determine, from among the users of the intelligent device, the user corresponding to the collected sound, so that the intelligent terminal provides service content corresponding to that user. Because the voiceprint library associated with the intelligent device stores only the voiceprint templates of that device's users, the number of voiceprint templates used for identification is reduced, identification efficiency is improved, the user corresponding to the collected sound is determined accurately, and identification errors are avoided.
Embodiment Two
Referring to fig. 2, a flowchart of steps of an interaction method according to a second embodiment of the present application is shown.
The method comprises the following steps:
s202, collecting sound through the sound box.
In this embodiment, the sound box may collect sound through a microphone device disposed thereon and a sound collection algorithm disposed therein.
In this embodiment, different models of sound boxes may carry different types of microphone devices; for example, the sound box product X1 uses a six-microphone array, the sound box product C1 uses a two-microphone array, and the sound box product M1 uses a three-microphone array.
Further, multiple sound boxes may share one server, or different sound boxes may correspond to different servers, which is not limited in this embodiment. The server may store a sound collection algorithm for each type of microphone device. Before a sound box is used, the server corresponding to that sound box can deploy the appropriate sound collection algorithm to it according to the type of its microphone device.
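A minimal sketch, assuming a simple lookup table, of how a server might deploy a sound collection algorithm according to the microphone device type described above; the microphone type labels and algorithm names are illustrative assumptions, not names used by the patent.

```python
# Hypothetical mapping from microphone-array type to the sound-collection
# algorithm the server deploys to that device model.
COLLECTION_ALGORITHMS = {
    "six_mic_ring":  "beamforming_6ch_v2",
    "three_mic_bar": "beamforming_3ch_v1",
    "two_mic_pair":  "dual_mic_noise_suppression",
}

def deploy_collection_algorithm(device_model: str, mic_type: str) -> str:
    """Pick the collection algorithm the server should push to a device model."""
    algorithm = COLLECTION_ALGORITHMS.get(mic_type)
    if algorithm is None:
        raise ValueError(f"no collection algorithm registered for {mic_type}")
    print(f"deploying {algorithm} to sound box model {device_model}")
    return algorithm
```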
In addition, in this embodiment, the sound collection threshold used when collecting sound can be adjusted according to the scene in which the intelligent device collects the sound. This improves the accuracy of the collected sound and makes the voiceprint determined from it more accurate, so that when the voiceprint template matching the voiceprint of the collected sound is determined in the voiceprint library in the subsequent step S208, a hit is more likely and the recognition result is more accurate.
Specifically, adjusting the sound collection threshold may work as follows: a family includes four people, "Dad", "Mom", "Bao 1" and "Bao 2". The sound box product M1 may be placed in the bedroom of "Dad" and "Mom"; if it is determined from the collection scene that the users collected by this sound box are adults whose speech is clear and loud, the sound collection threshold for volume can be raised. Alternatively, the sound box product X1 may be placed in the bedroom of "Bao 1" and "Bao 2"; if it is determined from the collection scene that the users collected by this sound box are children whose volume is low and whose speech is less clear, the sound collection threshold for volume can be lowered.
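The scene-dependent threshold adjustment described above can be illustrated with the following sketch; the scene labels, the default threshold, and the offset values are assumptions chosen only to mirror the example of the adult and child bedrooms.

```python
# Sketch of the scene-dependent collection-threshold adjustment described above;
# the scene labels and threshold values are illustrative assumptions.
DEFAULT_THRESHOLD_DB = 40.0

SCENE_OFFSETS_DB = {
    "adult_bedroom": +5.0,   # clear, loud adult speech: raise the volume threshold
    "child_bedroom": -10.0,  # quiet, less articulate speech: lower the threshold
}

def collection_threshold(scene: str) -> float:
    """Return the volume threshold (in dB) used when collecting sound in a scene."""
    return DEFAULT_THRESHOLD_DB + SCENE_OFFSETS_DB.get(scene, 0.0)
```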
S204, adding the device identifier of the sound box to the sound collected by the sound box.
In this embodiment, the device identifier of a sound box is used to identify the device type of the sound box. For example, as shown in FIG. 3, three sound box products X1, C1 and M1 are included, and the three products may correspond to three different device identifiers. Specifically, the product number of a sound box can be used directly as its device identifier.
In this embodiment, the added device identifier can be used to verify that the voiceprint library associated with the sound box is the correct one, and can also be used to determine the other sound boxes that share the voiceprint library with this sound box. If the voiceprint library is not stored in the sound box's local memory but is shared with other sound boxes, the voiceprint library associated with the sound box can be determined through the added device identifier; if the voiceprint library is stored on the server, the voiceprint library associated with the sound box can likewise be determined through the added device identifier in step S206.
S206, determining the voiceprint library associated with the sound box through the device identifier.
In one implementation of this embodiment, the voiceprint library associated with the sound box may be stored in advance in the sound box's local memory, so that the sound box isolates its own voiceprint library from all voiceprint templates other than those matched with it. When step S206 is executed, the voiceprint library can be read directly from the sound box's local memory, and the device identifier can be used to check whether the library that was read is the correct one, thereby determining the voiceprint library associated with the sound box through the device identifier. Of course, the voiceprint library may also be read directly from local memory without verification through the device identifier, which is not limited in this embodiment. Storing the voiceprint library in the local memory of the associated sound box makes determining the library more convenient, reduces the possibility of a user's voiceprint being leaked, and protects user privacy. Meanwhile, compared with the prior-art approach of storing the voiceprint libraries on the server as one large database, storing the library in the sound box's local memory means the server no longer has to provide management services for the redundant information of a large database, which saves server resources.
In another implementation of this embodiment, the voiceprint library associated with the sound box may be stored on the server, and when step S206 is executed, the library can be determined from the server through the device identifier. For example, when a user starts using a sound box, the account information created by the user can be stored on the server; when determining the voiceprint library associated with the sound box, the library can be looked up according to the user's account information, and the device identifier can be used to check whether the library that was read is the correct one, thereby determining the voiceprint library associated with the sound box through the device identifier. If the same account corresponds to multiple sound boxes, the sound boxes can be numbered in advance, and the sound box numbers under the account information, together with the correspondence between sound box numbers and voiceprint libraries, can be stored on the server, so that the voiceprint library associated with a sound box is determined according to the account information and the sound box number.
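As an illustrative sketch only, resolving the voiceprint library associated with a sound box through its device identifier, covering both storage options described above (local memory first, then the server copy keyed by account information and sound box number), might look like this; it assumes the hypothetical VoiceprintLibrary class from the earlier sketch, and the account and numbering bookkeeping shown here is an assumption.

```python
# Hedged sketch of resolving the voiceprint library associated with a device,
# covering both storage options: local memory first, then the server copy.

LOCAL_LIBRARIES = {}    # device_id -> VoiceprintLibrary kept on the device itself
SERVER_LIBRARIES = {}   # (account_id, speaker_number) -> VoiceprintLibrary on the server

def resolve_library(device_id, account_id=None, speaker_number=None):
    """Find the voiceprint library for a device, checking local storage first."""
    library = LOCAL_LIBRARIES.get(device_id)
    if library is not None:
        # Optional check that the locally stored library really belongs to this box.
        assert library.device_id == device_id, "device identifier mismatch"
        return library
    # Fall back to the server copy keyed by the user's account and sound box number.
    return SERVER_LIBRARIES.get((account_id, speaker_number))
```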
In addition, it should be noted that although the voiceprint a user possesses is unique, the sound collection devices of different models of sound boxes differ, so the voiceprints collected by different collection devices for the same sound differ. Consequently, the voiceprint templates of the same user differ across different models of sound boxes; that is, among the multiple voiceprint libraries corresponding to intelligent devices that have the same users but different models, the voiceprint templates corresponding to the same user are different. For example, as shown in FIG. 3, a family has three sound box products X1, C1 and M1, and the family members include four people, "Dad", "Mom", "Bao 1" and "Bao 2", all of whom are users of the three products; in the voiceprint libraries associated with the three sound box products, the voiceprint templates of the user "Dad" are different from one another.
Further, in this embodiment, multiple sound boxes that have the same users and the same model may share one voiceprint library. In practice, if a user replaces a sound box and the models before and after replacement are the same, the voiceprint library of the old sound box can be used directly as the voiceprint library of the new one, so that the user can use the new sound box normally without registering again, which improves the user experience.
S208, invoking a computing resource of a server corresponding to the sound box, and determining the voiceprint template matched with the voiceprint of the collected sound in the voiceprint library through the computing resource so as to determine a user corresponding to the collected sound from users matched with the sound box.
In this embodiment, the server's computing resources are invoked to perform recognition, which lowers the minimum computing resources the sound box itself must be built with, and in turn reduces the manufacturing cost of the sound box.
In addition, if the voiceprint library is stored in the local memory of its associated sound box, the method further includes, before step S208 is executed: encrypting the voiceprint library stored in the sound box's local memory with the key corresponding to the sound box; and uploading the collected sound and the encrypted voiceprint library to the server so as to invoke the server's computing resources and determine, through those resources, the voiceprint template in the library that matches the voiceprint of the collected sound. Encrypting the library with the key corresponding to the sound box before uploading it to the server further reduces the possibility of the user's voiceprint templates being leaked.
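A minimal sketch, assuming a symmetric key pre-shared between the sound box and its server, of encrypting the locally stored voiceprint library before upload as described above. The XOR keystream here is a toy stand-in for a real cipher such as AES, the JSON serialization is an assumption, and server.match is a hypothetical server-side API rather than an interface defined by the patent.

```python
# Toy encryption sketch: serialize, "encrypt" and MAC the voiceprint library,
# then send it to the server together with the collected voiceprint for matching.

import json, hmac, hashlib
from itertools import cycle

def _keystream_xor(data: bytes, key: bytes) -> bytes:
    # Placeholder XOR "cipher" standing in for a real symmetric cipher such as AES.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def encrypt_library(library_templates: dict, device_key: bytes) -> dict:
    """Serialize, 'encrypt' and authenticate the voiceprint library for upload."""
    payload = json.dumps(library_templates).encode("utf-8")
    ciphertext = _keystream_xor(payload, device_key)
    tag = hmac.new(device_key, ciphertext, hashlib.sha256).hexdigest()
    return {"ciphertext": ciphertext.hex(), "mac": tag}

def upload_for_matching(sound_voiceprint, encrypted_library, server):
    """Send the collected voiceprint and the encrypted library to the server,
    which decrypts, matches and returns the identified user (hypothetical API)."""
    return server.match(sound_voiceprint, encrypted_library)
```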
Further, in this embodiment, before uploading, the sound box may process the collected sound, determine the voiceprint information corresponding to it, and upload that voiceprint information to the server instead, which reduces the amount of data uploaded and improves upload speed.
According to the solution provided by this embodiment, the voiceprint library associated with a sound box stores the voiceprint templates corresponding to that sound box's users, which reduces the number of voiceprint templates used for recognition, improves recognition efficiency, allows the user corresponding to the collected sound to be determined accurately from among the sound box's users, and avoids recognition errors. The sound collection threshold is adjusted according to the scene in which the sound box collects sound, improving the accuracy of the collected sound; and by invoking the server's computing resources, the manufacturing cost of the sound box is reduced.
Embodiment Three
This embodiment describes, with reference to FIG. 3 and FIG. 4, a specific usage scenario of the interaction solution provided by the embodiments of the present application. In this embodiment, the usage scenario is a family that owns several sound boxes.
First, referring to fig. 3, a flowchart illustrating a step of registering a voiceprint template according to a third embodiment of the present application is shown.
As shown on the left side of FIG. 3, the family includes four people, "Dad", "Mom", "Bao 1" and "Bao 2", all of whom are users.
During registration, each person records a voice on each sound box, and the microphone device of each sound box collects the recorded voice. The family has several sound boxes, each with a different microphone device; for example, as shown in FIG. 3, the sound box product X1 has a six-microphone device, the sound box product C1 has a two-microphone device, and the sound box product M1 has a three-microphone device. After collecting the recorded voice, the microphone device passes it to the sound box, which processes the recording to determine the voiceprint template corresponding to the user and stores the determined template in the voiceprint library in the sound box's local memory, completing the registration of that user's voiceprint template. The voiceprint library of each sound box thus stores the voiceprint templates corresponding to the four people "Dad", "Mom", "Bao 1" and "Bao 2". Once the voiceprint templates are stored in the voiceprint libraries, the registration process is complete.
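The enrollment flow of FIG. 3 can be sketched as follows for illustration; the extract_voiceprint feature extraction is a placeholder rather than the patent's actual voiceprint modelling, and the VoiceprintLibrary class and its enroll method are the hypothetical ones from the earlier sketch.

```python
# Illustrative sketch of the enrollment flow in FIG. 3: each family member's
# recording is turned into a voiceprint template and stored in each sound box's
# own library (one library per box).

def extract_voiceprint(samples: bytes, dims: int = 8) -> list:
    """Placeholder voiceprint extractor: bucket byte energies into a fixed vector."""
    if not samples:
        return [0.0] * dims
    step = max(1, len(samples) // dims)
    return [sum(samples[i:i + step]) / step for i in range(0, step * dims, step)]

def register_family(libraries, family_recordings):
    """Enroll every user's recording into every sound box's own library."""
    for library in libraries:                      # one VoiceprintLibrary per box (X1, C1, M1)
        for user, samples in family_recordings.items():
            template = extract_voiceprint(samples)
            library.enroll(user, template)         # enroll() from the earlier sketch
```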
Next, referring to FIG. 4, a flowchart of the steps of an interaction method according to the third embodiment of the present application is shown.
As shown in FIG. 4, in actual use one or more of the sound boxes collect the sound; specifically, the sound box product M1 collects the sound through its three-microphone device, and the sound box product X1 collects the sound through its six-microphone device.
After the sound is collected, the sound box product M1 invokes the server's computing resources to perform recognition in its own associated local voiceprint library and determines that the voiceprint template matching the voiceprint of the collected sound is the one corresponding to "Bao 1"; the sound box product X1 likewise invokes the server's computing resources to perform recognition in its own associated local voiceprint library, but finds no voiceprint template matching the voiceprint of the collected sound. The user corresponding to the collected sound can therefore be determined to be "Bao 1".
After it is determined that the user corresponding to the collected sound is "Bao 1", the sound box can provide service content corresponding to "Bao 1"; for example, if "Bao 1" asks for songs, the sound box can play children's songs for "Bao 1".
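For illustration, the scenario above (each sound box matching only within its own associated library, with the box whose library yields a match serving the user) can be sketched as follows, again assuming the hypothetical VoiceprintLibrary class from the earlier sketch.

```python
# Sketch of the usage scenario above: every sound box that picked up the
# utterance matches only against its own library, and the box whose library
# yields a match provides the personalised service.

def identify_across_devices(collecting_libraries, voiceprint):
    """Return (library, user) for the first sound box whose own library matches."""
    for library in collecting_libraries:          # e.g. the libraries of M1 and X1
        user = library.identify(voiceprint)       # identify() from the earlier sketch
        if user is not None:
            return library, user                  # e.g. (M1's library, "Bao 1")
    return None, None
```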
The interaction methods of the first to third embodiments may be performed by any suitable intelligent terminal device having data processing capabilities, including but not limited to: mobile terminals (e.g., tablet computers, cell phones, etc.) and PCs.
Embodiment Four
Referring to fig. 5, a flowchart of steps of an identification method according to a fourth embodiment of the present application is shown.
The method comprises the following steps:
s302, acquiring sound acquired by the intelligent equipment, and determining a voiceprint library associated with the intelligent equipment, wherein the voiceprint library is used for storing voiceprint templates corresponding to users of the intelligent equipment.
S304, identifying voiceprints of the collected sounds according to the voiceprint templates stored in the voiceprint library, so as to determine a user corresponding to the collected sounds from users of the intelligent equipment.
The specific implementation of this embodiment is similar to that of the first and second embodiments and is not repeated here.
In addition, the identification solution provided by this embodiment may be used as in the foregoing embodiments, so that the intelligent terminal provides service content corresponding to the user, and may also be used for other purposes, for example verifying a user's identity through the voiceprint and granting corresponding permissions accordingly, which is not limited in this embodiment.
According to the solution provided by this embodiment, the sound collected by the intelligent device is acquired and the voiceprint library associated with the intelligent device is determined, the voiceprint library storing the voiceprint templates corresponding to the users of the intelligent device; the voiceprint of the collected sound is then identified according to the voiceprint templates stored in the voiceprint library, so as to determine, from among the users of the intelligent device, the user corresponding to the collected sound. Because the voiceprint library associated with the intelligent device stores only the voiceprint templates of that device's users, the number of voiceprint templates used for identification is reduced, identification efficiency is improved, the user corresponding to the collected sound is determined accurately, and identification errors are avoided.
Embodiment Five
Referring to fig. 6, a block diagram of an interactive apparatus according to a fifth embodiment of the present application is shown.
The device provided by the embodiment comprises: an acquisition module 402 and an interaction module 404.
The acquisition module 402 is configured to acquire sound collected by an intelligent device and determine a voiceprint library associated with the intelligent device, wherein the voiceprint library is used to store voiceprint templates corresponding to users of the intelligent device.
The interaction module 404 is configured to identify, according to the voiceprint templates stored in the voiceprint library, the voiceprint of the collected sound, so as to determine the user corresponding to the collected sound from among the users of the intelligent device, so that the intelligent terminal provides service content corresponding to the user.
In an alternative embodiment, a plurality of sound boxes with the same user and the same model share one voiceprint library.
In an optional implementation manner, in the plurality of voiceprint libraries corresponding to the same-user but different-model sound boxes, voiceprint templates corresponding to the same user are different.
In an alternative embodiment, a voiceprint library associated with the smart device is stored in a local memory of the smart device.
In an alternative embodiment, the apparatus further comprises: and the device identification adding module is used for adding the device identification of the intelligent device to the sound collected by the intelligent device so as to determine the voiceprint library associated with the intelligent device through the device identification.
In an alternative embodiment, the apparatus further comprises: and the adjusting module is used for adjusting the sound collection threshold value when the sound is collected according to the scene when the intelligent equipment collects the sound.
According to the interaction scheme provided by the embodiment, the voiceprint templates corresponding to the users of the intelligent equipment are stored through the voiceprint library associated with the intelligent equipment, so that the number of the voiceprint templates for identification is reduced, the identification efficiency is improved, the users corresponding to the collected sounds are accurately determined from the users of the intelligent equipment, and the situation of identification errors is avoided.
Embodiment Six
Referring to fig. 7, there is shown a block diagram of an identification device according to a sixth embodiment of the present application.
The device provided by the embodiment comprises: an acquisition module 502 and an identification module 504.
The obtaining module 502 is configured to obtain a sound collected by an intelligent device, and determine a voiceprint library associated with the intelligent device, where the voiceprint library is used to store a voiceprint template corresponding to a user of the intelligent device.
The identification module 504 is configured to identify, according to the voiceprint templates stored in the voiceprint library, the voiceprint of the collected sound, so as to determine, from among the users of the intelligent device, the user corresponding to the collected sound.
According to the recognition scheme provided by the embodiment, the voiceprint templates corresponding to the users of the intelligent equipment are stored through the voiceprint library associated with the intelligent equipment, so that the number of the voiceprint templates for recognition is reduced, the recognition efficiency is improved, the users corresponding to the collected sounds are accurately determined from the users of the intelligent equipment, and the situation of recognition errors is avoided.
Embodiment Seven
A terminal device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to the interaction method as described above.
In addition, the executable instructions may also cause the processor to perform operations corresponding to the interaction method described above or operations corresponding to the identification method described above.
Specifically, referring to fig. 8, a schematic structural diagram of a terminal device according to a seventh embodiment of the present application is shown, and the specific embodiment of the present application does not limit the specific implementation of the terminal device.
As shown in fig. 8, the terminal device may include: a processor 602, a communication interface 604, a memory 606, and a communication bus 608.
Wherein:
The processor 602, the communication interface 604, and the memory 606 communicate with one another via the communication bus 608.
The communication interface 604 is used for communicating with other terminal devices or servers.
The processor 602 is configured to execute the program 610, and may specifically perform relevant steps in the foregoing embodiments.
In particular, program 610 may include program code including computer-operating instructions.
The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The one or more processors included in the terminal device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 606 is used for storing a program 610. The memory 606 may comprise high-speed RAM, and may further comprise non-volatile memory, such as at least one disk storage.
The program 610 may be specifically configured to cause the processor 602 to perform the following interactive method operations: acquiring sound acquired by intelligent equipment, and determining a voiceprint library associated with the intelligent equipment, wherein the voiceprint library is used for storing voiceprint templates corresponding to users of the intelligent equipment; and identifying the voiceprint of the collected sound according to the voiceprint template stored in the voiceprint library so as to determine the user corresponding to the collected sound from the users of the intelligent equipment, so that the intelligent terminal provides service content corresponding to the user.
In an alternative embodiment, a plurality of intelligent devices with the same user and the same model share one voiceprint library.
In an optional implementation manner, among a plurality of voiceprint libraries corresponding to intelligent devices that have the same users but different models, the voiceprint templates corresponding to the same user are different.
In an alternative embodiment, a voiceprint library associated with the smart device is stored in a local memory of the smart device.
In an optional implementation manner, the identifying, according to the voiceprint templates stored in the voiceprint library, the voiceprint of the collected sound, so as to determine, from the users of the smart device, the user corresponding to the collected sound includes: and invoking a computing resource of a server corresponding to the intelligent equipment, and determining the voiceprint template matched with the voiceprint of the collected sound in the voiceprint library through the computing resource so as to determine a user corresponding to the collected sound from users of the intelligent equipment.
In an alternative embodiment, the method further comprises: encrypting a voiceprint library stored in a local memory of the intelligent device through a secret key corresponding to the intelligent device; and uploading the collected sound and the encrypted voiceprint library to the server to call the computing resource of the server.
In an alternative embodiment, the method further comprises: and processing the collected sound through the intelligent equipment, determining voiceprint information corresponding to the collected sound and uploading the voiceprint information to the server.
In an alternative embodiment, the method further comprises: and adding the equipment identification of the intelligent equipment to the sound collected by the intelligent equipment so as to determine a voiceprint library associated with the intelligent equipment through the equipment identification.
In an alternative embodiment, the method further comprises: and adjusting a sound collection threshold value when the sound is collected according to the scene when the intelligent device collects the sound.
The specific implementation of each step in the program 610 may refer to corresponding steps and corresponding descriptions in the units in the above embodiment of the interaction method, which are not described herein. It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and modules described above may refer to corresponding procedure descriptions in the foregoing method embodiments, which are not repeated herein.
The program 610 may also be specifically configured to cause the processor 602 to perform operations corresponding to the following identification methods: acquiring sound acquired by intelligent equipment, and determining a voiceprint library associated with the intelligent equipment, wherein the voiceprint library is used for storing voiceprint templates corresponding to users of the intelligent equipment; and identifying the voiceprint of the collected sound according to the voiceprint template stored in the voiceprint library so as to determine the user corresponding to the collected sound from the users of the intelligent equipment.
The specific implementation of each step in the program 610 may refer to the corresponding steps and corresponding descriptions in the units in the above-mentioned identification method embodiment, which are not repeated herein. It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and modules described above may refer to corresponding procedure descriptions in the foregoing method embodiments, which are not repeated herein.
When the above interaction scheme or recognition scheme is executed in the program 610 in the terminal device of the present embodiment, the voiceprint templates corresponding to the users of the intelligent device are stored through the voiceprint library associated with the intelligent device, so as to reduce the number of voiceprint templates used for recognition, improve the recognition efficiency, and accurately determine the user corresponding to the collected sound from the users of the intelligent device, thereby avoiding the situation of recognition errors.
It should be noted that, according to implementation requirements, each component/step described in the embodiments of the present application may be split into more components/steps, or two or more components/steps or part of operations of the components/steps may be combined into new components/steps, so as to achieve the objects of the embodiments of the present application.
The above-described methods according to the embodiments of the present application may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium and downloaded over a network for storage in a local recording medium, so that the methods described herein can be processed, using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware such as an ASIC or FPGA, by software stored on such a recording medium. It will be understood that a computer, processor, microprocessor controller, or programmable hardware includes a storage component (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the methods described herein. Further, when a general-purpose computer accesses code for implementing the methods shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for performing the methods shown herein.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only for illustrating the embodiments of the present application, but not for limiting the embodiments of the present application, and various changes and modifications may be made by one skilled in the relevant art without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also fall within the scope of the embodiments of the present application, and the scope of the embodiments of the present application should be defined by the claims.

Claims (15)

1. An interaction method, comprising:
acquiring sound acquired by intelligent equipment, and determining a voiceprint library associated with the intelligent equipment, wherein the voiceprint library is used for storing voiceprint templates corresponding to users of the intelligent equipment;
identifying voiceprints of the collected sounds according to the voiceprint templates stored in the voiceprint library, so as to determine a user corresponding to the collected sounds from users of the intelligent equipment, and enabling the intelligent terminal to provide service content corresponding to the user;
wherein, a plurality of intelligent devices with the same users and the same model share one voiceprint library; and in a plurality of voiceprint libraries corresponding to the intelligent devices with the same users but different models, voiceprint templates corresponding to the same user are different.
2. The method of claim 1, wherein a voiceprint library associated with the smart device is stored in a local memory of the smart device.
3. The method of claim 2, wherein the identifying the voiceprint of the collected sound from the voiceprint templates stored in the voiceprint library to determine a user corresponding to the collected sound from the users of the smart device comprises:
and invoking a computing resource of a server corresponding to the intelligent equipment, and determining the voiceprint template matched with the voiceprint of the collected sound in the voiceprint library through the computing resource so as to determine a user corresponding to the collected sound from users of the intelligent equipment.
4. A method according to claim 3, further comprising:
encrypting a voiceprint library stored in a local memory of the intelligent device through a secret key corresponding to the intelligent device;
and uploading the collected sound and the encrypted voiceprint library to the server to call the computing resource of the server.
5. The method as recited in claim 4, further comprising: and processing the collected sound through the intelligent equipment, determining voiceprint information corresponding to the collected sound and uploading the voiceprint information to the server.
6. The method as recited in claim 1, further comprising: and adding the equipment identification of the intelligent equipment to the sound collected by the intelligent equipment so as to determine a voiceprint library associated with the intelligent equipment through the equipment identification.
7. The method as recited in claim 1, further comprising:
and adjusting a sound collection threshold value when the sound is collected according to the scene when the intelligent device collects the sound.
8. A method of identification, comprising:
acquiring sound acquired by intelligent equipment, and determining a voiceprint library associated with the intelligent equipment, wherein the voiceprint library is used for storing voiceprint templates corresponding to users of the intelligent equipment;
identifying voiceprints of the collected sounds according to the voiceprint templates stored in the voiceprint library so as to determine a user corresponding to the collected sounds from users of the intelligent equipment;
wherein, a plurality of intelligent devices with the same users and the same model share one voiceprint library; and in a plurality of voiceprint libraries corresponding to the intelligent devices with the same users but different models, voiceprint templates corresponding to the same user are different.
9. An interactive apparatus, comprising:
an acquisition module, configured to acquire sound collected by an intelligent device and determine a voiceprint library associated with the intelligent device, wherein the voiceprint library is used to store voiceprint templates corresponding to users of the intelligent device;
the interaction module is used for identifying voiceprints of the collected sounds according to the voiceprint templates stored in the voiceprint library so as to determine a user corresponding to the collected sounds from users of the intelligent equipment, so that the intelligent terminal provides service content corresponding to the user;
wherein, a plurality of intelligent devices with the same users and the same model share one voiceprint library; and in a plurality of voiceprint libraries corresponding to the intelligent devices with the same users but different models, voiceprint templates corresponding to the same user are different.
10. The apparatus of claim 9, wherein a voiceprint library associated with the smart device is stored in a local memory of the smart device.
11. The apparatus as recited in claim 9, further comprising: a device identifier adding module, configured to add a device identifier of the smart device to the sound collected by the smart device, so that the voiceprint library associated with the smart device can be determined through the device identifier.
12. The apparatus as recited in claim 9, further comprising: an adjusting module, configured to adjust the sound collection threshold used when collecting the sound, according to the scene in which the smart device collects the sound.
13. An identification device, comprising:
an acquisition module, configured to acquire sound collected by a smart device and determine a voiceprint library associated with the smart device, wherein the voiceprint library is used for storing voiceprint templates corresponding to users of the smart device;
a recognition module, configured to identify the voiceprint of the collected sound according to the voiceprint templates stored in the voiceprint library, so as to determine the user corresponding to the collected sound from among the users of the smart device;
wherein a plurality of smart devices having the same users and the same model share one voiceprint library; and among a plurality of voiceprint libraries corresponding to smart devices having the same users but different models, the voiceprint templates corresponding to the same user are different.
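Not part of the claims: the apparatus of claims 9-13 can be read as two cooperating modules, one that acquires the sound and resolves the device's voiceprint library, and one that performs the identification; the sketch below uses hypothetical class names and takes the matcher as an injected callable.

    class AcquisitionModule:
        def __init__(self, libraries_by_device):
            self.libraries_by_device = libraries_by_device

        def acquire(self, device_id, sound_wav):
            # Returns the collected sound together with the library
            # associated with the device that collected it.
            return sound_wav, self.libraries_by_device[device_id]

    class RecognitionModule:
        def __init__(self, matcher):
            # matcher(voiceprint, library) -> matched user id or None,
            # e.g. the cosine-similarity sketch shown earlier.
            self.matcher = matcher

        def identify(self, voiceprint, library):
            return self.matcher(voiceprint, library)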
14. A terminal device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to the interaction method according to any one of claims 1 to 7 or operations corresponding to the identification method according to claim 8.
15. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the interaction method of any of claims 1-7 or the identification method of claim 8.
CN201910119857.8A 2019-02-18 2019-02-18 Interaction and identification method, device, terminal equipment and computer storage medium Active CN111653284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910119857.8A CN111653284B (en) 2019-02-18 2019-02-18 Interaction and identification method, device, terminal equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN111653284A CN111653284A (en) 2020-09-11
CN111653284B (en) 2023-08-11

Family

ID=72346081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910119857.8A Active CN111653284B (en) 2019-02-18 2019-02-18 Interaction and identification method, device, terminal equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN111653284B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112201254A (en) * 2020-09-28 2021-01-08 中国建设银行股份有限公司 Non-sensitive voice authentication method, device, equipment and storage medium
CN112614494A (en) * 2020-11-25 2021-04-06 中国能源建设集团广东省电力设计研究院有限公司 Monitoring method, device and system applied to container data center

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002247666A (en) * 2001-02-20 2002-08-30 Seiko Epson Corp Method and system for device control
WO2016151193A1 (en) * 2015-03-20 2016-09-29 Aplcomp Oy Audiovisual associative authentication method, related system and device
CN206672635U (en) * 2017-01-15 2017-11-24 北京星宇联合投资管理有限公司 A kind of voice interaction device based on book service robot
CN107863098A (en) * 2017-12-07 2018-03-30 广州市艾涛普电子有限公司 A kind of voice identification control method and device
CN108260248A (en) * 2018-01-12 2018-07-06 广东小天才科技有限公司 A kind of based reminding method and device for intelligent terminal external microphone wind
CN108320752A (en) * 2018-01-26 2018-07-24 青岛易方德物联科技有限公司 Cloud Voiceprint Recognition System and its method applied to community gate inhibition
CN108877790A (en) * 2018-05-21 2018-11-23 江西午诺科技有限公司 Speaker control method, device, readable storage medium storing program for executing and mobile terminal
CN108922528A (en) * 2018-06-29 2018-11-30 百度在线网络技术(北京)有限公司 Method and apparatus for handling voice

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10127911B2 (en) * 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques

Also Published As

Publication number Publication date
CN111653284A (en) 2020-09-11

Similar Documents

Publication Title
WO2018188586A1 (en) Method and device for user registration, and electronic device
CN107436748B (en) Method and device for processing third-party application message, terminal equipment and readable medium
US11257497B2 (en) Voice wake-up processing method, apparatus and storage medium
CN110267248B (en) B L E communication method, device, equipment and storage medium
CN111653284B (en) Interaction and identification method, device, terminal equipment and computer storage medium
US11451539B2 (en) Identity identification and preprocessing
TWI734385B (en) Identity recognition preprocessing, identity recognition method and system
CN111182390B (en) Volume data processing method and device, computer equipment and storage medium
CN104078045A (en) Identifying method and electronic device
CN113110995A (en) System migration test method and device
CN113010139A (en) Screen projection method and device and electronic equipment
WO2016124008A1 (en) Voice control method, apparatus and system
US20180182393A1 (en) Security enhanced speech recognition method and device
CN113241080A (en) Automatic registration voiceprint recognition method and device
WO2021232213A1 (en) Voiceprint recognition apparatus, voiceprint registration apparatus and cross-device voiceprint recognition method
CN109005469B (en) Message format conversion method and device, storage medium and android terminal
CN113436613A (en) Voice recognition method and device, electronic equipment and storage medium
CN112783799A (en) Software daemon test method and device
CN105282658A (en) Control method, system and device for audio playing equipment
CN111556406A (en) Audio processing method, audio processing device and earphone
CN112449059A (en) Voice interaction device, method and system for realizing call based on voice interaction device
CN113079257B (en) Device association method, network communication method, information processing method, device and equipment
CN116204152A (en) Control method, control device, terminal and storage medium
CN105471593B (en) Group conversation method, device and system
CN112767965B (en) Method, system, medium, and service/terminal for generating/applying noise recognition model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant