JPWO2021211836A5

Info

Publication number: JPWO2021211836A5
Application number: JP2022561448A
Authority: JP
Publication date: 2024-04-24

Claims

1. A computer-implemented method comprising:
extracting, by a computer, an inbound embedding for an inbound speaker by applying a machine learning model to an inbound speech signal;
generating, by the computer, a similarity score based on a distance between the inbound embedding and a voiceprint stored in a speaker profile in a speaker profile database; and
in response to the computer determining that the similarity score of the inbound embedding does not satisfy a similarity threshold, generating, by the computer, a new speaker profile for the inbound speaker in the speaker profile database that includes the inbound embedding, the new speaker profile being a database record that stores the inbound embedding as a new voiceprint.
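
The steps of claim 1 can be sketched roughly as follows. This is a minimal illustration and not the claimed implementation: the claim only requires a score "based on a distance", so the use of cosine similarity, the in-memory dictionary standing in for the speaker profile database, the default threshold of 0.7, and the names `cosine_similarity` and `match_or_enroll` are all assumptions made for the example.

```python
import math
import uuid


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_or_enroll(inbound_embedding, profiles, threshold=0.7):
    """Return (speaker_id, is_new).

    `profiles` is a hypothetical stand-in for the speaker profile database:
    a dict mapping speaker_id -> voiceprint (a list of floats).
    """
    best_id, best_score = None, -1.0
    for speaker_id, voiceprint in profiles.items():
        score = cosine_similarity(inbound_embedding, voiceprint)
        if score > best_score:
            best_id, best_score = speaker_id, score

    # The similarity threshold is satisfied: treat the speaker as known.
    if best_id is not None and best_score >= threshold:
        return best_id, False

    # Otherwise create a new profile that stores the embedding as a new voiceprint.
    new_id = uuid.uuid4().hex
    profiles[new_id] = list(inbound_embedding)
    return new_id, True
```

In this sketch an empty database always leads to enrollment, because no stored voiceprint can satisfy the threshold.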

2. The method of claim 1, further comprising:
receiving, by the computer, the inbound speech signal from an end user device via an intermediate server; and
transmitting, by the computer, a new speaker identifier associated with the new speaker profile to the intermediate server.

3. The method of claim 1, further comprising extracting, by the computer, one or more features from the inbound speech signal, wherein the computer generates the inbound embedding by applying the machine learning model to the one or more features extracted from the inbound speech signal.

4. The method of claim 1, further comprising:
extracting, by the computer, a second inbound embedding from a second inbound speech signal by applying the machine learning model to the second inbound speech signal;
generating, by the computer, a second similarity score based on a distance between the second inbound embedding and the new voiceprint stored in the new speaker profile; and
in response to the computer determining that the second similarity score of the second inbound embedding satisfies a similarity threshold, updating, by the computer, the new voiceprint of the inbound speaker based on the second inbound speech signal.
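
One way to picture the voiceprint update in this claim is an incremental mean over the embeddings accepted so far. The claim does not state how the voiceprint is updated, so the averaging rule, the `count` bookkeeping, and the function names below are assumptions chosen for illustration only.

```python
def update_voiceprint(voiceprint, count, new_embedding):
    """Fold one more embedding into a running-mean voiceprint.

    `count` is the number of embeddings already averaged into `voiceprint`.
    Returns (updated_voiceprint, new_count).
    """
    if len(voiceprint) != len(new_embedding):
        raise ValueError("voiceprint and embedding must have the same length")
    new_count = count + 1
    updated = [v + (e - v) / new_count for v, e in zip(voiceprint, new_embedding)]
    return updated, new_count


def maybe_update(voiceprint, count, new_embedding, score, threshold):
    """Update only when the second similarity score satisfies the threshold."""
    if score >= threshold:
        return update_voiceprint(voiceprint, count, new_embedding)
    return list(voiceprint), count
```

A running mean keeps the stored record small, since only the current voiceprint and a counter are retained rather than every past embedding.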

5. The method of claim 1, further comprising:
receiving, by the computer, a subscriber identifier associated with the inbound speech signal; and
identifying, by the computer, one or more speaker profiles stored in the speaker profile database that are associated with the subscriber identifier,
wherein the computer generates one or more similarity scores for the inbound embedding based on one or more voiceprints stored in the one or more speaker profiles associated with the subscriber identifier.
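
The subscriber-scoped comparison can be sketched as a filter applied before scoring, so that the inbound embedding is compared only against voiceprints belonging to the same subscriber. The `SpeakerProfile` record layout, the list standing in for the database, and the cosine scoring rule are hypothetical choices for this example and are not taken from the claims.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeakerProfile:
    # Hypothetical record layout for one speaker profile.
    speaker_id: str
    subscriber_id: str
    voiceprint: List[float]


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def scores_for_subscriber(inbound_embedding, profiles, subscriber_id) -> Dict[str, float]:
    """Score the embedding against profiles of one subscriber only."""
    return {
        p.speaker_id: _cosine(inbound_embedding, p.voiceprint)
        for p in profiles
        if p.subscriber_id == subscriber_id
    }
```

Restricting the candidate set this way turns an open search over every stored speaker into a small comparison within one subscriber's profiles.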

6. The method of claim 1, wherein the computer generates the new voiceprint based on one or more inbound embeddings, the method further comprising:
identifying, by the computer, one or more maturity factors of the new voiceprint based on the one or more inbound embeddings; and
determining, by the computer, a level of maturity of the new voiceprint based on the one or more maturity factors.

7. The method of claim 6, further comprising updating, by the computer, a new similarity threshold for the new speaker profile in response to the computer determining that the level of maturity of the new voiceprint satisfies a maturity threshold.

8. The method of claim 6, further comprising:
in response to the computer determining that the level of maturity is below a maturity threshold, generating, by the computer, an active enrollment prompt, the active enrollment prompt including a user interface configured to display a request for an additional inbound speech signal;
extracting, by the computer, an additional embedding from the additional inbound speech signal; and
updating, by the computer, the new voiceprint according to the additional embedding extracted from the additional inbound speech signal.

9. The method of claim 6, further comprising updating, by the computer, the new speaker profile from a temporary profile to a permanent profile in response to the computer determining that the level of maturity satisfies a maturity threshold.
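
The maturity-related claims can be illustrated with a small sketch. The claims do not enumerate the maturity factors, so the two used here (number of embeddings and accumulated seconds of speech), the target values, the rule of taking the weakest factor as the level of maturity, and the threshold of 0.8 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical fields; the claims only require a temporary/permanent distinction.
    speaker_id: str
    embedding_count: int = 0
    speech_seconds: float = 0.0
    status: str = "temporary"


def maturity_level(profile, target_count=5, target_seconds=30.0):
    """Combine assumed maturity factors into a level in [0, 1]."""
    count_factor = min(profile.embedding_count / target_count, 1.0)
    duration_factor = min(profile.speech_seconds / target_seconds, 1.0)
    # The weakest factor bounds the overall maturity in this sketch.
    return min(count_factor, duration_factor)


def promote_if_mature(profile, maturity_threshold=0.8):
    """Move a temporary profile to permanent once maturity satisfies the threshold."""
    if profile.status == "temporary" and maturity_level(profile) >= maturity_threshold:
        profile.status = "permanent"
    return profile.status
```

The same level could also drive the active enrollment prompt described for an immature voiceprint: a level below the threshold would trigger a request for an additional inbound speech signal instead of a promotion.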

10. A system comprising:
a speaker profile database comprising a non-transitory machine-readable storage medium configured to store data records containing speaker profiles; and
a computer comprising a processor configured to:
extract an inbound embedding for an inbound speaker by applying a machine learning model to an inbound speech signal;
generate a similarity score based on a distance between the inbound embedding and a voiceprint stored in a speaker profile in the speaker profile database; and
in response to the computer determining that the similarity score of the inbound embedding does not satisfy a similarity threshold, generate a new speaker profile for the inbound speaker in the speaker profile database that includes the inbound embedding, the new speaker profile being a database record that stores the inbound embedding as a new voiceprint.

11. The system of claim 10, wherein the computer is further configured to:
receive the inbound speech signal from an end user device via an intermediate server; and
transmit a new speaker identifier associated with the new speaker profile to the intermediate server.

12. The system of claim 11, wherein the end user device is at least one of a smart television, a media device coupled to a television, and an edge device.

13. The system of claim 10, wherein the computer is further configured to extract one or more features from the inbound speech signal, and wherein the computer generates the inbound embedding by applying the machine learning model to the one or more features extracted from the inbound speech signal.

14. The system of claim 10, wherein the computer is further configured to:
extract a second inbound embedding from a second inbound speech signal by applying the machine learning model to the second inbound speech signal;
generate a second similarity score based on a distance between the second inbound embedding and the new voiceprint stored in the new speaker profile; and
in response to the computer determining that the second similarity score of the second inbound embedding satisfies a similarity threshold, update the new voiceprint of the inbound speaker based on the second inbound speech signal.

15. The system of claim 10, wherein the computer is further configured to:
receive a subscriber identifier associated with the inbound speech signal; and
identify one or more speaker profiles stored in the speaker profile database that are associated with the subscriber identifier,
wherein the computer is configured to generate one or more similarity scores for the inbound embedding based on one or more voiceprints stored in the one or more speaker profiles associated with the subscriber identifier.

16. The system of claim 15, wherein the subscriber identifier is associated with one or more speaker identifiers, and each speaker profile is associated with a corresponding speaker identifier.

17. The system of claim 16, wherein at least one of the subscriber identifier and each speaker identifier is an anonymized identifier.
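
The claims do not say how an identifier is anonymized. As one possible illustration, a keyed hash can map a raw subscriber or speaker identifier to a stable token that does not reveal the original value. The use of HMAC-SHA256, the function name `anonymize_identifier`, and the example key are assumptions for this sketch.

```python
import hashlib
import hmac


def anonymize_identifier(raw_identifier: str, secret_key: bytes) -> str:
    """Derive a stable, non-reversible token from a raw identifier (assumed scheme)."""
    return hmac.new(secret_key, raw_identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical usage: the same input and key always yield the same token,
# so profiles can still be looked up by the anonymized value.
token = anonymize_identifier("subscriber-12345", b"example-secret-key")
```

Because the mapping is deterministic for a given key, the token can serve as a database key without the raw identifier being stored alongside the voiceprints.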

18. The system of claim 10, wherein the computer generates the new voiceprint based on one or more inbound embeddings, and wherein the computer is further configured to:
identify one or more maturity factors of the new voiceprint based on the one or more inbound embeddings; and
determine a level of maturity of the new voiceprint based on the one or more maturity factors.

19. The system of claim 18, wherein the computer is further configured to:
in response to determining that the level of maturity is below a maturity threshold, generate an active enrollment prompt, the active enrollment prompt including a user interface configured to display a request for an additional inbound speech signal;
extract an additional embedding from the additional inbound speech signal; and
update the new voiceprint according to the additional embedding extracted from the additional inbound speech signal.

20. The system of claim 18, wherein the computer is further configured to update the new speaker profile from a temporary profile to a permanent profile in response to the computer determining that the level of maturity satisfies a maturity threshold.