CN112883350A - Data processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112883350A
Authority
CN
China
Prior art keywords
identity
target
user
template
data
Prior art date
Legal status: Pending (assumed; not a legal conclusion)
Application number
CN201911206373.3A
Other languages
Chinese (zh)
Inventor
杨广煜
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911206373.3A
Publication of CN112883350A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F 21/31 User authentication
    • G06F 21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints

Abstract

The embodiment of the application discloses a data processing method and device, an electronic device and a storage medium, wherein the method comprises the following steps: when the primary identity is in an effective state, acquiring target biological information; identifying a service intention and a target user identity corresponding to the target biological information; acquiring a target secondary identity corresponding to the target user identity, wherein the target secondary identity is a sub-identity of the primary identity; and executing a service instruction corresponding to the service intention based on the target secondary identity. By adopting the method and the device, the service behavior executed by the terminal device can be matched with the user identity.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, and a related device.
Background
With the rapid development of internet technology, the number of internet users keeps increasing, and among all application software, video software is among the software used most frequently by internet users. Statistics show that time spent in video software accounts for up to 34.5% of total usage time on mobile devices.
In a family scene, a terminal device (e.g., a smart television) is often shared by a plurality of family members. When family member A logs in to a video application on the terminal device using his own account A and watches video 1, the terminal device records family member A's watching progress in video 1. If another family member B then uses the video application to watch video 1 without switching the login account of the video application, the terminal device automatically jumps to family member A's watching progress in video 1, so that the service behavior executed by the terminal device does not match the user identity.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device and related devices, so that the service behavior executed by a terminal device can be matched with the user identity.
An embodiment of the present application provides a data processing method, including:
when the primary identity is in an effective state, acquiring target biological information;
identifying a service intention and a target user identity corresponding to the target biological information;
acquiring a target secondary identity corresponding to the target user identity; the target secondary identity is a sub-identity of the primary identity;
and executing a service instruction corresponding to the service intention based on the target secondary identity.
Wherein the target biological information includes target voice data;
the identifying the service intention and the target user identity corresponding to the target biological information comprises:
converting the target voice data into text data, and performing semantic recognition on the text data to obtain the service intention;
calling an identity recognition model corresponding to the primary identity to determine a matching result between the target voice data and at least one template user identity; the identity recognition model is a classification model generated according to the at least one template user identity and template voice data respectively corresponding to the at least one template user identity;
if a matching result meeting the matching condition exists among the at least one matching result, taking the template user identity corresponding to the matching result meeting the matching condition as the target user identity;
the obtaining of the target secondary identity corresponding to the target user identity includes:
extracting the target secondary identity corresponding to the target user identity from a secondary identity set corresponding to the at least one template user identity; the secondary identities in the secondary identity set are sub-identities of the primary identity.
Wherein the method further comprises:
if no matching result meeting the matching condition exists among the at least one matching result, creating the target user identity;
identifying age information corresponding to the target voice data, and searching an image material library for an identity avatar matching the age information;
the obtaining of the target secondary identity corresponding to the target user identity includes:
creating the target secondary identity for the target user identity;
setting the target secondary identity as a sub-identity of the primary identity;
and storing the target user identity, the target secondary identity and the identity avatar in association.
Wherein the identity recognition model comprises a feature generator and a pattern matcher;
the calling of the identity recognition model corresponding to the primary identity to determine the matching result between the target voice data and at least one template user identity comprises:
extracting a target voiceprint feature of the target voice data based on the feature generator;
determining the matching probability between the target voiceprint feature and at least one template voiceprint feature based on the pattern matcher, and taking the obtained matching probabilities as matching results; the at least one template voiceprint feature is a voiceprint feature corresponding to the at least one template voice data respectively.
Wherein the extracting the target voiceprint feature of the target voice data based on the feature generator comprises:
extracting a spectrum parameter and a linear prediction parameter of the target voice data based on the feature generator; the spectrum parameter is a short-time spectral characteristic parameter of the target voice data; the linear prediction parameter is a spectrum-fitting characteristic parameter of the target voice data;
and obtaining the target voiceprint feature according to the spectrum parameter and the linear prediction parameter.
Wherein the method further comprises:
acquiring template voice data corresponding to a template user identity;
generating an identity tag vector corresponding to the template voice data;
acquiring an initial classification model, predicting the matching degree between the template voice data and the at least one template user identity based on the initial classification model, and obtaining an identity prediction vector according to the obtained matching degree;
and determining a classification error according to the identity label vector and the identity prediction vector, and training the initial classification model according to the classification error to obtain the identity recognition model.
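By way of illustration, the training procedure above can be sketched as follows; this is a minimal sketch that assumes voiceprint features have already been extracted as fixed-length vectors and uses a softmax classifier as a stand-in for the initial classification model (all names are illustrative, not part of the application):

```python
import numpy as np

def train_identity_model(template_features, template_identities,
                         epochs=200, lr=0.1):
    # Map each template user identity to a class index.
    identities = sorted(set(template_identities))
    index = {u: i for i, u in enumerate(identities)}
    X = np.asarray(template_features, dtype=float)        # (n_samples, n_dims)
    # Identity tag vectors: one one-hot row per template voice sample.
    Y = np.zeros((len(X), len(identities)))
    for row, identity in enumerate(template_identities):
        Y[row, index[identity]] = 1.0
    W = np.zeros((X.shape[1], len(identities)))
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)                 # identity prediction vectors
        # Classification error between tag and prediction vectors
        # (cross-entropy gradient), used to update the model.
        W -= lr * (X.T @ (P - Y)) / len(X)
    return W, identities
```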
Wherein the method further comprises:
when a matching result meeting the matching condition exists among the at least one matching result, sending an animation playing instruction to a client to instruct the client to play a target animation;
and when the execution of the service instruction is finished, sending an animation playing stopping instruction to the client, and indicating the client to close the target animation.
Wherein the service intention comprises a client secondary login object switching intention;
the executing the service instruction corresponding to the service intention based on the target secondary identity comprises:
generating a switching instruction corresponding to the switching intention of the secondary login object of the client; the switching instruction belongs to the service instruction;
and according to the switching instruction, taking the target secondary identity as a secondary login object of the client.
Wherein the method further comprises:
acquiring behavior data of a user in the client corresponding to the target secondary identity; the behavior data is used for generating recommended service data for the user;
and performing associated storage on the behavior data and the target secondary identity.
Wherein the service intention comprises a service data query intention;
the executing the service instruction corresponding to the service intention based on the target secondary identity comprises:
generating a query instruction corresponding to the service data query intention; the query instruction belongs to the service instruction;
and inquiring target service data corresponding to the target secondary identity, and returning the target service data to the client.
Wherein the user authority of the target secondary identity is the same as the user authority of the primary identity.
Another aspect of the embodiments of the present application provides a data processing apparatus, including:
the first acquisition module is used for acquiring the target biological information when the primary identity is in an effective state;
the identification module is used for identifying the business intention and the target user identity corresponding to the target biological information;
the second acquisition module is used for acquiring a target secondary identity corresponding to the target user identity; the target secondary identity is a sub-identity of the primary identity;
and the determining module is used for executing the service instruction corresponding to the service intention based on the target secondary identity.
Wherein the target biological information includes target voice data;
the identification module comprises:
the conversion unit is used for converting the target voice data into text data, and performing semantic recognition on the text data to obtain the service intention;
the calling unit is used for calling an identity recognition model corresponding to the primary identity to determine a matching result between the target voice data and at least one template user identity; the identity recognition model is a classification model generated according to the at least one template user identity and template voice data respectively corresponding to the at least one template user identity;
the first determining unit is used for taking the template user identity corresponding to the matching result meeting the matching condition as the target user identity if a matching result meeting the matching condition exists among the at least one matching result;
the second obtaining module includes:
a first extraction unit, configured to extract the target secondary identity corresponding to the target user identity from a secondary identity set corresponding to the at least one template user identity; the secondary identities in the secondary identity set are sub-identities of the primary identity.
Wherein the apparatus further comprises:
a second determining unit, configured to create the target user identity if no matching result meeting the matching condition exists among the at least one matching result, identify age information corresponding to the target voice data, and search an image material library for an identity avatar matching the age information;
the second obtaining module includes:
and the second extraction unit is used for creating the target secondary identity for the target user identity, setting the target secondary identity as a sub-identity of the primary identity, and storing the target user identity, the target secondary identity and the identity avatar in association.
Wherein the identity recognition model comprises a feature generator and a pattern matcher;
the calling unit comprises:
an extraction subunit configured to extract a target voiceprint feature of the target speech data based on the feature generator;
a matching subunit, configured to determine, based on the pattern matcher, matching probabilities between the target voiceprint feature and at least one template voiceprint feature, and take all the obtained matching probabilities as matching results; the at least one template voiceprint feature is a voiceprint feature corresponding to the at least one template voice data respectively.
The extracting subunit is specifically configured to extract, based on the feature generator, a spectrum parameter and a linear prediction parameter of the target voice data, and obtain the target voiceprint feature according to the spectrum parameter and the linear prediction parameter; the spectrum parameter is a short-time spectral characteristic parameter of the target voice data; the linear prediction parameter is a spectrum-fitting characteristic parameter of the target voice data.
Wherein the apparatus further comprises:
the training module is used for acquiring template voice data corresponding to a template user identity, generating an identity tag vector corresponding to the template voice data, acquiring an initial classification model, predicting the matching degree between the template voice data and the at least one template user identity based on the initial classification model, obtaining an identity prediction vector according to the obtained matching degree, determining a classification error according to the identity tag vector and the identity prediction vector, and training the initial classification model according to the classification error to obtain the identity recognition model.
Wherein the apparatus further comprises:
the playing module is used for sending an animation playing instruction to a client to instruct the client to play a target animation when a matching result meeting the matching condition exists among the at least one matching result;
and the playing module is further used for sending an animation playing stopping instruction to the client to indicate the client to close the target animation when the execution of the service instruction is completed.
Wherein the service intention comprises a client secondary login object switching intention;
the determining module includes:
the first generation unit is used for generating a switching instruction corresponding to the switching intention of the secondary login object of the client, and taking the target secondary identity as the secondary login object of the client according to the switching instruction; the switching instruction belongs to the service instruction.
Wherein the apparatus further comprises:
the storage module is used for acquiring behavior data of a user in the client corresponding to the target secondary identity, and storing the behavior data and the target secondary identity in an associated manner; the behavior data is used for generating recommended service data for the user.
Wherein the service intention comprises a service data query intention;
the determining module includes:
the second generation unit is used for generating a query instruction corresponding to the service data query intention, querying the target service data corresponding to the target secondary identity and returning the target service data to the client; the query instruction belongs to the service instruction.
Wherein the user authority of the target secondary identity is the same as the user authority of the primary identity.
Another aspect of the embodiments of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the method according to one aspect of the embodiments of the present application.
Another aspect of the embodiments of the present application provides a computer storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, perform a method as in one aspect of the embodiments of the present application.
According to the embodiment of the application, by identifying the user identity that produced the current biological information and the corresponding service intention, the target secondary identity corresponding to that user identity can be determined, so that the service instruction executed based on the target secondary identity not only satisfies the user's current service intention but also matches the user identity. Furthermore, only the target biological information of the user needs to be acquired to determine the user's service intention and user identity simultaneously; the user does not need to perform two separate operations for determining the service intention and determining the user identity, so the user's operation cost can be reduced, and the efficiency with which the terminal executes a service instruction matching both the user's service intention and user identity can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a block diagram of a system architecture for data processing according to an embodiment of the present disclosure;
FIGS. 2a-2d are schematic diagrams of a data processing scenario provided by an embodiment of the present application;
fig. 3 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of another data processing method provided in the embodiments of the present application;
fig. 5 is a timing diagram of a data processing method according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of determining a target user identity and a target secondary identity according to an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart diagram of another data processing method provided in the embodiments of the present application;
FIG. 8 is a timing diagram of another data processing method provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision making.
Artificial intelligence is a comprehensive discipline that involves a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The scheme provided by the embodiment of the application involves speech technology, natural language processing (NLP) and machine learning (ML), all of which belong to the field of artificial intelligence.
The key technologies of speech technology are automatic speech recognition (ASR), speech synthesis (TTS) and voiceprint recognition; they enable computers to listen, see, speak and feel.
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
In the present application, speech technology is involved in converting the user's speech into text, and natural language processing is involved in semantically recognizing the text to determine the user's intention.
Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. In the present application, machine learning is involved in identifying the user identity of the current user; the specific technical means involve machine learning technologies such as artificial neural networks and logistic regression.
Fig. 1 is a block diagram of a system architecture for data processing according to an embodiment of the present disclosure. The architecture involves a server 10d and a terminal device cluster, and the terminal device cluster may include: terminal device 10a, terminal device 10b, ..., terminal device 10c.
Taking the terminal device 10a as an example: when the primary identity is in the valid state, the terminal device 10a collects the user's biological information and sends the collected biological information to the server 10d. The server 10d performs semantic recognition on the biological information to determine the intention of the biological information; the server 10d also determines the user identity behind the biological information, and extracts the secondary identity of that user identity, which is a sub-identity of the primary identity. The server 10d executes the instruction associated with the intention based on the determined secondary identity. Subsequently, the server 10d may return the execution result of the instruction to the terminal device 10a.
Identifying the intention of the biological information, determining the user identity of the biological information, and executing the instruction related to the intention may also be accomplished by the terminal device 10a itself.
The terminal device 10a, the terminal device 10b, ..., the terminal device 10c shown in fig. 1 may include a smart television, a mobile phone, a tablet computer, a notebook computer, a palm computer, a mobile internet device (MID), a wearable device (e.g., a smart watch or a smart band), etc. The server 10d shown in fig. 1 may be a single server device or a server cluster including a plurality of server devices.
Figs. 2a to 2d below describe specifically how the terminal device 10a recognizes the intention of the biological information, determines the user identity of the biological information, and executes the instruction related to the intention; these operations may be embodied in a video client in the terminal device 10a:
please refer to fig. 2 a-2 d, which are schematic diagrams illustrating a data processing scenario according to an embodiment of the present application. When the current user starts the video client in the terminal device 10a, and the video client detects that the primary account "01" has logged in but no secondary account under the primary account "01" has logged in, the video client can display prompt information on the screen: and inputting the current secondary account which is not logged in by voice or clicking the secondary account which can be logged in by the user, so as to prompt the current user to log in the secondary account, wherein the primary account 01 is the primary account of the user 1.
The current user can input by voice: the video client can acquire the voice data 20b of the voice 'login of the secondary account', the video client converts the voice data 20b into text data, semantically recognizes the text data, and determines that the intention corresponding to the voice data 20b is as follows: "secondary Account Login".
The video client inputs the voice data 20b into a trained prediction model 20d corresponding to the primary account number "01", the prediction model 20d can extract voiceprint features of the voice data 20b and match the voiceprint features with a plurality of template voiceprint features, and if a template voiceprint feature matched with the voiceprint features of the voice data 20b exists in the plurality of template voiceprint features, a user identity corresponding to the matched template voiceprint feature is extracted (assuming that the extracted user identity is user 2).
Each template voiceprint feature in the prediction model 20d corresponds to 1 user identity, and it is assumed that the prediction model 20d is trained from 2 template voiceprint features, and the user identities corresponding to the 2 template voiceprint features respectively are: user 1 and user 2, and the secondary account of user 1 and the secondary account of user 2 are sub-accounts of the current primary account "01".
As shown in fig. 2b, the second-level account corresponding to the user 2 is searched in the user information record table 20e corresponding to the first-level account "01" as follows: 002.
as can be seen from fig. 2 b: the user information record table 20e includes 3 user records, where the 3 user records respectively correspond to a primary account "01" and 2 secondary accounts (respectively, a secondary account "001" and a secondary account "002") subordinate to the primary account "01"; the primary account "01" and the secondary account "001" are both accounts of the user 1, and the secondary account "002" is an account of the user 2; the user's history is stored in association with the secondary account.
The video client extracts the secondary account of the user 2: 002, since the intention of the voice data 20b determined in the foregoing is: the 'secondary account number login' can take the secondary account number '002' as a secondary login object of the video client.
As shown in page 20f in fig. 2b, the video client may display an animation in the screen during the process of determining the intention of the voice data 20b and determining the secondary account number "002", and when the video client has taken the secondary account number "002" as the secondary login object, the animation is stopped to be played, and the video client jumps to the home page.
As shown in page 20g, the video client is currently logged in to secondary account "002" and primary account "01".
The foregoing assumed that a template voiceprint feature matching the voiceprint feature of the voice data 20b exists among the plurality of template voiceprint features, so that the user identity corresponding to the voice data 20b was determined to be user 2. Alternatively:
As shown in fig. 2c, assume instead that no template voiceprint feature among the plurality of template voiceprint features matches the voiceprint feature of the voice data 20b; that is, the current user who produced the voice data 20b has no corresponding user identity, and no corresponding user record exists in the user information record table 20c. Because there is no corresponding user record, the video client may create a new user record for the current user, which includes: user identity "user 2", secondary account "002", an avatar, level "level 2", and a history (which at this point is, of course, empty).
The video client adds the user record to the user information record table 20c, obtaining a new user information record table 20h.
The video client extracts the newly created secondary account of user 2, namely 002; since the intention of the voice data 20b determined above is "secondary account login", secondary account "002" can be taken as the secondary login object of the video client.
As shown in page 20i in fig. 2d, after the video client creates secondary account "002" and logs in to it, a prompt message may be displayed on the screen telling the user that no secondary account was detected and that a new secondary account has been created and logged in for him, thereby informing the current user that a new secondary account has been created.
As shown in page 20j, the video client is currently logged in to the newly created secondary account "002" and the primary account "01".
For the specific processes of acquiring the target biological information (the voice data 20b of "log in the secondary account" in the above embodiment) and identifying the service intention (the intention "secondary account login" in the above embodiment) and the target user identity (user 2 in the above embodiment), reference can be made to the following embodiments corresponding to fig. 3 to fig. 8.
Referring to fig. 3, which is a schematic flow chart of a data processing method according to an embodiment of the present application, as shown in fig. 3, the data processing method may include the following steps:
and S101, when the primary identity mark is in an effective state, acquiring target biological information.
Specifically, the server (e.g., the server 10d in the embodiment corresponding to fig. 1) detects whether the primary id is a primary login object of the current corresponding client (e.g., the video client in the embodiment corresponding to fig. 2a to fig. 2 d), and if the primary id is the primary login object of the client, it indicates that the primary id is in an active state. The login object of the client side can comprise a primary login object and a secondary login object, wherein the primary login object corresponds to a primary identity, the secondary login object corresponds to a secondary identity, and the secondary identity is a sub-identity of the primary identity.
The client may specifically be a video client, an instant messaging client, or a mail client.
When the primary identity is in the valid state, the server may receive the biological information sent by the client (referred to as target biological information, such as the voice data 20b of the voice "log in the secondary account" in the embodiments corresponding to figs. 2a to 2d).
The target biological information may include only voice data (referred to as target voice data), or it may include both voice data (the target voice data) and image data (referred to as target image data), where the target image data may be image data of the current user's face.
Step S102, identifying the service intention and the target user identity corresponding to the target biological information.
Specifically, when the target biological information includes the target voice data, the server may convert the target voice data into text data and semantically recognize the text data to determine the service intention of the target voice data.
The server may also determine the user identity of the current user (referred to as the target user identity, e.g., user 2 in the embodiments corresponding to figs. 2a-2d) through the identity recognition model corresponding to the primary identity (e.g., the prediction model 20d in the embodiments corresponding to figs. 2a-2d).
The order in which the server determines the service intention and determines the target user identity is not limited.
To convert the target voice data into text data, an acoustic model may be employed (the acoustic model may be established by a dynamic time warping method based on pattern matching, by an artificial neural network recognition method, etc.); it determines the state of each audio frame of the target voice data, combines several states into a phoneme, and then combines several phonemes into a word.
A language model (which can be an N-gram language model, a Markov N-gram, an exponential model or a decision tree model) is then adopted to combine the words into correct, unambiguous and logical sentences, obtaining the text data.
Semantic recognition of the text data to determine its service intention can use an entity-predicate knowledge graph to perform pattern matching on the text data, thereby determining the entities and predicates in the text data. The server can combine the identified entities and predicates into a service intention.
For example, suppose the current user inputs by voice a request to query the history play record. After the voice data is converted into text data, the knowledge graph can be used to determine that the entity is "history play record" and the predicate is "query", so the service intention is: history play record-query.
The identity recognition model is a classification model trained from at least one template user identity and the voice data corresponding to each template user identity (referred to as template voice data). Each template user identity has a corresponding secondary identity (such as secondary account "001" and secondary account "002" in the embodiments corresponding to figs. 2a-2d; the identity may be a user account), and the secondary identity of each template user identity is a sub-identity of the primary identity.
When the target biological information includes the target voice data and the target image data, the server may determine the service intention of the target voice data in the above manner, and determine the user identity of the target voice data (referred to as a first user identity) according to the identity recognition model;
the server may also determine the user identity of the target image data (referred to as a second user identity) according to an image recognition model; the image recognition model is similar to the identity recognition model and is a classification model trained from at least one template user identity and the image data corresponding to each template user identity (referred to as template image data).
The server can determine the final target user identity according to the first user identity determined by the identity recognition model and the second user identity determined by the image recognition model; a target user identity determined based on both models has higher accuracy.
Alternatively, when the target biological information includes the target voice data and the target image data, the server may determine the service intention of the target voice data in the above-described manner, and determine the target user identity of the target image data based on the image recognition model only.
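The application does not fix how the two user identities are combined; the sketch below shows one possible, purely illustrative policy (accept when the two recognizers agree, otherwise fall back to the more confident result above a threshold):

```python
def fuse_identities(first_identity, first_prob,
                    second_identity, second_prob, threshold=0.5):
    # First user identity: from the identity recognition model (voice).
    # Second user identity: from the image recognition model (face).
    if first_identity == second_identity:
        return first_identity
    # Disagreement: keep the more confident result if it clears the threshold.
    identity, prob = max([(first_identity, first_prob),
                          (second_identity, second_prob)],
                         key=lambda pair: pair[1])
    return identity if prob >= threshold else None

assert fuse_identities("user 2", 0.8, "user 2", 0.9) == "user 2"
assert fuse_identities("user 1", 0.3, "user 2", 0.7) == "user 2"
```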
Step S103, acquiring a target secondary identity corresponding to the target user identity; the target secondary identity is a sub-identity of the primary identity.
Specifically, the server may obtain the target secondary identity of the target user identity (e.g., the secondary account "002" in the embodiments corresponding to figs. 2a to 2d) from the secondary identity set corresponding to the at least one template user identity, or newly create the target secondary identity; the target secondary identity is a sub-identity of the primary identity.
Step S104, executing the service instruction corresponding to the service intention based on the target secondary identity.
Specifically, when the service intention is a client secondary login object switching intention, the server generates a switching instruction corresponding to that intention; the switching instruction is used for instructing the server to switch the client's current secondary login object, and it belongs to the service instruction.
According to the switching instruction, the server can take the target secondary identity as the client's current secondary login object. Subsequently, the server may issue a switching notification message to the client, so that after receiving it the client can display a prompt message informing the user that the current secondary login object is the target secondary identity.
Subsequently, when the secondary login object of the client is the target secondary identity, the server may receive behavior data (the behavior data may include at least one of viewing behavior data, browsing behavior data, searching behavior data, and comment behavior data) reported by the client, where the behavior data is user behavior data of the user collected by the client when the secondary login object of the client is the target secondary identity.
The server can store the target secondary identity in association with the behavior data reported by the client. Subsequently, the server can generate recommended service data for the user based on the behavior data, achieving personalized recommendation.
When the service intention is a service data query intention, the server generates a query instruction corresponding to it; the query instruction is used for instructing the server to query service data, and it belongs to the service instruction. Examples include querying history viewing records, viewing progress records, search records, and the like.
The server may query, according to the query instruction, service data (referred to as target service data) related to the target secondary identity, and subsequently, the server may return the queried target service data to the client, so that the client may display the target service data after receiving the target service data.
Alternatively, after the server generates the query instruction, it may take the target secondary identity as the secondary login object of the client and, at the same time, execute the query operation related to the query instruction.
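As a minimal sketch of how a server might dispatch the two service instructions described above and store behavior data in association with the target secondary identity (in-memory dictionaries stand in for the server's storage; all names are illustrative):

```python
secondary_login = {}   # client id -> current secondary login object
service_data = {}      # secondary identity -> stored service data
behavior_log = {}      # secondary identity -> behavior data records

def execute_service_instruction(client_id, service_intention, target_secondary_id):
    if service_intention == "switch_secondary_login":
        # Switching instruction: change the client's secondary login object.
        secondary_login[client_id] = target_secondary_id
        return {"notice": "secondary login object switched"}
    if service_intention == "query_service_data":
        # Query instruction: return the target service data to the client.
        return service_data.get(target_secondary_id, {})
    raise ValueError("unknown service intention: " + service_intention)

def report_behavior(client_id, record):
    # Behavior data is stored in association with the secondary login
    # object, and can later drive recommended service data.
    sid = secondary_login[client_id]
    behavior_log.setdefault(sid, []).append(record)
```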
It should be noted that the user authority of the target secondary identity is the same as the user authority of the primary identity; more generally, the user authority of every sub-identity of the primary identity is the same as that of the primary identity.
For example, if the primary id has member VIP rights, all the sub-ids of the primary id (including the target secondary id and the secondary id of the template user id mentioned above) have member VIP rights.
User identities at different levels correspond to different functional roles: the primary identity can be used for managing membership rights and for aggregating statistical information (for example, total viewing time) over all secondary identities, while each secondary identity manages the personalized information of its own user identity.
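The two-level identity structure described here can be sketched with illustrative field names as follows; every sub-identity shares the primary identity's rights, while the primary identity aggregates statistics across all secondary identities:

```python
from dataclasses import dataclass, field

@dataclass
class SecondaryIdentity:
    secondary_id: str
    user_identity: str
    avatar: str = ""
    history: list = field(default_factory=list)   # (video id, seconds watched)

@dataclass
class PrimaryIdentity:
    primary_id: str
    vip_member: bool = False                      # membership rights
    children: dict = field(default_factory=dict)  # secondary_id -> SecondaryIdentity

    def has_vip(self, secondary_id: str) -> bool:
        # A sub-identity's user authority is the same as the primary identity's.
        return secondary_id in self.children and self.vip_member

    def total_viewing_time(self) -> int:
        # The primary identity aggregates statistics over all secondary identities.
        return sum(seconds
                   for child in self.children.values()
                   for _, seconds in child.history)
```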
It should be noted that steps S101 to S104 above are described with a server as the execution subject; the execution subject may also be a client installed in a terminal device (such as the terminal device 10a in the embodiments corresponding to figs. 2a to 2d). The terminal device may be a smart television, and the client may be a video client installed in the smart television.
When the primary identity is in an effective state, the client acquires the target biological information, identifies the service intention of the target biological information, and calls the identity recognition model to determine the target user identity; the client then acquires the target secondary identity of the target user identity and executes the service instruction corresponding to the service intention based on the target secondary identity, for example a service instruction taking the target secondary identity as the client's secondary login object, or a service instruction querying the target service data corresponding to the target secondary identity.
According to the embodiment of the application, by identifying the user identity that produced the current biological information and the corresponding service intention, the target secondary identity corresponding to that user identity can be determined, so that the service instruction executed based on the target secondary identity not only satisfies the user's current service intention but also matches the user identity. Furthermore, only the target biological information of the user needs to be acquired to determine the user's service intention and user identity simultaneously; the user does not need to perform two separate operations for determining the service intention and determining the user identity, so the user's operation cost can be reduced, and the efficiency with which the terminal executes a service instruction matching both the user's service intention and user identity can be improved.
Please refer to fig. 4, which is a schematic flow chart of another data processing method provided in the embodiment of the present application; the method includes the following steps:
in step S201, the flow starts.
In step S202, the server acquires voice data.
Specifically, when the primary account (which may correspond to the primary identity in the present application) has logged in to the client, i.e., the primary account is in an active state, the server receives the voice data sent by the client; the voice data is collected by the client when the user inputs "log in the sub-account" in the client by voice.
In step S203, the server determines whether the voiceprint already exists.
Specifically, the server semantically recognizes the voice data and determines that the service intention is to log in a sub-account.
By calling the identity recognition model of the primary account, the server judges whether a template voiceprint feature matching the voiceprint feature of the voice data exists; if so, the server executes step S204 and step S206; if not, steps S205 to S206 are executed.
And step S204, the server logs in the secondary account to the client.
Specifically, the server sets the secondary account (which may correspond to the target secondary identity in the present application) corresponding to the matched voiceprint feature as the secondary login account of the client; at this point, the client is logged in with one primary account and one secondary account.
Step S205, the server creates a new secondary account (which may correspond to the target secondary identity in the present application); this secondary account is a sub-account of the primary account. The server stores the newly created secondary account in association with the voiceprint feature and uses it as the secondary login account of the client.
Step S206, the flow ends.
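The decision flow of fig. 4 can be summarized in the following self-contained sketch, where voiceprint matching is reduced to an exact lookup purely for illustration (class and method names are assumptions, not APIs defined by the application):

```python
class AccountServer:
    def __init__(self):
        self.voiceprints = {}   # template voiceprint -> secondary account
        self.next_id = 1

    def handle_voice_login(self, primary_account, voiceprint):
        # Step S203: does a matching template voiceprint already exist?
        account = self.voiceprints.get(voiceprint)
        if account is None:
            # Step S205: create a new sub-account of the primary account
            # and store it in association with the voiceprint.
            account = "%s-%03d" % (primary_account, self.next_id)
            self.next_id += 1
            self.voiceprints[voiceprint] = account
        # Step S204: the account becomes the client's secondary login account.
        return account

server = AccountServer()
assert server.handle_voice_login("01", "voiceprint-A") == "01-001"   # new user
assert server.handle_voice_login("01", "voiceprint-A") == "01-001"   # known user logs straight in
```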
Please further refer to fig. 5, which is a timing diagram of a data processing method according to an embodiment of the present application. The video background server, the voice recognition server and the voiceprint recognition server described below all belong to the servers in the present application. The method includes the following steps:
Step S301, the primary account is in an active state, and the client collects the voice data "log in the sub-account" input by the user.
Step S302, the client sends the voice data to a video background server.
Step S303, the video background server sends the voice data to the voice recognition server.
And step S304, the video background server sends the voice data to a voiceprint recognition server.
Step S305, the voice recognition server carries out semantic recognition on the voice data, determines that the service intention is to access a secondary account, and sends the determined service intention back to the video background server.
And S306, the voiceprint recognition server performs voiceprint recognition on the voice data according to the identity recognition model corresponding to the primary account to obtain a voiceprint recognition result, and the voiceprint recognition server sends the voiceprint recognition result back to the video background server.
Step S307, the video background server generates a secondary account access instruction corresponding to the service intention.
Step S308, the video background server judges whether a corresponding secondary account exists according to the voiceprint recognition result; if so, it returns the service data corresponding to the secondary account to the client according to the secondary account access instruction; if not, it creates a new secondary account, which is a sub-account of the primary account.
According to the embodiment of the application, by identifying the user identity that produced the current biological information and the corresponding service intention, the target secondary identity corresponding to that user identity can be determined, so that the service instruction executed based on the target secondary identity not only satisfies the user's current service intention but also matches the user identity. Furthermore, only the target biological information of the user needs to be acquired to determine the user's service intention and user identity simultaneously; the user does not need to perform two separate operations for determining the service intention and determining the user identity, so the user's operation cost is reduced, and the efficiency with which the terminal executes a service instruction matching both the user's service intention and user identity is improved.
Please refer to fig. 6, which is a schematic flowchart illustrating the process of determining the target user identity and the target secondary identity provided in an embodiment of the present application. The process includes the following steps S401 to S404, which are a specific embodiment of steps S102 to S103 in the embodiment corresponding to fig. 3:
step S401, converting the target voice data into text data, and performing semantic recognition on the text data to obtain the service intention.
Specifically, when the target biological information is the target voice data, the server divides the target voice data into a plurality of audio frames according to a preset frame length and a preset frame shift; adjacent audio frames partially overlap, and the overlap length equals the frame length minus the frame shift.
For example, target voice data spanning 0-30 ms, divided with a frame length of 20 ms and a frame shift of 10 ms, yields audio frame 1 (the voice data between 0-20 ms) and audio frame 2 (the voice data between 10-30 ms).
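The framing in this example can be reproduced in a few lines; the sketch below uses the 20 ms frame length and 10 ms frame shift from the example (the sample rate and helper name are illustrative):

```python
def split_frames(samples, sample_rate, frame_ms=20, shift_ms=10):
    # Overlapping frames: each starts one frame shift after the previous.
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, shift)]

# 30 ms of audio at 16 kHz yields two frames: 0-20 ms and 10-30 ms.
frames = split_frames(list(range(480)), sample_rate=16000)
assert len(frames) == 2
```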
A spectrum parameter is extracted from each audio frame; the spectrum parameter is a short-time spectral characteristic parameter of the audio frame, i.e., a parameter derived from the physiological structure of vocal organs such as the glottis, the vocal tract and the nasal cavity.
The short-time spectral characteristic parameters may include at least one of: the pitch spectrum and its contour, the energy of pitch frames, the spectral envelope, and the occurrence frequency of pitch formants and their trajectories.
A linear prediction parameter is also extracted from each audio frame; the linear prediction parameter is a spectrum-fitting characteristic parameter of the audio frame. Spectrum-fitting characteristic parameters simulate, from the perspective of hearing, the human ear's perception of sound frequency, estimating speech characteristics with corresponding approximation parameters; mathematically, linear prediction approximates the current audio frame by a combination of several "past" audio frames.
The spectrum-fitting characteristic parameters may include at least one of: linear prediction cepstral coefficients (LPCC), line spectrum pairs (LSP), autocorrelation and log-area ratios, Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP), etc.
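As an illustration, both parameter families can be approximated with standard tooling; the sketch below uses librosa's MFCC and LPC routines as stand-ins, which is an assumption, since the application does not name any library:

```python
import numpy as np
import librosa

def voiceprint_features(y, sr, n_mfcc=13, lpc_order=12):
    # Short-time spectral parameters, approximated here by MFCCs.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    # Linear prediction parameters of the signal.
    lpc = librosa.lpc(y, order=lpc_order)                    # (lpc_order + 1,)
    # Average the MFCCs over frames, then concatenate in a fixed order.
    return np.concatenate([mfcc.mean(axis=1), lpc])
```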
In the above manner, the spectrum parameters and linear prediction parameters extracted for each audio frame are combined into a vector, so that each audio frame can be expressed as a multi-dimensional vector (also referred to as a feature vector). An acoustic model is used to determine the state to which the feature vector of each audio frame belongs; generally, the states of adjacent audio frames should be the same, because the frame length of each audio frame is short, on the order of milliseconds.
The states of several audio frames (generally 3) are combined into a phoneme. A phoneme is the smallest unit of speech, divided from the perspective of timbre; a phoneme on its own, or several phonemes combined, forms a syllable.
Then, the phonemes are combined into words. Because of the time-varying nature of the speech signal, noise and other unstable factors, each word is closely related to its context; to further improve the accuracy of speech-to-text conversion, adaptive adjustment is performed according to the context of all the words. The server can therefore adopt a language model to combine the recognized words into logical, unambiguous sentences, obtaining the text data corresponding to the target voice data.
The server may obtain an entity-predicate knowledge graph, which contains a plurality of entity strings and predicate strings, each labeled as having an entity attribute or a predicate attribute. Using a multi-pattern string matching algorithm (such as an AC automaton or hash-function matching), the server matches the text data against the entity-predicate knowledge graph, determining which strings in the text data match and whether each matched string has an entity attribute or a predicate attribute. The server takes the strings with an entity attribute as entities and the strings with a predicate attribute as predicates, and combines the entities and predicates identified from the text data into a service intention, as illustrated in the sketch below.
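A self-contained stand-in for this matching step is sketched below; a production system would use an AC automaton as noted above, and the tiny knowledge graph here is illustrative:

```python
ENTITY_PREDICATE_GRAPH = {
    "history play record": "entity",
    "viewing progress": "entity",
    "query": "predicate",
    "switch": "predicate",
}

def recognize_service_intention(text):
    # Naive multi-pattern scan: find entity and predicate strings in the text.
    entity = predicate = None
    for string, attribute in ENTITY_PREDICATE_GRAPH.items():
        if string in text:
            if attribute == "entity":
                entity = string
            else:
                predicate = string
    # Combine the identified entity and predicate into a service intention.
    if entity and predicate:
        return entity + "-" + predicate
    return None

assert recognize_service_intention("query the history play record") == "history play record-query"
```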
Step S402, calling an identity recognition model corresponding to the primary identity to determine a matching result between the target voice data and at least one template user identity; the identity recognition model is a classification model generated according to the at least one template user identity and template voice data corresponding to the at least one template user identity respectively.
Specifically, the server obtains the identity recognition model corresponding to the primary identity; the identity recognition model is a classification model trained from at least one template user identity and the template voice data corresponding to each template user identity. A template user identity can be understood as a user identity the server has already created; each template user identity has a corresponding secondary identity, which is a sub-identity of the primary identity.
The identity recognition model comprises a feature generator and a pattern matcher:
the feature generator is configured to divide the target speech data into a plurality of audio frames, extract the spectral parameters and linear prediction parameters of each audio frame (the process of extracting the spectral parameters and linear prediction parameters of each audio frame may refer to step S401 described above), combine the spectral parameters of all audio frames into the spectral parameters of the target speech data, and combine the linear prediction parameters of all audio frames into the linear prediction parameters of the target speech data. The spectral parameters of the target speech data and the linear prediction parameters of the target speech data are combined into the voiceprint features of the target speech data (referred to as target voiceprint features) in a predetermined order.
The pattern matcher is used for identifying similarity (or matching probability) between a target voiceprint feature and at least one template voiceprint feature, the obtained at least one matching probability is used as a matching result, and the template voiceprint feature is the voiceprint feature of template voice data (the extraction process of the template voiceprint feature is the same as that of the target biological feature).
Since the template voice data is voice data corresponding to the identity of the template user, the similarity (or matching result) between the target biometric feature and the at least one template voiceprint feature is equal to the matching degree between the target voice data and the identity of the at least one template user.
The pattern matcher may be any model with a predictive classification function, such as a Back Propagation (BP) neural network, a convolutional neural network, or a regression model (e.g., a linear regression or logistic regression model).
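The following is a minimal sketch of one possible pattern matcher. Since this embodiment permits a BP neural network, convolutional network, or regression model here, the cosine-similarity-plus-softmax scoring below is only an illustrative stand-in that yields one matching probability per template voiceprint feature.

```python
# A minimal, illustrative pattern matcher: scores the target voiceprint
# against each template by cosine similarity and normalizes the scores
# into matching probabilities with a softmax.
import numpy as np

def matching_results(target: np.ndarray, templates: np.ndarray) -> np.ndarray:
    """templates holds one template voiceprint feature per row."""
    t = target / np.linalg.norm(target)
    m = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    similarities = m @ t                      # cosine similarity per template
    exp = np.exp(similarities - similarities.max())
    return exp / exp.sum()                    # matching probabilities

probs = matching_results(np.random.rand(25), np.random.rand(3, 25))
# e.g. array([0.34, 0.33, 0.33]): one matching result per template identity
```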
Step S403, if there is a matching result that meets the matching condition in at least one matching result, taking the template user identity corresponding to the matching result that meets the matching condition as the target user identity.
Specifically, the server obtains a preset probability threshold; a matching result greater than this threshold satisfies the matching condition.
When a matching result among the acquired results satisfies the matching condition, the template user identity corresponding to that matching result is taken as the target user identity.
Step S404, extracting a target secondary identity corresponding to the target user identity from the secondary identity set corresponding to the at least one template user identity; each secondary identity in the secondary identity set is a sub-identity of the primary identity.
Specifically, each template user identity has a corresponding secondary identity, which is a sub-identity of the primary identity; the secondary identities of all template user identities can be combined into a secondary identity set.
The server may extract, from the secondary identity set, a secondary identity of the target user identity (i.e., the template user identity corresponding to the matching result that satisfies the matching condition) as the target secondary identity.
For example, suppose the matching probability between the target voiceprint feature of the target speech data and template voiceprint feature 1 (corresponding to template user identity 1) is 0.1, the matching probability with template voiceprint feature 2 (corresponding to template user identity 2) is 0.8, and the matching probability with template voiceprint feature 3 (corresponding to template user identity 3) is 0.1. If the preset probability threshold is 0.5, the matching result between the target voiceprint feature and template voiceprint feature 2 satisfies the matching condition, so the server may take template user identity 2 as the target user identity and the secondary identity of template user identity 2 as the target secondary identity.
Optionally, when at least one obtained matching result has a matching result that meets the matching condition, the server may send an animation playing instruction to the client, so that the client plays the target animation according to the animation playing instruction, where the target animation may be a lightweight animation.
Subsequently, when execution of the service instruction corresponding to the business intention is completed, the server may send an animation playing stopping instruction to the client, so that the client stops playing the target animation according to that instruction.
Steps S403 to S404 above describe the case in which a matching result satisfying the matching condition exists among the acquired matching results; the following describes the case in which no acquired matching result satisfies the matching condition:
A matching result less than or equal to the preset probability threshold does not satisfy the matching condition.
When none of the obtained matching results satisfies the matching condition, the server may create a user identity for the current user (referred to as the target user identity), create a secondary identity for that target user identity (referred to as the target secondary identity), and set the target secondary identity as a sub-identity of the primary identity.
The server can also identify age information corresponding to the target voice data and search an image material library for an image matching the age information to serve as the identity avatar.
The server can then store the target user identity, the target secondary identity, and the identity avatar in association with one another.
Subsequently, the server may treat the target user identity as a new template user identity and add the target secondary identity to the secondary identity set.
For example, suppose the matching probability between the target voiceprint feature of the target speech data and template voiceprint feature 1 (corresponding to template user identity 1) is 0.1, the matching probability with template voiceprint feature 2 (corresponding to template user identity 2) is 0.2, and the matching probability with template voiceprint feature 3 (corresponding to template user identity 3) is 0.2. If the preset probability threshold is 0.5, none of the 3 matching results satisfies the matching condition, so the server may create a new target user identity (e.g., user identity 4), create a target secondary identity for it, and set the newly created target secondary identity as a sub-identity of the primary identity.
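The two cases of steps S403 to S404 can be summarized by the following sketch, in which the threshold value, the registry mapping, and the identifier naming scheme are all hypothetical:

```python
# A minimal sketch of the decision in steps S403-S404: reuse the best
# matching template user identity if it clears the preset probability
# threshold, otherwise create a new identity.
THRESHOLD = 0.5  # preset probability threshold (illustrative)

def resolve_identity(probs, template_identities, registry):
    """registry maps a user identity to its secondary identity."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] > THRESHOLD:
        # Matching condition satisfied: reuse the existing identity
        # and its secondary identity.
        identity = template_identities[best]
        return identity, registry[identity]
    # No match: create a new target user identity plus a new target
    # secondary identity, registered as a sub-identity of the primary one.
    identity = f"user_identity_{len(template_identities) + 1}"
    registry[identity] = f"secondary_id_{len(registry) + 1}"
    return identity, registry[identity]

registry = {"user_identity_1": "secondary_id_1",
            "user_identity_2": "secondary_id_2",
            "user_identity_3": "secondary_id_3"}
print(resolve_identity([0.1, 0.2, 0.2], list(registry), registry))
# ('user_identity_4', 'secondary_id_4'): no match, so a new identity is created
```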
Optionally, the above describes how the identity recognition model is used; the following describes how it is trained, taking as an example one round of model training with one template user identity and its corresponding template voice data:
The server acquires the template voice data of the template user identity and generates a label vector for the template voice data (called the identity label vector), which identifies the template user identity to which the template voice data belongs.
The server then acquires an initial classification model, predicts the degree of matching between the template voice data and each of the at least one template user identity based on the initial classification model, and combines the obtained matching degrees into an identity prediction vector.
The difference between the identity label vector and the identity prediction vector is determined as the classification error, and the classification error is back-propagated through the initial classification model to adjust its model parameters.
For example, suppose there are 3 template user identities (template user identity 1, template user identity 2, and template user identity 3) and template user identity 2 is currently being trained, so the identity label vector of its template voice data is [0, 1, 0]. If the initial classification model predicts that the matching degree between this template voice data and template user identity 1 is 0.4, the matching degree with template user identity 2 is 0.3, and the matching degree with template user identity 3 is 0.3, the identity prediction vector is [0.4, 0.3, 0.3]. The classification error may then be (0 - 0.4)² + (1 - 0.3)² + (0 - 0.3)² = 0.74, and this error is back-propagated through the initial classification model to adjust its model parameters.
The server can keep training the initial classification model in this way; when the number of training iterations reaches a threshold, or when the change in the model parameters between two successive adjustments is sufficiently small, the trained initial classification model can be used as the identity recognition model.
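A minimal sketch of one such training step follows, assuming PyTorch as the framework; the network architecture, the softmax output, and the feature dimensionality are illustrative assumptions, while the loss mirrors the sum-of-squared-differences classification error computed in the example above.

```python
# A minimal, illustrative training step for the initial classification
# model. The loss is the sum of squared differences between the identity
# label vector and the identity prediction vector, as in the worked
# example ((0-0.4)^2 + (1-0.3)^2 + (0-0.3)^2 = 0.74).
import torch
import torch.nn as nn

n_features, n_identities = 25, 3
model = nn.Sequential(                 # the "initial classification model"
    nn.Linear(n_features, 16), nn.ReLU(),
    nn.Linear(16, n_identities), nn.Softmax(dim=-1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(template_features, identity_label_vector):
    identity_prediction_vector = model(template_features)
    # Classification error: squared difference, label vs. prediction.
    error = ((identity_label_vector - identity_prediction_vector) ** 2).sum()
    optimizer.zero_grad()
    error.backward()                   # back-propagate the classification error
    optimizer.step()                   # adjust the model parameters
    return error.item()

# One training pass for template user identity 2 (label vector [0, 1, 0]).
loss = train_step(torch.rand(n_features), torch.tensor([0., 1., 0.]))
```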
As described above, the server may create a new target user identity and target secondary identity, and the newly created target user identity will serve as a new template user identity. In this case the server needs to retrain the identity recognition model: the new model adds a category output to the original identity recognition model, and this new category output gives the probability that voice data belongs to the new target user identity.
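Under the same PyTorch assumptions as the previous sketch, adding a category output for a newly created target user identity might look like the following, which widens the final linear layer by one output while preserving the previously learned weights; the layer sizes are hypothetical.

```python
# A minimal sketch of adding one category output for a new identity:
# the final linear layer is replaced by a wider one, and the weights
# learned for the existing identities are copied over before retraining.
import torch
import torch.nn as nn

def add_identity_output(old_head: nn.Linear) -> nn.Linear:
    new_head = nn.Linear(old_head.in_features, old_head.out_features + 1)
    with torch.no_grad():
        new_head.weight[:-1] = old_head.weight  # keep weights for old identities
        new_head.bias[:-1] = old_head.bias
    return new_head

old_head = nn.Linear(16, 3)       # e.g. the final layer before a 4th identity
new_head = add_identity_output(old_head)
print(new_head.out_features)      # 4: one extra category output
```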
In this embodiment, by identifying the user identity and business intention behind the current biological information, the target secondary identity corresponding to that user identity can be determined, so that a service instruction executed based on the target secondary identity both satisfies the user's current business intention and matches the user's identity. Furthermore, only the user's target biological information needs to be acquired in order to determine the business intention and the user identity at the same time; the user does not have to perform two separate operations for these two determinations, which reduces the user's operation cost and improves the efficiency with which the terminal executes service instructions that match both the business intention and the user identity.
Please refer to fig. 7, which is a schematic flow chart of another data processing method provided in an embodiment of the present application; the method may include the following steps:
in step S501, the flow starts.
Step S502, the server acquires voice data.
Specifically, when the primary account (which may correspond to the primary identity in this application) is logged in to the client, i.e., the primary account is in an active state, the server receives voice data sent by the client; this voice data is collected by the client when the user says "enter sub-account" to the client.
In step S503, the server recognizes that the business intention of the voice data is to access a secondary account, and extracts the target voiceprint feature of the voice data based on the identity recognition model; for the specific extraction process, refer to step S402 in the embodiment corresponding to fig. 6.
Step S504, the server performs pattern matching between the extracted target voiceprint feature and the existing template voiceprint features.
Step S505, the server determines, from the pattern matching result, whether any existing template voiceprint feature matches the target voiceprint feature; if so, steps S507 to S508 are executed; if not, steps S506 and S508 are executed.
Step S506, the server creates a new secondary account (which may correspond to the target secondary identity in this application) according to the intention of accessing a secondary account, establishes an association between the secondary account and the extracted target voiceprint feature, and logs the newly created secondary account in to the client.
Step S507, the server searches for the secondary account (which may correspond to the target secondary identity in this application) corresponding to the matched template voiceprint feature; this secondary account already exists, and the server returns the service data under it to the client.
Step S508 ends the flow.
When the client in the above steps is a video client installed in a smart television, each family member sharing the smart television corresponds uniquely, through his or her voiceprint feature, to one user identity and one secondary account (which may correspond to a secondary identity in this application). The server can determine each family member's viewing history, followed films, and voiceprint feature from that unique secondary account, thereby achieving personalized recommendation.
The following scenario takes as an example a user A who has already created a secondary account of the client and a user B who has not; the client collects the voice data "enter sub-account" input by the user.
Please further refer to fig. 8, which is a timing chart of another data processing method according to an embodiment of the present application, where the data processing method includes the following steps:
step S601, the client collects the voice data of the user A.
Specifically, when the current primary account is logged in to the client in the terminal device, i.e., the primary account is in an active state, user A says "log in to sub-account" to the client, and the client collects the corresponding voice data.
Step S602, the client sends the voice data to the server.
Step S603, the server determines the business intention through semantic recognition, matches a corresponding secondary account through voiceprint recognition, logs the secondary account in to the client, finds the behavior data under the secondary account (e.g., historical viewing records, followed videos, commented videos, and search records), and generates recommendation data from that behavior data.
In step S604, the server returns the recommended data to the client.
Step S605, the client acquires the voice data of the user B: "login to sub-account".
Step S606, the client uploads the voice data of the user B to the server.
Step S607, the server determines the business intention through semantic recognition, finds through voiceprint recognition that no secondary account matches, creates a secondary account, sets the created secondary account as a sub-account of the primary account, and logs in to the client with the created secondary account.
In step S608, user B generates viewing behavior data (videos watched, videos followed, videos commented on, search records, etc.) in the client based on the newly created secondary account.
In step S609, the client uploads the viewing behavior data of the user B to the server.
Step S610, the server stores the viewing behavior data in association with the newly created secondary account; this data is used subsequently to generate personalized recommendation data for user B.
Further, please refer to fig. 9, which is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 9, the data processing apparatus 1 may be applied to the server in the corresponding embodiment of fig. 3 to 8, and the data processing apparatus 1 may include: a first obtaining module 11, a recognition module 12, a second obtaining module 13 and a determination module 14.
The first obtaining module 11 is configured to obtain target biological information when the primary identity is in an active state;
the identification module 12 is used for identifying the business intention and the target user identity corresponding to the target biological information;
a second obtaining module 13, configured to obtain a target secondary identity identifier corresponding to the target user identity; the target secondary identity is a sub-identity of the primary identity;
and the determining module 14 is configured to execute a service instruction corresponding to the service intention based on the target secondary identity.
For specific functional implementation manners of the first obtaining module 11, the identifying module 12, the second obtaining module 13, and the determining module 14, reference may be made to steps S101 to S104 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 9, the target bio-information includes target voice data;
the identification module 12 may include: a conversion unit 121, a calling unit 122, and a first determination unit 123.
A conversion unit 121, configured to convert the target voice data into text data, and semantically identify the text data to obtain the service intention;
the calling unit 122 is configured to call an identity recognition model corresponding to the primary identity to determine a matching result between the target voice data and at least one template user identity; the identity recognition model is a classification model generated according to the at least one template user identity and template voice data respectively corresponding to the at least one template user identity;
a first determining unit 123, configured to, if a matching result meeting a matching condition exists in at least one matching result, take a template user identity corresponding to the matching result meeting the matching condition as the target user identity;
the second obtaining module 13 may include: a first extraction unit 131.
The first extracting unit 131 is configured to extract a target secondary identity corresponding to the target user identity from the secondary identity set corresponding to the at least one template user identity; each secondary identity in the secondary identity set is a sub-identity of the primary identity.
The identification module 12 may further include: a second determination unit 124.
A second determining unit 124, configured to create the target user identity if there is no matching result that meets the matching condition in the at least one matching result, identify age information corresponding to the target voice data, and search an identity avatar matching the age information in an image material library;
the second obtaining module 13 may include: a second extraction unit 132.
The second extracting unit 132 is configured to create the target secondary identity for the target user identity, set the target secondary identity as a sub-identity of the primary identity, and store the target user identity, the target secondary identity, and the identity avatar in association.
For specific processes of the converting unit 121, the calling unit 122, the first determining unit 123, the second determining unit 124, the first extracting unit 131, and the second extracting unit 132, reference may be made to steps S401 to S404 in the embodiment corresponding to fig. 6, which is not described herein again.
When the first determining unit 123 and the first extracting unit 131 determine the target user identity and the target secondary identity, the second determining unit 124 and the second extracting unit 132 do not perform the corresponding steps; when the second determining unit 124 and the second extracting unit 132 determine the target user identity and the target secondary identity, the first determining unit 123 and the first extracting unit 131 do not perform the corresponding steps.
Referring to fig. 9, the identity recognition model includes a feature generator and a pattern matcher;
the calling unit 122 may include: an extraction subunit 1221 and a matching subunit 1222.
An extracting sub-unit 1221 configured to extract a target voiceprint feature of the target speech data based on the feature generator;
a matching subunit 1222, configured to determine, based on the pattern matcher, matching probabilities between the target voiceprint feature and at least one template voiceprint feature, where the obtained matching probabilities are all used as matching results; the at least one template voiceprint feature is a voiceprint feature corresponding to the at least one template voice data respectively;
an extracting subunit 1221, configured to specifically extract, based on the feature generator, a spectrum parameter and a linear prediction parameter of the target voice data, and obtain the target voiceprint feature according to the spectrum parameter and the linear prediction parameter; the frequency spectrum parameter is a short-time spectrum characteristic parameter of the target voice data; the linear prediction parameters are spectrum fitting characteristic parameters of the target speech data.
The specific processes of the extracting subunit 1221 and the matching subunit 1222 may refer to step S402 in the embodiment corresponding to fig. 6, which is not described herein again.
Referring to fig. 9, the service intention includes a client-side secondary login object switching intention;
the determining module 14 includes: a first generating unit 141.
A first generating unit 141, configured to generate a switching instruction corresponding to the switching intention of the client secondary login object, and use the target secondary identity as a secondary login object of the client according to the switching instruction; the switching instruction belongs to the service instruction.
The specific process of the first generating unit 141 may refer to step S104 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 9, the business intent includes a business data query intent;
the determining module 14 may include: a second generating unit 142.
A second generating unit 142, configured to generate a query instruction corresponding to the service data query intention, query target service data corresponding to the target secondary identity, and return the target service data to a client; the query instruction belongs to the service instruction.
The specific process of the second generating unit 142 may refer to step S104 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 9, in addition to the first obtaining module 11, the identifying module 12, the second obtaining module 13, and the determining module 14, the data processing apparatus 1 may further include: a storage module 15, a training module 16, and a playing module 17.
The storage module 15 is configured to acquire behavior data of the user in the client corresponding to the target secondary identity, and perform associated storage on the behavior data and the target secondary identity; the behavior data is used for generating recommended service data for the user.
The training module 16 is configured to obtain template voice data corresponding to a template user identity, generate the identity label vector corresponding to the template voice data, obtain an initial classification model, predict the matching degree between the template voice data and the at least one template user identity based on the initial classification model, obtain the identity prediction vector from the obtained matching degrees, determine the classification error from the identity label vector and the identity prediction vector, and train the initial classification model according to the classification error to obtain the identity recognition model.
The playing module 17 is configured to send an animation playing instruction to a client to instruct the client to play a target animation when a matching result meeting the matching condition exists in the at least one matching result;
the playing module 17 is further configured to send an animation playing stopping instruction to the client when the execution of the service instruction is completed, and instruct the client to close the target animation.
The specific processes of the storage module 15, the training module 16, and the playing module 17 may refer to step S404 in the embodiment corresponding to fig. 6, which is not described herein again.
Further, please refer to fig. 10, which is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The server in the embodiments corresponding to fig. 3 to fig. 8 may be an electronic device 1000. As shown in fig. 10, the electronic device 1000 may include: a user interface 1002, a processor 1004, an encoder 1006, and a memory 1008. A signal receiver 1016 is used to receive or transmit data via a cellular interface 1010 or a WIFI interface 1012. The encoder 1006 encodes the received data into a computer-processed data format. The memory 1008 stores a computer program, through which the processor 1004 is arranged to perform the steps of any of the method embodiments described above. The memory 1008 may include volatile memory (e.g., dynamic random access memory, DRAM) and may also include non-volatile memory (e.g., one-time programmable read-only memory, OTPROM). In some examples, the memory 1008 may further include memory located remotely from the processor 1004, which may be connected to the electronic device 1000 via a network. The user interface 1002 may include: a keyboard 1018 and a display 1020.
In the electronic device 1000 shown in fig. 10, the processor 1004 may be configured to call the memory 1008 to store a computer program to implement:
when the primary identity mark is in an effective state, acquiring target biological information;
identifying a business intention and a target user identity corresponding to the target biological information;
acquiring a target secondary identity corresponding to the target user identity; the target secondary identity is a sub-identity of the primary identity;
and executing a service instruction corresponding to the service intention based on the target secondary identity.
It should be understood that the electronic device 1000 described in the embodiment of the present invention may perform the description of the data processing method in the embodiment corresponding to fig. 3 to fig. 8, and may also perform the description of the data processing apparatus 1 in the embodiment corresponding to fig. 9, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Further, here, it is to be noted that: an embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores the aforementioned computer program executed by the data processing apparatus 1, and the computer program includes program instructions, and when the processor executes the program instructions, the description of the data processing method in the embodiment corresponding to fig. 3 to 8 can be performed, so that details are not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer storage medium to which the present invention relates, reference is made to the description of the method embodiments of the present invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is only of preferred embodiments of the present invention and certainly cannot be taken to limit the scope of the invention; equivalent variations made according to the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (14)

1. A data processing method, comprising:
when the primary identity mark is in an effective state, acquiring target biological information;
identifying a business intention and a target user identity corresponding to the target biological information;
acquiring a target secondary identity corresponding to the target user identity; the target secondary identity is a sub-identity of the primary identity;
and executing a service instruction corresponding to the service intention based on the target secondary identity.
2. The method of claim 1, wherein the target biometric information comprises target speech data;
the identifying the business intention and the target user identity corresponding to the target biological information comprises:
converting the target voice data into text data, and performing semantic recognition on the text data to obtain the service intention;
calling an identity recognition model corresponding to the primary identity to determine a matching result between the target voice data and at least one template user identity; the identity recognition model is a classification model generated according to the at least one template user identity and template voice data respectively corresponding to the at least one template user identity;
if a matching result meeting a matching condition exists in at least one matching result, taking the template user identity corresponding to the matching result meeting the matching condition as the target user identity;
the obtaining of the target secondary identity corresponding to the target user identity includes:
extracting a target secondary identity corresponding to the target user identity from a secondary identity set corresponding to the at least one template user identity; and each secondary identity in the secondary identity set is a sub-identity of the primary identity.
3. The method of claim 2, further comprising:
if the matching result meeting the matching condition does not exist in the at least one matching result, establishing the target user identity;
identifying age information corresponding to the target voice data, and searching an image material library for an identity avatar matching the age information;
the obtaining of the target secondary identity corresponding to the target user identity includes:
creating the target secondary identity for the target user identity;
setting the target secondary identity as a sub-identity of the primary identity;
and storing the target user identity, the target secondary identity, and the identity avatar in association.
4. The method of claim 2, wherein the identity recognition model comprises a feature generator and a pattern matcher;
the calling of the identity recognition model corresponding to the primary identity to determine the matching result between the target voice data and at least one template user identity comprises:
extracting a target voiceprint feature of the target voice data based on the feature generator;
determining the matching probability between the target voiceprint feature and at least one template voiceprint feature based on the pattern matcher, and taking the obtained matching probabilities as matching results; the at least one template voiceprint feature is a voiceprint feature corresponding to the at least one template voice data respectively.
5. The method of claim 4, wherein extracting the target voiceprint features of the target speech data based on the feature generator comprises:
extracting a spectrum parameter and a linear prediction parameter of the target voice data based on the feature generator; the frequency spectrum parameter is a short-time spectrum characteristic parameter of the target voice data; the linear prediction parameters are frequency spectrum fitting characteristic parameters of the target voice data;
and obtaining the target voiceprint characteristics according to the frequency spectrum parameters and the linear prediction parameters.
6. The method of claim 2, further comprising:
acquiring template voice data corresponding to the identity of a template user;
generating an identity label vector corresponding to the template voice data;
acquiring an initial classification model, predicting the matching degree between the template voice data and the identity of the at least one template user based on the initial classification model, and acquiring an identity prediction vector according to the acquired matching degree;
and determining a classification error according to the identity label vector and the identity prediction vector, and training the initial classification model according to the classification error to obtain the identity recognition model.
7. The method of claim 2, further comprising:
when the matching result meeting the matching condition exists in the at least one matching result, sending an animation playing instruction to a client to instruct the client to play a target animation;
and when the execution of the service instruction is finished, sending an animation playing stopping instruction to the client, and indicating the client to close the target animation.
8. The method of claim 1, wherein the business intent comprises a client-side secondary login object switching intent;
the executing the business instruction corresponding to the business intention based on the target secondary identity comprises the following steps:
generating a switching instruction corresponding to the switching intention of the secondary login object of the client; the switching instruction belongs to the service instruction;
and according to the switching instruction, taking the target secondary identity as a secondary login object of the client.
9. The method of claim 8, further comprising:
acquiring behavior data of a user in the client corresponding to the target secondary identity; the behavior data is used for generating recommended service data for the user;
and performing associated storage on the behavior data and the target secondary identity.
10. The method of claim 1, wherein the business intent comprises a business data query intent;
the executing the business instruction corresponding to the business intention based on the target secondary identity comprises the following steps:
generating a query instruction corresponding to the business data query intention; the query instruction belongs to the service instruction;
and inquiring target service data corresponding to the target secondary identity, and returning the target service data to the client.
11. The method of claim 1, wherein the target secondary identity has the same user rights as the primary identity.
12. A data processing apparatus, comprising:
the first acquisition module is used for acquiring the target biological information when the primary identity is in an effective state;
the identification module is used for identifying the business intention and the target user identity corresponding to the target biological information;
the second acquisition module is used for acquiring a target secondary identity corresponding to the target user identity; the target secondary identity is a sub-identity of the primary identity;
and the determining module is used for executing the service instruction corresponding to the service intention based on the target secondary identity.
13. An electronic device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any of claims 1-11.
14. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method according to any one of claims 1-11.
CN201911206373.3A 2019-11-29 2019-11-29 Data processing method and device, electronic equipment and storage medium Pending CN112883350A (en)

Priority Applications (1)

Application Number: CN201911206373.3A; Priority Date: 2019-11-29; Filing Date: 2019-11-29; Title: Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number: CN112883350A; Publication Date: 2021-06-01

Family ID: 76039056

Country Status: CN (1) CN112883350A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
REG: Reference to a national code (country code: HK; legal event code: DE; document number: 40047328)