WO2022095380A1 - AI-based virtual interaction model generation method and apparatus, computer device, and storage medium

AI-based virtual interaction model generation method and apparatus, computer device, and storage medium

Info

Publication number: WO2022095380A1
Authority: WIPO (PCT)
Prior art keywords: text, target, information, visited, person
Application number: PCT/CN2021/091300
Other languages: English (en), French (fr)
Inventors: 满园园, 陈闽, 章淑婷, 刘喜声, 宋思宇, 高毅, 王文杰, 蔡静
Original Assignee: 平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2022095380A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems

Definitions

  • The present application relates to the technical field of artificial intelligence speech and semantics, and in particular to an AI-based virtual interaction model generation method and apparatus, a computer device, and a storage medium.
  • Online video conferencing has become an increasingly widely used way for users to communicate. For example, when the parties are far apart (e.g., located in different cities) or a face-to-face meeting is inconvenient, an online video conference can be chosen for long-distance communication. Moreover, while the COVID-19 pandemic is not yet over, offline face-to-face communication carries certain safety risks, so the demand for online video communication is growing by the day.
  • Before the video conference initiator invites the video conference recipient to a video conference, the initiator needs to learn the recipient's information acquisition requirements, manually collect and organize the conference communication materials accordingly, become familiar with those materials, and only then hold the video conference with the recipient. The inventor realized that obtaining conference communication materials in this way is not only inefficient; the materials are also difficult to associate accurately with the particular video conference, and the initiator can only rely on inefficient human memory when familiarizing himself or herself with them.
  • The embodiments of the present application provide an AI-based virtual interaction model generation method and apparatus, a computer device, and a storage medium, aiming to solve the prior-art problem that the video conference initiator manually organizes conference communication materials according to the recipient's information acquisition requirements, which is not only inefficient but also yields materials that are inaccurate and hard to associate with the particular video conference.
  • an embodiment of the present application provides an AI-based virtual interaction model generation method, which includes:
  • the requester information includes the first proficiency parameter, the second proficiency parameter, and the target user portrait;
  • the to-be-visited person information includes the to-be-visited person's user portrait and the to-be-visited person's product demand information;
  • the information recommendation strategy is used to extract several key tags from the requester's user portrait and combine them with the to-be-visited person's product demand information to generate the requester recommendation information, and to extract several key tags from the to-be-visited person's user portrait and combine them with the to-be-visited person's product demand information to generate the to-be-visited person recommendation information.
  • an embodiment of the present application further provides an AI-based virtual interaction model generation apparatus, which includes:
  • a user portrait acquisition unit, configured to, if a virtual interactive object generation instruction sent by the client is detected, obtain the locally stored target user portrait corresponding to the client and randomly obtain a locally stored customer user portrait;
  • a first classification unit, configured to call the first classification model to obtain the first classification result corresponding to the target user portrait and the customer user portrait, obtain the corresponding target explanation text from the locally stored explanation text library according to the first classification result, and send the target explanation text to the client;
  • a first parameter acquisition unit, configured to receive the explanation-practice speech data sent by the client, and perform a similarity calculation between the speech recognition text corresponding to that speech data and the target explanation text, to obtain the first proficiency parameter corresponding to the client;
  • a second classification unit, configured to perform feature extraction on the first proficiency parameter, the target user portrait, and the randomly obtained customer user portrait to obtain a feature set, and classify the feature set with the called second classification model to obtain a second classification result;
  • a virtual interaction model acquisition unit, configured to acquire the target AI virtual interaction model corresponding to the second classification result from the locally stored AI virtual interaction model library;
  • an interactive voice acquisition unit, configured to receive the interactive voice data exchanged between the client and the target AI virtual interaction model;
  • a second parameter obtaining unit configured to perform similarity calculation between the interactive voice data and the target standard voice data corresponding to the target AI virtual interaction model, to obtain a second proficiency parameter corresponding to the client;
  • a visit information acquisition unit, configured to, if a to-be-visited person information acquisition instruction uploaded by the client is detected, acquire the requester information and the to-be-visited person information corresponding to the instruction; wherein the requester information includes the first proficiency parameter, the second proficiency parameter, and the target user portrait, and the to-be-visited person information includes the to-be-visited person's user portrait and the to-be-visited person's product demand information; and
  • a recommendation information generation unit, configured to call the pre-stored information recommendation strategy and generate the requester recommendation information and the to-be-visited person recommendation information according to the requester information, the to-be-visited person information, the to-be-visited person's product demand information, and the information recommendation strategy; wherein the information recommendation strategy is used to extract several key tags from the requester's user portrait and combine them with the to-be-visited person's product demand information to generate the requester recommendation information, and to extract several key tags from the to-be-visited person's user portrait and combine them with the to-be-visited person's product demand information to generate the to-be-visited person recommendation information.
  • an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the following steps:
  • the requester information includes the first proficiency parameter, the second proficiency parameter, and the target user portrait;
  • the to-be-visited person information includes the to-be-visited person's user portrait and the to-be-visited person's product demand information;
  • the information recommendation strategy is used to extract several key tags from the requester's user portrait and combine them with the to-be-visited person's product demand information to generate the requester recommendation information, and to extract several key tags from the to-be-visited person's user portrait and combine them with the to-be-visited person's product demand information to generate the to-be-visited person recommendation information.
  • embodiments of the present application further provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the following operations:
  • the requester information includes the first proficiency parameter, the second proficiency parameter, and the target user portrait;
  • the to-be-visited person information includes the to-be-visited person's user portrait and the to-be-visited person's product demand information;
  • the information recommendation strategy is used to extract several key tags from the requester's user portrait and combine them with the to-be-visited person's product demand information to generate the requester recommendation information, and to extract several key tags from the to-be-visited person's user portrait and combine them with the to-be-visited person's product demand information to generate the to-be-visited person recommendation information.
  • The embodiments of the present application provide an AI-based virtual interaction model generation method and apparatus, a computer device, and a storage medium. The first classification model is called to obtain the first classification result corresponding to the target user portrait and the customer user portrait, the corresponding target explanation text is obtained from the locally stored explanation text library according to the first classification result and sent to the client, and the target AI virtual interaction model corresponding to the second classification result is then obtained from the locally stored AI virtual interaction model library. Finally, the pre-stored information recommendation strategy is invoked, and the requester recommendation information and the to-be-visited person recommendation information are generated according to the requester information, the to-be-visited person information, the to-be-visited person's product demand information, and the information recommendation strategy. Learning and practice are thus based on the target explanation text that the server recommends according to the user portraits; there is no need to organize the target explanation text manually, which improves the efficiency of acquiring it.
  • FIG. 1 is a schematic diagram of an application scenario of an AI-based virtual interaction model generation method provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for generating an AI-based virtual interaction model provided by an embodiment of the present application
  • FIG. 3 is a schematic block diagram of an apparatus for generating an AI-based virtual interaction model provided by an embodiment of the present application
  • FIG. 4 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic diagram of an application scenario of the AI-based virtual interaction model generation method provided by an embodiment of the present application, and FIG. 2 is a schematic flowchart of that method. The AI-based virtual interaction model generation method is applied in a server, and the method is executed by application software installed in the server.
  • the method includes steps S101-S109.
  • One is the user terminal, whose user is the visitor, who can communicate online with the to-be-visited person (also referred to as the visited person) by establishing an online video connection.
  • The second is another user terminal, whose user is the to-be-visited person, who can communicate online with the visitor by establishing an online video connection.
  • When the client and the other client communicate by online video, both are connected to the server for communication.
  • the third is the server.
  • In the server, the target explanation text can be generated for the visitor to study the materials before the video conference, the target AI virtual interaction model can be generated for the visitor to conduct simulated video-conference interaction exercises, and the to-be-visited person recommendation information pushed to the visitor for viewing before the online video communication between the two can also be generated.
  • Before the online video communication with the to-be-visited person, the server can combine the customer user portrait corresponding to the to-be-visited person and the target user portrait corresponding to the visitor, and push a personalized explanation plan (that is, the target explanation text referred to below) to the visitor according to the classification model.
  • The visitor can then study the personalized explanation plan, and the server records the visitor's degree of mastery of the plan during this process.
  • S102: Invoke the first classification model to obtain a first classification result corresponding to the target user portrait and the customer user portrait, obtain the corresponding target explanation text from the locally stored explanation text library according to the first classification result, and send the target explanation text to the client.
  • In specific implementation, the tag set composed of the tags included in the target user portrait and the customer user portrait can be used as the input of the first classification model, and the corresponding first classification result is obtained by calculation. Since explanation texts corresponding to the various classification results are pre-stored locally on the server, once the first classification result is obtained, the target explanation text corresponding to it can be retrieved and sent to the client, so that the visitor can view the target explanation text and then carry out explanation exercises.
  • step S102 includes:
  • wherein the first classification model is a convolutional neural network model;
  • a preset keyword screening strategy can be invoked to select, from the above-mentioned keywords, the core keywords that form the tag keyword set.
  • Since the first classification model is a convolutional neural network model, it can classify accurately.
  • After obtaining the first classification result, the server retrieves the target explanation text corresponding to it from the locally stored explanation text library and sends it to the client.
  • In this way, the visitor can view the target explanation text directly on the display of the user terminal, realizing the process of previewing the materials before the visit.
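  • As a non-limiting illustration of how such a first classification model might look, the sketch below implements a small text-CNN that maps the tag keyword set to a classification result indexing the explanation text library. The framework (PyTorch), vocabulary size, dimensions, class count, and the tag-to-id mapping are all assumptions for this sketch, not details disclosed by the application:

```python
# Illustrative text-CNN standing in for the "first classification model":
# the tag keyword set drawn from the two user portraits is mapped to a
# classification result that selects a target explanation text.
import torch
import torch.nn as nn

class TagCNN(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolutions over the tag sequence with two window sizes.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, 32, kernel_size=k) for k in (2, 3)]
        )
        self.fc = nn.Linear(32 * 2, num_classes)

    def forward(self, token_ids):                   # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)   # (batch, embed_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))    # (batch, num_classes)

# Tags from the target user portrait and the customer user portrait are
# combined into one tag keyword set, mapped to ids, and classified.
tag_to_id = {"life_insurance": 1, "middle_aged": 2, "salary_20k_30k": 3}
tags = ["life_insurance", "middle_aged", "salary_20k_30k"]
ids = torch.tensor([[tag_to_id[t] for t in tags]])
first_classification_result = TagCNN()(ids).argmax(dim=1).item()
print(first_classification_result)  # index of the target explanation text
```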
  • S103: Receive the explanation-practice speech data sent by the client, and perform a similarity calculation between the speech recognition text corresponding to that speech data and the target explanation text, to obtain the first proficiency parameter corresponding to the client.
  • In specific implementation, the practice mode can be turned on at the user terminal; that is, after the audio recording function on the client is enabled, the visitor reads aloud from the target explanation text, so that the sound data collected by the client is the explanation-practice speech data corresponding to the target explanation text.
  • When a complete exercise is finished, the client sends the explanation-practice speech data to the server to evaluate the proficiency of the exercise.
  • Specifically, a similarity calculation can be performed between the speech recognition text corresponding to the explanation-practice speech data and the target explanation text, to obtain the first proficiency parameter corresponding to the client.
  • step S103 includes:
  • the speech recognition text is divided into sections accordingly to obtain a voice text section set, and the target text section set of the target explanation text is obtained; wherein the voice text section set includes a plurality of sub-speech texts, the target text section set includes a plurality of target sub-speech texts, and the total number of sub-speech texts in the voice text section set is the same as the total number of target sub-speech texts in the target text section set;
  • the Euclidean distance between the text semantic vector and the target text semantic vector is calculated as the first proficiency parameter.
  • In specific implementation, a speech recognition model (such as an N-gram model, i.e., a multivariate grammar model) is called to perform speech recognition on the explanation-practice speech data and obtain the speech recognition text.
  • calculating the similarity between the acquired speech recognition text and the target explanation text is calculating the similarity between the texts.
  • the text can be segmented to calculate the comprehensive text similarity.
  • The separator between paragraphs is used as a section break, and the target explanation text can be divided into multiple paragraphs according to the section breaks (that is, multiple target sub-speech texts, which together make up the target text section set).
  • Since the visitor reads from the target explanation text, the speech recognition text corresponding to the recorded explanation-practice speech data should be roughly the same as the target explanation text, so when the speech recognition text is segmented, reference may also be made to the section breaks of the target explanation text, segmenting the speech recognition text accordingly to obtain the voice text section set (which includes a plurality of sub-speech texts).
  • Afterwards, each sub-speech text is sequentially subjected to word segmentation (by a statistical word segmentation method), keyword extraction (by a TF-IDF model), word vector conversion (by a word2vec model), and semantic vector acquisition, yielding the sub-text semantic vector corresponding to each sub-speech text; these sub-text semantic vectors are concatenated to form the text semantic vector.
  • Specifically, the weight value corresponding to each word vector is obtained first, and the weighted sum of the word vectors is then calculated to obtain the semantic vector.
  • Once the text semantic vector and the target text semantic vector are obtained, the Euclidean distance between the two can be calculated as the first proficiency parameter; the smaller the Euclidean distance, the closer the two texts are.
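  • The following minimal sketch illustrates this similarity computation. The tiny embedding table (standing in for word2vec output) and the per-word weights (standing in for TF-IDF output) are illustrative assumptions:

```python
# Sketch of the first-proficiency computation: per-section keywords are
# embedded, weighted and summed into semantic vectors; the per-section
# vectors are concatenated; the Euclidean distance between the recognized
# text and the target text serves as the first proficiency parameter.
import numpy as np

EMBED = {  # hypothetical word2vec lookup, 4-dimensional for brevity
    "insurance": np.array([0.9, 0.1, 0.0, 0.3]),
    "product":   np.array([0.7, 0.2, 0.1, 0.4]),
    "fee":       np.array([0.2, 0.8, 0.1, 0.0]),
    "period":    np.array([0.1, 0.7, 0.3, 0.1]),
    "term":      np.array([0.2, 0.6, 0.4, 0.1]),
}

def semantic_vector(keywords, weights):
    """Weighted sum of word vectors: the semantic vector of one section."""
    return np.stack([EMBED[w] * weights[w] for w in keywords]).sum(axis=0)

def text_semantic_vector(sections, weights):
    """Concatenate per-section semantic vectors into the text semantic vector."""
    return np.concatenate([semantic_vector(s, weights) for s in sections])

weights = {"insurance": 0.5, "product": 0.3, "fee": 0.6,
           "period": 0.4, "term": 0.5}                    # stand-in TF-IDF weights
recognized_sections = [["insurance", "product"], ["fee", "term"]]    # from ASR
target_sections     = [["insurance", "product"], ["fee", "period"]]  # target text

v_recognized = text_semantic_vector(recognized_sections, weights)
v_target = text_semantic_vector(target_sections, weights)
first_proficiency = float(np.linalg.norm(v_recognized - v_target))
print(first_proficiency)  # smaller distance -> practice closer to the target
```

  • Because the per-section vectors are concatenated in order, section alignment is preserved, and a mismatch in any single section increases the overall distance.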
  • S104: Perform feature extraction on the first proficiency parameter, the target user portrait, and the randomly obtained customer user portrait to obtain a feature set, and classify the feature set with the called second classification model to obtain a second classification result.
  • In specific implementation, the server can perform feature extraction on the first proficiency parameter, the target user portrait, and the randomly obtained customer user portrait to obtain a feature set, and then classify the feature set with the pre-trained second classification model, thereby obtaining the second classification result.
  • the second classification model is also a convolutional neural network model.
  • In this way, the extracted feature set can better reflect the visitor's proficiency in the target explanation text and the explanation fields the visitor is good at.
  • step S104 includes:
  • the first proficiency parameter, the first keyword vector and the second keyword vector are sequentially concatenated to form a feature set.
  • In specific implementation, to obtain the feature set used as the input of the second classification model, the pre-stored first user portrait selection strategy can first be called to obtain the first keyword set corresponding to the target user portrait, and the pre-stored second user portrait selection strategy can then be called to obtain the second keyword set corresponding to the customer user portrait.
  • After the two keyword sets are obtained, each keyword is converted into the corresponding word vector, and the first proficiency parameter, the first keyword vector, and the second keyword vector are concatenated in sequence to form the feature set.
  • feature sets with more reference dimensions can be extracted.
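  • A minimal sketch of this concatenation step, with illustrative (assumed) dimensions and values:

```python
# Sketch of building the S104 feature set: the scalar first proficiency
# parameter is concatenated with the keyword vectors derived from the two
# user portraits. All numbers below are illustrative assumptions.
import numpy as np

first_proficiency = np.array([0.42])           # from the S103 similarity step
first_kw_vector = np.array([0.9, 0.1, 0.3])    # target user portrait keywords
second_kw_vector = np.array([0.2, 0.8, 0.5])   # customer user portrait keywords

feature_set = np.concatenate([first_proficiency, first_kw_vector, second_kw_vector])
print(feature_set.shape)  # (7,) -> input to the second classification model
```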
  • The AI virtual interaction model library local to the server stores multiple AI virtual interaction models, each corresponding to a classification result. For example, if the second classification result equals 0.5, the first AI virtual interaction model is obtained, which is suitable for simulated AI dialogue exercises by visitors with low proficiency in the target explanation text; if it equals 0.7, the second AI virtual interaction model is obtained, suitable for visitors with moderate proficiency in the target explanation text; and if it equals 0.9, the third AI virtual interaction model is obtained, suitable for visitors with high proficiency in the target explanation text.
  • The target AI virtual interaction model can essentially be understood as an intelligent customer service agent that can interact with users by voice, achieving the effect of a simulated dialogue for practicing the explanation.
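  • A sketch of the model selection implied by the example above; the cut-off values follow the 0.5 / 0.7 / 0.9 example, while the function and the library layout are assumptions of this sketch:

```python
# Sketch of selecting the target AI virtual interaction model from the
# locally stored model library by the second classification result.
def select_model(second_classification_result, model_library):
    if second_classification_result <= 0.5:
        return model_library["low_proficiency"]     # gentler simulated dialogue
    if second_classification_result <= 0.7:
        return model_library["medium_proficiency"]
    return model_library["high_proficiency"]        # more demanding dialogue

library = {"low_proficiency": "model_1",
           "medium_proficiency": "model_2",
           "high_proficiency": "model_3"}
print(select_model(0.7, library))  # -> model_2
```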
  • When the user terminal performs voice interaction with the target AI virtual interaction model, the user terminal can collect the interactive voice data, on which a further assessment of practice proficiency is performed.
  • After the visitor has completed the virtual simulation interaction exercise with the target AI virtual interaction model in the server, the vector similarity between the semantic vector converted from the interactive voice data and the semantic vector converted from the target standard voice data is calculated and used as the second proficiency parameter.
  • step S107 includes:
  • the Euclidean distance between the interactive speech-text semantic vector and the target standard speech-text semantic vector is calculated as the second proficiency parameter.
  • In specific implementation, a speech recognition model (for example, an N-gram model, i.e., a multivariate grammar model) is called; calculating the similarity between the speech texts amounts to calculating the similarity between texts.
  • Through steps S101-S103, the initial learning process of the visitor's explanation text is realized, and through steps S104-S107, the visitor's simulated interactive practice process is realized.
  • The above process is based on learning and practicing the target explanation text that the server recommends according to the user portraits, without manually organizing the target explanation text, which improves the efficiency of acquiring it.
  • After the visitor has fully mastered the target explanation text, an online video conference connection can be established with the to-be-visited person, and communication then proceeds based on the text automatically recommended by the server.
  • When the server detects the to-be-visited person information acquisition instruction uploaded by the client, it means that the client has not yet established the online-conference video connection with the other client. So that the users of the two clients can communicate efficiently in the upcoming online video conference, the client can first send the to-be-visited person information acquisition instruction to the server, and upon detecting this instruction, the server acquires the requester information and the to-be-visited person information corresponding to it.
  • The requester information includes the first proficiency parameter, the second proficiency parameter, and the target user portrait. Because a large amount of historical data about the client user and the other client user is stored in the server, the user portrait corresponding to the client user (i.e., the requester user portrait above) and the user portrait corresponding to the other client user (i.e., the to-be-visited person's user portrait above) can be derived from these historical data.
  • The to-be-visited person's product demand information is recorded in the conversation records from when the other client user communicated with the client user by telephone or through communication software (such as WeChat or QQ).
  • the product demand information can be understood as the product purchase intention of the person to be visited.
  • the business person can recommend some products to the consumer through an online meeting.
  • The requester's user portrait generally carries tags indicating which types of product sales the salesperson is proficient in, and the to-be-visited person's user portrait generally carries the consumer's user tags (such as age group, occupation group, and income range), so the product demand information corresponding to the to-be-visited person can be obtained according to the to-be-visited person information acquisition instruction.
  • S109: Invoke the pre-stored information recommendation strategy, and generate the requester recommendation information and the to-be-visited person recommendation information according to the requester information, the to-be-visited person information, the to-be-visited person's product demand information, and the information recommendation strategy.
  • The above information recommendation strategy is used to extract several key tags from the requester's user portrait and combine them with the to-be-visited person's product demand information to generate the requester recommendation information, and to extract several key tags from the to-be-visited person's user portrait and combine them with the to-be-visited person's product demand information to generate the to-be-visited person recommendation information.
  • In specific implementation, to better assist the user of the first type of intelligent terminal in recommending products to the user of the second type of intelligent terminal, the server can generate the requester recommendation information and the to-be-visited person recommendation information before the video connection between the two is established.
  • step S109 includes:
  • The requester recommendation information can be understood as being generated in the server according to the requester's user portrait and the to-be-visited person's product demand information; that is, according to the information recommendation strategy, the key tags in the requester's user portrait (such as "proficient in life insurance product A") are screened out, the to-be-visited person's product demand information is obtained (for example, also life insurance product A), and the local database is searched for product introduction information related to that demand (for example, the insurance rules, fees, term, and detailed coverage of life insurance product A) as the requester recommendation information.
  • The to-be-visited person recommendation information can be understood as being generated in the server according to the to-be-visited person's user portrait and product demand information; that is, according to the information recommendation strategy, the key tags in the to-be-visited person's user portrait (such as middle-aged, monthly income in the 20,000-30,000 range, etc.) are screened out, and together with the product demand information (for example, life insurance product A), the server's local database is searched for the talking script aimed at this type of user tag (the script can guide the first type of intelligent terminal user to communicate with the second type of intelligent terminal user in a specified sentence sequence) as the to-be-visited person recommendation information.
  • The requester recommendation information and the to-be-visited person recommendation information generated in the server can serve as guide data for the communication between the first type of intelligent terminal user and the second type of intelligent terminal user; these guide data are generated automatically in the server without manual retrieval by the user, which improves the efficiency of data acquisition.
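  • A sketch of such an information recommendation strategy; the tag names, the product demand value, and both lookup tables are illustrative assumptions rather than the disclosed implementation:

```python
# Sketch of S109: key tags filtered from each user portrait are combined
# with the to-be-visited person's product demand to look up materials.
def generate_recommendations(requester_portrait, visitee_portrait, demand,
                             product_intros, scripts):
    # Requester side: a proficiency tag matching the demanded product
    # selects the product introduction pushed to the requester.
    requester_info = (product_intros.get(demand)
                      if demand in requester_portrait["proficient_in"] else None)
    # Visitee side: key portrait tags plus the demand select a guided script.
    visitee_info = scripts.get((visitee_portrait["age_group"], demand))
    return requester_info, visitee_info

requester = {"proficient_in": {"life_insurance_A"}}
visitee = {"age_group": "middle_aged", "income": "20k-30k"}
intros = {"life_insurance_A": "rules, fees, term and coverage of product A"}
scripts = {("middle_aged", "life_insurance_A"): "suggested sentence sequence"}
print(generate_recommendations(requester, visitee, "life_insurance_A",
                               intros, scripts))
```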
  • This method realizes learning and practice based on the target explanation text that the server recommends according to the user portraits, without manually organizing the target explanation text, and improves the efficiency of acquiring it.
  • Embodiments of the present application further provide an AI-based virtual interaction model generation apparatus, which is used to execute any of the foregoing AI-based virtual interaction model generation methods.
  • FIG. 3 is a schematic block diagram of an apparatus for generating an AI-based virtual interaction model provided by an embodiment of the present application.
  • the AI-based virtual interaction model generating apparatus 100 may be configured in a server.
  • Specifically, the AI-based virtual interaction model generation apparatus 100 includes: a user portrait acquisition unit 101, a first classification unit 102, a first parameter acquisition unit 103, a second classification unit 104, a virtual interaction model acquisition unit 105, an interactive voice acquisition unit 106, a second parameter acquisition unit 107, a visit information acquisition unit 108, and a recommendation information generation unit 109.
  • The user portrait acquisition unit 101 is configured to, if a virtual interactive object generation instruction sent by the client is detected, obtain the locally stored target user portrait corresponding to the client and randomly obtain a locally stored customer user portrait.
  • Before conducting online video communication with the to-be-visited person, the server can combine the customer user portrait corresponding to the to-be-visited person and the target user portrait corresponding to the visitor, and push a personalized explanation plan (that is, the target explanation text referred to below) to the visitor according to the classification model.
  • The visitor can then study the personalized explanation plan, and the server records the visitor's degree of mastery of the plan during this process.
  • The first classification unit 102 is configured to call the first classification model to obtain the first classification result corresponding to the target user portrait and the customer user portrait, obtain the corresponding target explanation text from the locally stored explanation text library according to the first classification result, and send the target explanation text to the client.
  • In specific implementation, the tag set composed of the tags included in the target user portrait and the customer user portrait can be used as the input of the first classification model, and the corresponding first classification result is obtained by calculation. Since explanation texts corresponding to the various classification results are pre-stored locally on the server, once the first classification result is obtained, the target explanation text corresponding to it can be retrieved and sent to the client, so that the visitor can view the target explanation text and then carry out explanation exercises.
  • the first classification unit 102 includes:
  • the tag keyword set acquisition unit is configured to acquire the tags included in the target user portrait and the customer user portrait to form a tag keyword set, and classify the tag keyword set with the called first classification model to obtain the first classification result; wherein the first classification model is a convolutional neural network model;
  • the target explanation text acquisition unit is configured to obtain the target explanation text corresponding to the first classification result from the locally stored explanation text library, and send the target explanation text to the client.
  • A preset keyword screening strategy can be invoked to select, from the above-mentioned keywords, the core keywords that form the tag keyword set.
  • Since the first classification model is a convolutional neural network model, it can classify accurately.
  • After obtaining the first classification result, the server retrieves the target explanation text corresponding to it from the locally stored explanation text library and sends it to the client.
  • In this way, the visitor can view the target explanation text directly on the display of the user terminal, realizing the process of previewing the materials before the visit.
  • The first parameter acquisition unit 103 is configured to receive the explanation-practice speech data sent by the client, and perform a similarity calculation between the speech recognition text corresponding to that speech data and the target explanation text, to obtain the first proficiency parameter corresponding to the client.
  • In specific implementation, the practice mode can be turned on at the user terminal; that is, after the audio recording function on the client is enabled, the visitor reads aloud from the target explanation text, so that the sound data collected by the client is the explanation-practice speech data corresponding to the target explanation text.
  • When a complete exercise is finished, the client sends the explanation-practice speech data to the server to evaluate the proficiency of the exercise.
  • Specifically, a similarity calculation can be performed between the speech recognition text corresponding to the explanation-practice speech data and the target explanation text, to obtain the first proficiency parameter corresponding to the client.
  • the first parameter obtaining unit 103 includes:
  • the speech recognition text acquisition unit is configured to invoke a pre-trained speech recognition model to perform speech recognition on the explanation-practice speech data to obtain the speech recognition text;
  • the target text section set acquisition unit is configured to obtain the section breaks of the target explanation text, segment the speech recognition text accordingly to obtain the voice text section set, and obtain the target text section set of the target explanation text; wherein the voice text section set includes multiple sub-speech texts, the target text section set includes multiple target sub-speech texts, and the total number of sub-speech texts in the voice text section set is the same as the total number of target sub-speech texts in the target text section set;
  • the text semantic vector acquisition unit is configured to sequentially perform word segmentation, keyword extraction, word vector conversion, and semantic vector acquisition on each sub-speech text, obtain the sub-text semantic vector corresponding to each sub-speech text, and concatenate them to form the text semantic vector;
  • the target text semantic vector acquisition unit is configured to sequentially perform word segmentation, keyword extraction, word vector conversion, and semantic vector acquisition on each target sub-speech text, obtain the target sub-text semantic vector corresponding to each target sub-speech text, and concatenate them to form the target text semantic vector;
  • a first proficiency parameter calculating unit configured to calculate the Euclidean distance between the text semantic vector and the target text semantic vector, as the first proficiency parameter.
  • In specific implementation, a speech recognition model (such as an N-gram model, i.e., a multivariate grammar model) is called to perform speech recognition on the explanation-practice speech data and obtain the speech recognition text.
  • calculating the similarity between the acquired speech recognition text and the target explanation text is calculating the similarity between the texts.
  • the text can be segmented to calculate the comprehensive text similarity.
  • The separator between paragraphs is used as a section break, and the target explanation text can be divided into multiple paragraphs according to the section breaks (that is, multiple target sub-speech texts, which together make up the target text section set).
  • Since the visitor reads from the target explanation text, the speech recognition text corresponding to the recorded explanation-practice speech data should be roughly the same as the target explanation text, so when the speech recognition text is segmented, reference may also be made to the section breaks of the target explanation text, segmenting the speech recognition text accordingly to obtain the voice text section set (which includes a plurality of sub-speech texts).
  • Afterwards, each sub-speech text is sequentially subjected to word segmentation (by a statistical word segmentation method), keyword extraction (by a TF-IDF model), word vector conversion (by a word2vec model), and semantic vector acquisition, yielding the sub-text semantic vector corresponding to each sub-speech text; these sub-text semantic vectors are concatenated to form the text semantic vector.
  • Specifically, the weight value corresponding to each word vector is obtained first, and the weighted sum of the word vectors is then calculated to obtain the semantic vector.
  • Once the text semantic vector and the target text semantic vector are obtained, the Euclidean distance between the two can be calculated as the first proficiency parameter; the smaller the Euclidean distance, the closer the two texts are.
  • The second classification unit 104 is configured to perform feature extraction on the first proficiency parameter, the target user portrait, and the randomly obtained customer user portrait to obtain a feature set, and classify the feature set with the called second classification model to obtain the second classification result.
  • In specific implementation, the server can perform feature extraction on the first proficiency parameter, the target user portrait, and the randomly obtained customer user portrait to obtain a feature set, and then classify the feature set with the pre-trained second classification model, thereby obtaining the second classification result.
  • the second classification model is also a convolutional neural network model.
  • In this way, the extracted feature set can better reflect the visitor's proficiency in the target explanation text and the explanation fields the visitor is good at.
  • the second classification unit 104 includes:
  • a first keyword set acquisition unit used for calling a pre-stored first user portrait selection strategy to obtain a first keyword set corresponding to the target user portrait;
  • a first keyword vector obtaining unit configured to convert each keyword in the first keyword set into a corresponding word vector, so as to form a first keyword vector in series;
  • the second keyword set obtaining unit is used to call the pre-stored second user portrait selection strategy to obtain the second keyword set corresponding to the customer user portrait;
  • the second keyword vector obtaining unit is used to convert each keyword in the second keyword set into a corresponding word vector, so as to form a second keyword vector in series;
  • a feature set splicing unit configured to sequentially concatenate the first proficiency parameter, the first keyword vector and the second keyword vector to form a feature set.
  • In specific implementation, to obtain the feature set used as the input of the second classification model, the pre-stored first user portrait selection strategy can first be called to obtain the first keyword set corresponding to the target user portrait, and the pre-stored second user portrait selection strategy can then be called to obtain the second keyword set corresponding to the customer user portrait.
  • After the two keyword sets are obtained, each keyword is converted into the corresponding word vector, and the first proficiency parameter, the first keyword vector, and the second keyword vector are concatenated in sequence to form the feature set.
  • feature sets with more reference dimensions can be extracted.
  • the virtual interaction model acquiring unit 105 is configured to acquire the target AI virtual interaction model corresponding to the second classification result in the locally stored AI virtual interaction model library.
  • The AI virtual interaction model library local to the server stores multiple AI virtual interaction models, each corresponding to a classification result. For example, if the second classification result equals 0.5, the first AI virtual interaction model is obtained, which is suitable for simulated AI dialogue exercises by visitors with low proficiency in the target explanation text; if it equals 0.7, the second AI virtual interaction model is obtained, suitable for visitors with moderate proficiency in the target explanation text; and if it equals 0.9, the third AI virtual interaction model is obtained, suitable for visitors with high proficiency in the target explanation text.
  • The target AI virtual interaction model can essentially be understood as an intelligent customer service agent that can interact with users by voice, achieving the effect of a simulated dialogue for practicing the explanation.
  • the interactive voice acquisition unit 106 is configured to receive interactive voice data corresponding to the user terminal and the target AI virtual interaction model.
  • When the user terminal performs voice interaction with the target AI virtual interaction model, the user terminal can collect the interactive voice data, on which a further assessment of practice proficiency is performed.
  • the second parameter obtaining unit 107 is configured to perform similarity calculation between the interactive voice data and the target standard voice data corresponding to the target AI virtual interaction model to obtain a second proficiency parameter corresponding to the client.
  • After the visitor has completed the virtual simulation interaction exercise with the target AI virtual interaction model in the server, the vector similarity between the semantic vector converted from the interactive voice data and the semantic vector converted from the target standard voice data is calculated and used as the second proficiency parameter.
  • the second parameter obtaining unit 107 includes:
  • An interactive voice and text acquisition unit used for calling a pre-trained voice recognition model to perform voice recognition on the interactive voice data to obtain interactive voice text;
  • a target standard speech text acquisition unit, configured to call the speech recognition model to perform speech recognition on the target standard speech data to obtain the target standard speech text;
  • an interactive voice text semantic vector acquisition unit configured to sequentially perform word segmentation, keyword extraction, word vector transformation and semantic vector acquisition on the interactive voice text to obtain an interactive voice text semantic vector corresponding to the interactive voice text;
  • a target standard phonetic text semantic vector acquisition unit used to sequentially perform word segmentation, keyword extraction, word vector transformation and semantic vector acquisition on the target standard phonetic text to obtain a target standard phonetic text semantic vector corresponding to the target standard phonetic text;
  • a second proficiency parameter calculating unit configured to calculate the Euclidean distance between the interactive speech-text semantic vector and the target standard speech-text semantic vector, as the second proficiency parameter.
  • In specific implementation, a speech recognition model (for example, an N-gram model, i.e., a multivariate grammar model) is called; calculating the similarity between the speech texts amounts to calculating the similarity between texts.
  • Through the user portrait acquisition unit 101 to the first parameter acquisition unit 103, the initial learning process of the visitor's explanation text is realized.
  • Through the second classification unit 104 to the second parameter acquisition unit 107, the visitor's simulated interactive practice process is realized. The above process is based on learning and practicing the target explanation text that the server recommends according to the user portraits, without manually organizing the target explanation text, which improves the efficiency of acquiring it.
  • The visit information acquisition unit 108 is configured to, if the to-be-visited person information acquisition instruction uploaded by the client is detected, acquire the requester information and the to-be-visited person information corresponding to the instruction; wherein the requester information includes the first proficiency parameter, the second proficiency parameter, and the target user portrait, and the to-be-visited person information includes the to-be-visited person's user portrait and the to-be-visited person's product demand information.
  • After the visitor has fully mastered the target explanation text, an online video conference connection can be established with the to-be-visited person, and communication then proceeds based on the text automatically recommended by the server.
  • When the server detects the to-be-visited person information acquisition instruction uploaded by the client, it means that the client has not yet established the online-conference video connection with the other client. So that the users of the two clients can communicate efficiently in the upcoming online video conference, the client can first send the to-be-visited person information acquisition instruction to the server, and upon detecting this instruction, the server acquires the requester information and the to-be-visited person information corresponding to it.
  • The requester information includes the first proficiency parameter, the second proficiency parameter, and the target user portrait. Because a large amount of historical data about the client user and the other client user is stored in the server, the user portrait corresponding to the client user (i.e., the requester user portrait above) and the user portrait corresponding to the other client user (i.e., the to-be-visited person's user portrait above) can be derived from these historical data.
  • The to-be-visited person's product demand information is recorded in the conversation records from when the other client user communicated with the client user by telephone or through communication software (such as WeChat or QQ).
  • the product demand information can be understood as the product purchase intention of the person to be visited.
  • the business person can recommend some products to the consumer through an online meeting.
  • The requester's user portrait generally carries tags indicating which types of product sales the salesperson is proficient in, and the to-be-visited person's user portrait generally carries the consumer's user tags (such as age group, occupation group, and income range), so the product demand information corresponding to the to-be-visited person can be obtained according to the to-be-visited person information acquisition instruction.
  • The recommendation information generation unit 109 is configured to call the pre-stored information recommendation strategy, and generate the requester recommendation information and the to-be-visited person recommendation information according to the requester information, the to-be-visited person information, the to-be-visited person's product demand information, and the information recommendation strategy; wherein the information recommendation strategy is used to extract several key tags from the requester's user portrait and combine them with the to-be-visited person's product demand information to generate the requester recommendation information, and to extract several key tags from the to-be-visited person's user portrait and combine them with the to-be-visited person's product demand information to generate the to-be-visited person recommendation information.
  • In specific implementation, to better assist the user of the first type of intelligent terminal in recommending products to the user of the second type of intelligent terminal, the server can generate the requester recommendation information and the to-be-visited person recommendation information before the video connection between the two is established.
  • the recommendation information generating unit 109 includes:
  • the requester recommendation information generation unit is configured to obtain the first recommendation information generation strategy in the information recommendation strategy, and generate the requester recommendation information according to the requester information, the to-be-visited person's product demand information, and the first recommendation information generation strategy;
  • the to-be-visited person recommendation information generation unit is configured to obtain the second recommendation information generation strategy in the information recommendation strategy, and generate the to-be-visited person recommendation information according to the to-be-visited person information, the to-be-visited person's product demand information, and the second recommendation information generation strategy.
  • The requester recommendation information can be understood as being generated in the server according to the requester's user portrait and the to-be-visited person's product demand information; that is, according to the information recommendation strategy, the key tags in the requester's user portrait (such as "proficient in life insurance product A") are screened out, the to-be-visited person's product demand information is obtained (for example, also life insurance product A), and the local database is searched for product introduction information related to that demand (for example, the insurance rules, fees, term, and detailed coverage of life insurance product A) as the requester recommendation information.
  • The to-be-visited person recommendation information can be understood as being generated in the server according to the to-be-visited person's user portrait and product demand information; that is, according to the information recommendation strategy, the key tags in the to-be-visited person's user portrait (such as middle-aged, monthly income in the 20,000-30,000 range, etc.) are screened out, and together with the product demand information (for example, life insurance product A), the server's local database is searched for the talking script aimed at this type of user tag (the script can guide the first type of intelligent terminal user to communicate with the second type of intelligent terminal user in a specified sentence sequence) as the to-be-visited person recommendation information.
  • The requester recommendation information and the to-be-visited person recommendation information generated in the server can serve as guide data for the communication between the first type of intelligent terminal user and the second type of intelligent terminal user; these guide data are generated automatically in the server without manual retrieval by the user, which improves the efficiency of data acquisition.
  • This apparatus realizes learning and practice based on the target explanation text that the server recommends according to the user portraits, without manually organizing the target explanation text, and improves the efficiency of acquiring it.
• The above-mentioned AI-based virtual interaction model generating apparatus can be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 4.
• Referring to FIG. 4, FIG. 4 is a schematic block diagram of a computer device provided by an embodiment of the present application.
• The computer device 500 is a server, and the server may be an independent server or a server cluster composed of multiple servers.
• The computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
• The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 may cause the processor 502 to perform the AI-based virtual interaction model generation method.
• The processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
• The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can perform the AI-based virtual interaction model generation method.
• The network interface 505 is used for network communication, such as providing transmission of data information.
• Those skilled in the art can understand that FIG. 4 is only a block diagram of the partial structure related to the solution of the present application and does not constitute a limitation on the computer device 500 to which the solution is applied; the specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
• The processor 502 is configured to run the computer program 5032 stored in the memory, so as to implement the AI-based virtual interaction model generation method disclosed in the embodiments of the present application.
• Those skilled in the art can understand that the embodiment of the computer device shown in FIG. 4 does not constitute a limitation on the specific structure of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
• For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are the same as those of the embodiment shown in FIG. 4 and are not repeated here.
• It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (CPU), and it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any other conventional processor.
• In another embodiment of the present application, a computer-readable storage medium is provided. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the AI-based virtual interaction model generation method disclosed in the embodiments of the present application.

Abstract

An AI-based virtual interaction model generation method and apparatus, a computer device, and a storage medium, relating to artificial intelligence technology. A first classification model is first invoked to obtain a first classification result corresponding to a target user portrait and a customer user portrait, a corresponding target explanatory text is obtained from a locally stored explanatory text library according to the first classification result, and the target explanatory text is sent to the user terminal (S102); then a target AI virtual interaction model corresponding to a second classification result is obtained from a locally stored AI virtual interaction model library (S105); finally, a pre-stored information recommendation strategy is invoked, and requester recommendation information and recommendation information of the person to be visited are generated according to the requester information, the information of the person to be visited, the product demand information of the person to be visited, and the information recommendation strategy (S109). Learning and practice are carried out on the basis of the target explanatory text recommended by the server according to the user portraits, without manually sorting the target explanatory text, which improves the acquisition efficiency of the target explanatory text.


Claims (20)

  1. An AI-based virtual interaction model generation method, comprising:
    if a virtual interaction object generation instruction sent by a user terminal is detected, obtaining a locally stored target user portrait corresponding to the user terminal, and randomly obtaining a locally stored customer user portrait;
    invoking a first classification model to obtain a first classification result corresponding to the target user portrait and the customer user portrait, obtaining a corresponding target explanatory text from a locally stored explanatory text library according to the first classification result, and sending the target explanatory text to the user terminal;
    receiving explanatory-text practice voice data sent by the user terminal, and performing a similarity calculation between the speech recognition text corresponding to the explanatory-text voice data and the target explanatory text to obtain a first proficiency parameter corresponding to the client;
    performing feature extraction according to the first proficiency parameter, the target user portrait and the randomly obtained customer user portrait to obtain a feature set, and classifying the feature set with an invoked second classification model to obtain a second classification result;
    obtaining, from a locally stored AI virtual interaction model library, a target AI virtual interaction model corresponding to the second classification result;
    receiving interaction voice data between the user terminal and the target AI virtual interaction model;
    performing a similarity calculation between the interaction voice data and target standard voice data corresponding to the target AI virtual interaction model to obtain a second proficiency parameter corresponding to the client;
    if a person-to-be-visited data acquisition instruction uploaded by the user terminal is detected, obtaining requester information and information of the person to be visited corresponding to the person-to-be-visited data acquisition instruction, wherein the requester information includes the first proficiency parameter, the second proficiency parameter and the target user portrait, and the information of the person to be visited includes a user portrait of the person to be visited and product demand information of the person to be visited; and
    invoking a pre-stored information recommendation strategy, and generating requester recommendation information and recommendation information of the person to be visited according to the requester information, the information of the person to be visited, the product demand information of the person to be visited and the information recommendation strategy, wherein the information recommendation strategy is used to extract several key tags from the requester user portrait so as to generate the requester recommendation information together with the product demand information of the person to be visited, and to extract several key tags from the user portrait of the person to be visited so as to generate the recommendation information of the person to be visited together with the product demand information of the person to be visited.
  2. The AI-based virtual interaction model generation method according to claim 1, wherein the invoking a first classification model to obtain a first classification result corresponding to the target user portrait and the customer user portrait, obtaining a corresponding target explanatory text from a locally stored explanatory text library according to the first classification result, and sending the target explanatory text to the user terminal comprises:
    obtaining the tags included in the target user portrait and the customer user portrait to form a tag keyword set, and classifying the tag keyword set with the invoked first classification model to obtain the first classification result, wherein the first classification model is a convolutional neural network model;
    obtaining the target explanatory text corresponding to the first classification result from the locally stored explanatory text library, and sending the target explanatory text to the user terminal.
  3. The AI-based virtual interaction model generation method according to claim 1, wherein the performing a similarity calculation between the speech recognition text corresponding to the explanatory-text voice data and the target explanatory text to obtain a first proficiency parameter corresponding to the client comprises:
    invoking a pre-trained speech recognition model to perform speech recognition on the explanatory-text voice data to obtain a speech recognition text;
    obtaining the section breaks of the target explanatory text so as to correspondingly section the speech recognition text into a voice text section set, and obtaining a target text section set of the target explanatory text, wherein the voice text section set includes a plurality of sub voice texts, the target text section set includes a plurality of target sub voice texts, and the total number of sub voice texts in the voice text section set is the same as the total number of target sub voice texts in the target text section set;
    sequentially performing word segmentation, keyword extraction, word vector conversion and semantic vector acquisition on each sub voice text to obtain a sub-text semantic vector corresponding to each sub voice text, and concatenating these into a text semantic vector;
    sequentially performing word segmentation, keyword extraction, word vector conversion and semantic vector acquisition on each target sub voice text to obtain a target sub-text semantic vector corresponding to each target sub voice text, and concatenating these into a target text semantic vector;
    calculating the Euclidean distance between the text semantic vector and the target text semantic vector as the first proficiency parameter.
  4. The AI-based virtual interaction model generation method according to claim 1, wherein the performing feature extraction according to the first proficiency parameter, the target user portrait and the randomly obtained customer user portrait to obtain a feature set comprises:
    invoking a pre-stored first user portrait selection strategy to obtain a first keyword set corresponding to the target user portrait;
    converting each keyword in the first keyword set into a corresponding word vector, and concatenating these into a first keyword vector;
    invoking a pre-stored second user portrait selection strategy to obtain a second keyword set corresponding to the customer user portrait;
    converting each keyword in the second keyword set into a corresponding word vector, and concatenating these into a second keyword vector;
    sequentially concatenating the first proficiency parameter, the first keyword vector and the second keyword vector into the feature set.
  5. The AI-based virtual interaction model generation method according to claim 1, wherein the performing a similarity calculation between the interaction voice data and the target standard voice data corresponding to the target AI virtual interaction model to obtain a second proficiency parameter corresponding to the client comprises:
    invoking a pre-trained speech recognition model to perform speech recognition on the interaction voice data to obtain an interaction voice text;
    invoking the speech recognition model to perform speech recognition on the target standard voice data to obtain a target standard voice text;
    sequentially performing word segmentation, keyword extraction, word vector conversion and semantic vector acquisition on the interaction voice text to obtain an interaction voice text semantic vector corresponding to the interaction voice text;
    sequentially performing word segmentation, keyword extraction, word vector conversion and semantic vector acquisition on the target standard voice text to obtain a target standard voice text semantic vector corresponding to the target standard voice text;
    calculating the Euclidean distance between the interaction voice text semantic vector and the target standard voice text semantic vector as the second proficiency parameter.
  6. The AI-based virtual interaction model generation method according to claim 1, wherein the invoking a pre-stored information recommendation strategy and generating requester recommendation information and recommendation information of the person to be visited according to the requester information, the information of the person to be visited, the product demand information of the person to be visited and the information recommendation strategy comprises:
    obtaining a first recommendation information generation strategy in the information recommendation strategy, and generating the requester recommendation information according to the first proficiency parameter, the second proficiency parameter, the target user portrait, the product demand information of the person to be visited and the first recommendation information generation strategy;
    obtaining a second recommendation information generation strategy in the information recommendation strategy, and generating the recommendation information of the person to be visited according to the information of the person to be visited, the product demand information of the person to be visited and the second recommendation information generation strategy.
  7. The AI-based virtual interaction model generation method according to claim 1, further comprising:
    receiving and storing user reply voice data, sent by another user terminal, corresponding to the recommendation information of the person to be visited.
  8. The AI-based virtual interaction model generation method according to claim 2, wherein the obtaining the tags included in the target user portrait and the customer user portrait to form a tag keyword set comprises:
    obtaining, through a preset keyword screening strategy, the core keywords among the keywords corresponding to the plurality of tags included in the target user portrait and the keywords corresponding to the tags included in the customer user portrait, to form the tag keyword set.
  9. An AI-based virtual interaction model generating apparatus, comprising:
    a user portrait obtaining unit, configured to, if a virtual interaction object generation instruction sent by a user terminal is detected, obtain a locally stored target user portrait corresponding to the user terminal and randomly obtain a locally stored customer user portrait;
    a first classification unit, configured to invoke a first classification model to obtain a first classification result corresponding to the target user portrait and the customer user portrait, obtain a corresponding target explanatory text from a locally stored explanatory text library according to the first classification result, and send the target explanatory text to the user terminal;
    a first parameter obtaining unit, configured to receive explanatory-text practice voice data sent by the user terminal, and perform a similarity calculation between the speech recognition text corresponding to the explanatory-text voice data and the target explanatory text to obtain a first proficiency parameter corresponding to the client;
    a second classification unit, configured to perform feature extraction according to the first proficiency parameter, the target user portrait and the randomly obtained customer user portrait to obtain a feature set, and classify the feature set with an invoked second classification model to obtain a second classification result;
    a virtual interaction model obtaining unit, configured to obtain, from a locally stored AI virtual interaction model library, a target AI virtual interaction model corresponding to the second classification result;
    an interaction voice obtaining unit, configured to receive interaction voice data between the user terminal and the target AI virtual interaction model;
    a second parameter obtaining unit, configured to perform a similarity calculation between the interaction voice data and target standard voice data corresponding to the target AI virtual interaction model to obtain a second proficiency parameter corresponding to the client;
    a visit information obtaining unit, configured to, if a person-to-be-visited data acquisition instruction uploaded by the user terminal is detected, obtain requester information and information of the person to be visited corresponding to the person-to-be-visited data acquisition instruction, wherein the requester information includes the first proficiency parameter, the second proficiency parameter and the target user portrait, and the information of the person to be visited includes a user portrait of the person to be visited and product demand information of the person to be visited; and
    a recommendation information generating unit, configured to invoke a pre-stored information recommendation strategy, and generate requester recommendation information and recommendation information of the person to be visited according to the requester information, the information of the person to be visited, the product demand information of the person to be visited and the information recommendation strategy, wherein the information recommendation strategy is used to extract several key tags from the requester user portrait so as to generate the requester recommendation information together with the product demand information of the person to be visited, and to extract several key tags from the user portrait of the person to be visited so as to generate the recommendation information of the person to be visited together with the product demand information of the person to be visited.
  10. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
    if a virtual interaction object generation instruction sent by a user terminal is detected, obtaining a locally stored target user portrait corresponding to the user terminal, and randomly obtaining a locally stored customer user portrait;
    invoking a first classification model to obtain a first classification result corresponding to the target user portrait and the customer user portrait, obtaining a corresponding target explanatory text from a locally stored explanatory text library according to the first classification result, and sending the target explanatory text to the user terminal;
    receiving explanatory-text practice voice data sent by the user terminal, and performing a similarity calculation between the speech recognition text corresponding to the explanatory-text voice data and the target explanatory text to obtain a first proficiency parameter corresponding to the client;
    performing feature extraction according to the first proficiency parameter, the target user portrait and the randomly obtained customer user portrait to obtain a feature set, and classifying the feature set with an invoked second classification model to obtain a second classification result;
    obtaining, from a locally stored AI virtual interaction model library, a target AI virtual interaction model corresponding to the second classification result;
    receiving interaction voice data between the user terminal and the target AI virtual interaction model;
    performing a similarity calculation between the interaction voice data and target standard voice data corresponding to the target AI virtual interaction model to obtain a second proficiency parameter corresponding to the client;
    if a person-to-be-visited data acquisition instruction uploaded by the user terminal is detected, obtaining requester information and information of the person to be visited corresponding to the person-to-be-visited data acquisition instruction, wherein the requester information includes the first proficiency parameter, the second proficiency parameter and the target user portrait, and the information of the person to be visited includes a user portrait of the person to be visited and product demand information of the person to be visited; and
    invoking a pre-stored information recommendation strategy, and generating requester recommendation information and recommendation information of the person to be visited according to the requester information, the information of the person to be visited, the product demand information of the person to be visited and the information recommendation strategy, wherein the information recommendation strategy is used to extract several key tags from the requester user portrait so as to generate the requester recommendation information together with the product demand information of the person to be visited, and to extract several key tags from the user portrait of the person to be visited so as to generate the recommendation information of the person to be visited together with the product demand information of the person to be visited.
  11. The computer device according to claim 10, wherein the invoking a first classification model to obtain a first classification result corresponding to the target user portrait and the customer user portrait, obtaining a corresponding target explanatory text from a locally stored explanatory text library according to the first classification result, and sending the target explanatory text to the user terminal comprises:
    obtaining the tags included in the target user portrait and the customer user portrait to form a tag keyword set, and classifying the tag keyword set with the invoked first classification model to obtain the first classification result, wherein the first classification model is a convolutional neural network model;
    obtaining the target explanatory text corresponding to the first classification result from the locally stored explanatory text library, and sending the target explanatory text to the user terminal.
  12. The computer device according to claim 10, wherein the performing a similarity calculation between the speech recognition text corresponding to the explanatory-text voice data and the target explanatory text to obtain a first proficiency parameter corresponding to the client comprises:
    invoking a pre-trained speech recognition model to perform speech recognition on the explanatory-text voice data to obtain a speech recognition text;
    obtaining the section breaks of the target explanatory text so as to correspondingly section the speech recognition text into a voice text section set, and obtaining a target text section set of the target explanatory text, wherein the voice text section set includes a plurality of sub voice texts, the target text section set includes a plurality of target sub voice texts, and the total number of sub voice texts in the voice text section set is the same as the total number of target sub voice texts in the target text section set;
    sequentially performing word segmentation, keyword extraction, word vector conversion and semantic vector acquisition on each sub voice text to obtain a sub-text semantic vector corresponding to each sub voice text, and concatenating these into a text semantic vector;
    sequentially performing word segmentation, keyword extraction, word vector conversion and semantic vector acquisition on each target sub voice text to obtain a target sub-text semantic vector corresponding to each target sub voice text, and concatenating these into a target text semantic vector;
    calculating the Euclidean distance between the text semantic vector and the target text semantic vector as the first proficiency parameter.
  13. The computer device according to claim 10, wherein the performing feature extraction according to the first proficiency parameter, the target user portrait and the randomly obtained customer user portrait to obtain a feature set comprises:
    invoking a pre-stored first user portrait selection strategy to obtain a first keyword set corresponding to the target user portrait;
    converting each keyword in the first keyword set into a corresponding word vector, and concatenating these into a first keyword vector;
    invoking a pre-stored second user portrait selection strategy to obtain a second keyword set corresponding to the customer user portrait;
    converting each keyword in the second keyword set into a corresponding word vector, and concatenating these into a second keyword vector;
    sequentially concatenating the first proficiency parameter, the first keyword vector and the second keyword vector into the feature set.
  14. The computer device according to claim 10, wherein the performing a similarity calculation between the interaction voice data and the target standard voice data corresponding to the target AI virtual interaction model to obtain a second proficiency parameter corresponding to the client comprises:
    invoking a pre-trained speech recognition model to perform speech recognition on the interaction voice data to obtain an interaction voice text;
    invoking the speech recognition model to perform speech recognition on the target standard voice data to obtain a target standard voice text;
    sequentially performing word segmentation, keyword extraction, word vector conversion and semantic vector acquisition on the interaction voice text to obtain an interaction voice text semantic vector corresponding to the interaction voice text;
    sequentially performing word segmentation, keyword extraction, word vector conversion and semantic vector acquisition on the target standard voice text to obtain a target standard voice text semantic vector corresponding to the target standard voice text;
    calculating the Euclidean distance between the interaction voice text semantic vector and the target standard voice text semantic vector as the second proficiency parameter.
  15. The computer device according to claim 10, wherein the invoking a pre-stored information recommendation strategy and generating requester recommendation information and recommendation information of the person to be visited according to the requester information, the information of the person to be visited, the product demand information of the person to be visited and the information recommendation strategy comprises:
    obtaining a first recommendation information generation strategy in the information recommendation strategy, and generating the requester recommendation information according to the first proficiency parameter, the second proficiency parameter, the target user portrait, the product demand information of the person to be visited and the first recommendation information generation strategy;
    obtaining a second recommendation information generation strategy in the information recommendation strategy, and generating the recommendation information of the person to be visited according to the information of the person to be visited, the product demand information of the person to be visited and the second recommendation information generation strategy.
  16. The computer device according to claim 10, wherein the steps further comprise:
    receiving and storing user reply voice data, sent by another user terminal, corresponding to the recommendation information of the person to be visited.
  17. The computer device according to claim 11, wherein the obtaining the tags included in the target user portrait and the customer user portrait to form a tag keyword set comprises:
    obtaining, through a preset keyword screening strategy, the core keywords among the keywords corresponding to the plurality of tags included in the target user portrait and the keywords corresponding to the tags included in the customer user portrait, to form the tag keyword set.
  18. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to perform the following operations:
    if a virtual interaction object generation instruction sent by a user terminal is detected, obtaining a locally stored target user portrait corresponding to the user terminal, and randomly obtaining a locally stored customer user portrait;
    invoking a first classification model to obtain a first classification result corresponding to the target user portrait and the customer user portrait, obtaining a corresponding target explanatory text from a locally stored explanatory text library according to the first classification result, and sending the target explanatory text to the user terminal;
    receiving explanatory-text practice voice data sent by the user terminal, and performing a similarity calculation between the speech recognition text corresponding to the explanatory-text voice data and the target explanatory text to obtain a first proficiency parameter corresponding to the client;
    performing feature extraction according to the first proficiency parameter, the target user portrait and the randomly obtained customer user portrait to obtain a feature set, and classifying the feature set with an invoked second classification model to obtain a second classification result;
    obtaining, from a locally stored AI virtual interaction model library, a target AI virtual interaction model corresponding to the second classification result;
    receiving interaction voice data between the user terminal and the target AI virtual interaction model;
    performing a similarity calculation between the interaction voice data and target standard voice data corresponding to the target AI virtual interaction model to obtain a second proficiency parameter corresponding to the client;
    if a person-to-be-visited data acquisition instruction uploaded by the user terminal is detected, obtaining requester information and information of the person to be visited corresponding to the person-to-be-visited data acquisition instruction, wherein the requester information includes the first proficiency parameter, the second proficiency parameter and the target user portrait, and the information of the person to be visited includes a user portrait of the person to be visited and product demand information of the person to be visited; and
    invoking a pre-stored information recommendation strategy, and generating requester recommendation information and recommendation information of the person to be visited according to the requester information, the information of the person to be visited, the product demand information of the person to be visited and the information recommendation strategy, wherein the information recommendation strategy is used to extract several key tags from the requester user portrait so as to generate the requester recommendation information together with the product demand information of the person to be visited, and to extract several key tags from the user portrait of the person to be visited so as to generate the recommendation information of the person to be visited together with the product demand information of the person to be visited.
  19. The computer-readable storage medium according to claim 18, wherein the invoking a first classification model to obtain a first classification result corresponding to the target user portrait and the customer user portrait, obtaining a corresponding target explanatory text from a locally stored explanatory text library according to the first classification result, and sending the target explanatory text to the user terminal comprises:
    obtaining the tags included in the target user portrait and the customer user portrait to form a tag keyword set, and classifying the tag keyword set with the invoked first classification model to obtain the first classification result, wherein the first classification model is a convolutional neural network model;
    obtaining the target explanatory text corresponding to the first classification result from the locally stored explanatory text library, and sending the target explanatory text to the user terminal.
  20. The computer-readable storage medium according to claim 18, wherein the performing a similarity calculation between the speech recognition text corresponding to the explanatory-text voice data and the target explanatory text to obtain a first proficiency parameter corresponding to the client comprises:
    invoking a pre-trained speech recognition model to perform speech recognition on the explanatory-text voice data to obtain a speech recognition text;
    obtaining the section breaks of the target explanatory text so as to correspondingly section the speech recognition text into a voice text section set, and obtaining a target text section set of the target explanatory text, wherein the voice text section set includes a plurality of sub voice texts, the target text section set includes a plurality of target sub voice texts, and the total number of sub voice texts in the voice text section set is the same as the total number of target sub voice texts in the target text section set;
    sequentially performing word segmentation, keyword extraction, word vector conversion and semantic vector acquisition on each sub voice text to obtain a sub-text semantic vector corresponding to each sub voice text, and concatenating these into a text semantic vector;
    sequentially performing word segmentation, keyword extraction, word vector conversion and semantic vector acquisition on each target sub voice text to obtain a target sub-text semantic vector corresponding to each target sub voice text, and concatenating these into a target text semantic vector;
    calculating the Euclidean distance between the text semantic vector and the target text semantic vector as the first proficiency parameter.
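For illustration only (this is not part of the claims), the following minimal Python sketch mirrors the proficiency computation described in claims 3 and 20: both texts are split into corresponding sections, each section is turned into a weighted semantic vector, the section vectors are concatenated, and the Euclidean distance between the two concatenated vectors serves as the proficiency parameter. The toy tokenizer, word vectors, and keyword weights below are stand-ins for the word segmentation, word vector conversion, and keyword extraction steps named in the claims.

    # Hypothetical sketch of the Euclidean-distance proficiency parameter.
    import math
    from typing import Dict, List

    WORD_VECS: Dict[str, List[float]] = {      # stand-in word vectors
        "insurance": [0.9, 0.1],
        "product": [0.5, 0.5],
        "premium": [0.2, 0.8],
    }
    WEIGHTS: Dict[str, float] = {              # stand-in keyword weights
        "insurance": 0.5, "product": 0.3, "premium": 0.2,
    }

    def semantic_vector(section: str) -> List[float]:
        # Weighted sum of the word vectors of the keywords in one section.
        vec = [0.0, 0.0]
        for word in section.split():           # stand-in word segmentation
            if word in WORD_VECS:              # keep keywords only
                w = WEIGHTS[word]
                vec = [v + w * c for v, c in zip(vec, WORD_VECS[word])]
        return vec

    def proficiency_parameter(sections: List[str], target_sections: List[str]) -> float:
        # Concatenate per-section semantic vectors, then take the Euclidean
        # distance; a smaller distance means the recognized practice text is
        # closer to the target explanatory text. Equal section counts assumed,
        # as the claims require.
        va = [x for s in sections for x in semantic_vector(s)]
        vb = [x for s in target_sections for x in semantic_vector(s)]
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(va, vb)))

    print(proficiency_parameter(["insurance product premium"], ["insurance product"]))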
PCT/CN2021/091300 2020-11-03 2021-04-30 AI-based virtual interaction model generation method and apparatus, computer device and storage medium WO2022095380A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011209226.4 2020-11-03
CN202011209226.4A CN112346567B (zh) AI-based virtual interaction model generation method and apparatus, and computer device

Publications (1)

Publication Number Publication Date
WO2022095380A1 true WO2022095380A1 (zh) 2022-05-12

Family

ID=74356169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/091300 WO2022095380A1 (zh) AI-based virtual interaction model generation method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN112346567B (zh)
WO (1) WO2022095380A1 (zh)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112346567B (zh) 2020-11-03 2022-12-06 平安科技(深圳)有限公司 AI-based virtual interaction model generation method and apparatus, and computer device
CN112989022B (zh) 2021-03-16 2022-11-25 中国平安人寿保险股份有限公司 Intelligent virtual text selection method and apparatus, and computer device
CN113591489B (zh) 2021-07-30 2023-07-18 中国平安人寿保险股份有限公司 Voice interaction method and apparatus, and related device
CN114218363B (zh) 2021-11-23 2023-04-18 深圳市领深信息技术有限公司 Big data and AI-based service content generation method and artificial intelligence cloud system
CN114816064A (zh) 2022-04-27 2022-07-29 深圳微言科技有限责任公司 Method, apparatus and system for interpreting artificial intelligence models


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10467792B1 (en) * 2017-08-24 2019-11-05 Amazon Technologies, Inc. Simulating communication expressions using virtual objects
CN110503502A (zh) 2018-05-17 2019-11-26 中国移动通信集团有限公司 Service recommendation method, device, apparatus and computer-readable storage medium
CN109146610B (zh) 2018-07-16 2022-08-09 众安在线财产保险股份有限公司 Intelligent insurance recommendation method and apparatus, and intelligent insurance robot device
CN109670110B (zh) 2018-12-20 2023-05-12 蒋文军 Educational resource recommendation method, apparatus, device and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347863A * 2019-06-28 2019-10-18 腾讯科技(深圳)有限公司 Speech script recommendation method and apparatus, and storage medium
US10691897B1 * 2019-08-29 2020-06-23 Accenture Global Solutions Limited Artificial intelligence based virtual agent trainer
CN110610705A * 2019-09-20 2019-12-24 上海数鸣人工智能科技有限公司 Artificial intelligence-based voice interaction prompter
CN110782318A * 2019-10-21 2020-02-11 五竹科技(天津)有限公司 Audio interaction-based marketing method, apparatus and storage medium
CN111259132A * 2020-01-16 2020-06-09 中国平安财产保险股份有限公司 Speech script recommendation method and apparatus, computer device and storage medium
CN112346567A * 2020-11-03 2021-02-09 平安科技(深圳)有限公司 AI-based virtual interaction model generation method and apparatus, and computer device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115857704A * 2023-03-03 2023-03-28 北京黑油数字展览股份有限公司 Metaverse-based exhibition system, interaction method and electronic device
CN116737936A * 2023-06-21 2023-09-12 圣风多媒体科技(上海)有限公司 Artificial intelligence-based AI virtual character language library classification management system
CN116737936B * 2023-06-21 2024-01-02 圣风多媒体科技(上海)有限公司 Artificial intelligence-based AI virtual character language library classification management system
CN116976821A * 2023-08-03 2023-10-31 广东企企通科技有限公司 Enterprise problem feedback information processing method, apparatus, device and medium
CN116976821B * 2023-08-03 2024-02-13 广东企企通科技有限公司 Enterprise problem feedback information processing method, apparatus, device and medium
CN116741143A * 2023-08-14 2023-09-12 深圳市加推科技有限公司 Digital avatar-based personalized AI business card interaction method and related components
CN116741143B * 2023-08-14 2023-10-31 深圳市加推科技有限公司 Digital avatar-based personalized AI business card interaction method and related components
CN117274421A * 2023-11-06 2023-12-22 北京中数文化科技有限公司 AI intelligent terminal-based interactive scene photo production method
CN117274421B * 2023-11-06 2024-04-02 北京中数文化科技有限公司 AI intelligent terminal-based interactive scene photo production method
CN117422002A * 2023-12-19 2024-01-19 利尔达科技集团股份有限公司 AIGC-based embedded product generation method, system and storage medium
CN117422002B * 2023-12-19 2024-04-19 利尔达科技集团股份有限公司 AIGC-based embedded product generation method, system and storage medium

Also Published As

Publication number Publication date
CN112346567A (zh) 2021-02-09
CN112346567B (zh) 2022-12-06

Similar Documents

Publication Publication Date Title
WO2022095380A1 (zh) AI-based virtual interaction model generation method and apparatus, computer device and storage medium
US11645547B2 (en) Human-machine interactive method and device based on artificial intelligence
TW202009749A Human-machine dialogue method and apparatus, electronic device, and computer-readable medium
WO2021114841A1 User report generation method and terminal device
CN107886949A Content recommendation method and apparatus
WO2021218028A1 Artificial intelligence-based interview content refining method, apparatus, device and medium
TWI727476B Adaptive job-vacancy matching system and method
CN110175229B Natural language-based online training method and system
WO2021218029A1 Artificial intelligence-based interview method and apparatus, computer device, and storage medium
US20200051451A1 (en) Short answer grade prediction
CN111930792B Data resource labeling method and apparatus, storage medium and electronic device
CN108960574A Question-answer quality determination method and apparatus, server and storage medium
WO2021169485A1 Dialogue generation method and apparatus, and computer device
CN111651497A User tag mining method and apparatus, storage medium and electronic device
CN114328817A Text processing method and apparatus
CN116049557A Educational resource recommendation method based on a multimodal pre-trained model
CN114065720A Meeting minutes generation method and apparatus, storage medium and electronic device
CN113342948A Intelligent question answering method and apparatus
CN114138960A User intention recognition method, apparatus, device and medium
WO2021047103A1 Speech recognition method and apparatus
WO2023119520A1 Estimation device, estimation method, and program
CN117453895B Intelligent customer service response method, apparatus, device and readable storage medium
WO2023119521A1 Visualization information generation device, visualization information generation method, and program
AU2022204665B2 (en) Automated search and presentation computing system
CN116913278B Speech processing method and apparatus, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21888084

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16/08/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21888084

Country of ref document: EP

Kind code of ref document: A1