CN119311836A - Intention recognition method, device, computer equipment and medium based on artificial intelligence - Google Patents

Intention recognition method, device, computer equipment and medium based on artificial intelligence

Info

Publication number
CN119311836A
CN119311836A (application CN202411501424.6A)
Authority
CN
China
Prior art keywords
feature
dimension
voice call
preset
features
Prior art date
Legal status
Pending
Application number
CN202411501424.6A
Other languages
Chinese (zh)
Inventor
于金阁
Current Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202411501424.6A
Publication of CN119311836A

Classifications

    • G06F16/3329 Natural language query formulation
    • G06F16/3343 Query execution using phonetics
    • G06F16/35 Clustering; Classification (information retrieval of unstructured textual data)
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N20/20 Ensemble learning
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The present application belongs to the field of artificial intelligence technology, and relates to an intention recognition method, device, computer equipment and storage medium based on artificial intelligence, which is applied in the field of financial technology, including: obtaining voice call data and call text data corresponding to the target user; extracting features from the voice call data to obtain emotional dimension features; extracting features from the voice call data to obtain attitude dimension features; extracting features from the call text data based on a large language model to obtain cognitive dimension features; generating corresponding basic information features based on the basic information of the target user; integrating and processing the emotional dimension features, attitude dimension features, cognitive dimension features and basic information features to obtain a target feature vector; predicting and processing the target feature vector based on the intention recognition model to obtain the product purchase intention result. In addition, the product purchase intention result can be stored in the blockchain. The present application improves the recognition efficiency and accuracy of product purchase intention.

Description

Artificial intelligence-based intention recognition method, apparatus, computer device and medium
Technical Field
The application relates to the technical field of artificial intelligence development and the technical field of finance, in particular to an artificial intelligence-based intention recognition method, an artificial intelligence-based intention recognition device, computer equipment and a storage medium.
Background
In the insurance industry, with increasing market competition and the diversification of consumer demands, efficiently and accurately recommending insurance products to potential customers with high purchase intention has become a key challenge for insurance companies seeking to improve sales performance, optimize resource allocation and enhance customer satisfaction. Traditionally, the task of assessing a customer's intention to purchase a product has depended largely on the personal experience and subjective judgment of business personnel; however, this approach has significant limitations and disadvantages.
First, the personal judgment of business personnel is often affected by a variety of factors, including but not limited to personal experience, emotional state, familiarity with the products, and interference from the external environment, which can lead to large differences in the purchase intention assessments that different business personnel make for the same customer, lacking uniformity and objectivity.
Second, business personnel have limited time and energy, and it is difficult for them to accurately identify the potential customers who truly have high purchase intention in a large customer base. Without an effective screening mechanism, business personnel may have to invest a great deal of time and effort in marketing to low-intention customers, which not only wastes valuable sales resources but can also damage the customer experience through frequent disturbance, reducing brand acceptance.
In summary, the existing method of evaluating the intention of a customer to purchase a product relying on manual work in the insurance industry has the problems of low efficiency and low accuracy.
Disclosure of Invention
The embodiment of the application aims to provide an artificial intelligence-based intention recognition method, an artificial intelligence-based intention recognition device, computer equipment and a storage medium, so as to solve the technical problems of low efficiency and low accuracy in the existing insurance industry's reliance on manual evaluation of a customer's intention to purchase a product.
In order to solve the technical problems, the embodiment of the application provides an artificial intelligence-based intention recognition method, which adopts the following technical scheme:
Acquiring voice call data corresponding to a target user and call text data corresponding to the voice call data;
Performing feature extraction on the voice call data based on a preset emotion extraction strategy to obtain corresponding emotion dimension features;
Performing feature extraction on the voice call data based on a preset attitude extraction strategy to obtain corresponding attitude dimension features;
Performing feature extraction on the call text data based on a preset large language model to obtain corresponding cognitive dimension features;
acquiring basic information of the target user, and generating corresponding basic information features based on the basic information;
Integrating the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature to obtain a corresponding target feature vector;
And predicting the target feature vector based on a preset intention recognition model to obtain a product purchase intention result corresponding to the target user.
Further, the step of extracting features of the voice call data based on a preset emotion extraction policy to obtain corresponding emotion dimension features specifically includes:
extracting a first audio file containing user sound of the target user from the voice call data;
Splitting the first audio file into corresponding first audio fragments;
calculating an average pitch of the first audio segment, and calculating an average intensity of the first audio segment;
Generating a corresponding emotion valence level feature based on the average pitch;
Generating corresponding emotional arousal level features based on the average sound intensity;
And generating emotion dimension characteristics corresponding to the voice call data based on the emotion valence level characteristics and the emotion arousal level characteristics.
Further, the step of extracting the first audio file containing the user sound of the target user from the voice call data specifically includes:
acquiring a preset noise reduction strategy;
Performing voice noise reduction processing on the voice call data based on the noise reduction strategy to obtain corresponding initial audio;
performing human voice separation processing on the initial audio based on a preset separation algorithm to obtain corresponding user audio and agent (seat) audio;
And taking the user audio as the first audio file.
Further, the step of extracting features of the voice call data based on a preset attitude extraction strategy to obtain corresponding attitude dimension features specifically includes:
extracting a second audio file containing user sound of the target user from the voice call data;
splitting the second audio file into corresponding second audio fragments;
Calculating the average speech rate of the second audio segment;
and generating attitude dimension features corresponding to the voice call data based on the average speech speed.
Further, the step of extracting features of the call text data based on the preset large language model to obtain corresponding cognitive dimension features specifically includes:
invoking the large language model;
Coding the call text data to obtain a corresponding embedded vector;
Performing cognitive dimension feature extraction processing on the embedded vector based on the large language model to obtain a corresponding first appointed feature;
The cognitive dimensional feature is generated based on the first specified feature.
Further, the step of integrating the emotion dimension feature, the attitude dimension feature, the cognitive dimension feature and the basic information feature to obtain a corresponding target feature vector specifically includes:
acquiring a preset feature fusion strategy;
Based on the feature fusion strategy, carrying out fusion processing on the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature to obtain a fused second designated feature;
The second designated feature is taken as the target feature vector.
Further, before the step of predicting the target feature vector based on the preset intention recognition model to obtain the product purchase intention result corresponding to the target user, the method further includes:
acquiring initial sample data acquired in advance, wherein the initial sample data comprises basic information of a user, voice call data of the user and corresponding product purchase results;
Preprocessing the initial sample data to obtain corresponding first sample data;
Performing feature engineering processing on the first sample data based on a preset feature scoring model and the large language model to obtain corresponding second sample data;
calling a preset initial model;
training and evaluating the initial model based on the second sample data to obtain a specified model conforming to preset construction conditions;
The specified model is taken as the intention recognition model.
In order to solve the technical problems, the embodiment of the application also provides an intention recognition device based on artificial intelligence, which adopts the following technical scheme:
The first acquisition module is used for acquiring voice call data corresponding to a target user and call text data corresponding to the voice call data;
The first extraction module is used for carrying out feature extraction on the voice call data based on a preset emotion extraction strategy to obtain corresponding emotion dimension features;
The second extraction module is used for extracting the characteristics of the voice call data based on a preset attitude extraction strategy to obtain corresponding attitude dimension characteristics;
the third extraction module is used for extracting the characteristics of the call text data based on a preset large language model to obtain corresponding cognitive dimension characteristics;
the generation module is used for acquiring the basic information of the target user and generating corresponding basic information characteristics based on the basic information;
The integration module is used for integrating the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature to obtain a corresponding target feature vector;
And the prediction module is used for predicting the target feature vector based on a preset intention recognition model to obtain a product purchase intention result corresponding to the target user.
In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:
Acquiring voice call data corresponding to a target user and call text data corresponding to the voice call data;
Performing feature extraction on the voice call data based on a preset emotion extraction strategy to obtain corresponding emotion dimension features;
Performing feature extraction on the voice call data based on a preset attitude extraction strategy to obtain corresponding attitude dimension features;
Performing feature extraction on the call text data based on a preset large language model to obtain corresponding cognitive dimension features;
acquiring basic information of the target user, and generating corresponding basic information features based on the basic information;
Integrating the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature to obtain a corresponding target feature vector;
And predicting the target feature vector based on a preset intention recognition model to obtain a product purchase intention result corresponding to the target user.
In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:
Acquiring voice call data corresponding to a target user and call text data corresponding to the voice call data;
Performing feature extraction on the voice call data based on a preset emotion extraction strategy to obtain corresponding emotion dimension features;
Performing feature extraction on the voice call data based on a preset attitude extraction strategy to obtain corresponding attitude dimension features;
Performing feature extraction on the call text data based on a preset large language model to obtain corresponding cognitive dimension features;
acquiring basic information of the target user, and generating corresponding basic information features based on the basic information;
Integrating the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature to obtain a corresponding target feature vector;
And predicting the target feature vector based on a preset intention recognition model to obtain a product purchase intention result corresponding to the target user.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
The method comprises the steps of firstly obtaining voice call data corresponding to a target user and call text data corresponding to the voice call data; then performing feature extraction on the voice call data based on a preset emotion extraction strategy to obtain corresponding emotion dimension features, performing feature extraction on the voice call data based on a preset attitude extraction strategy to obtain corresponding attitude dimension features, and performing feature extraction on the call text data based on a preset large language model to obtain corresponding cognitive dimension features; then obtaining basic information of the target user and generating corresponding basic information features based on the basic information; integrating the emotion dimension features, the attitude dimension features, the cognitive dimension features and the basic information features to obtain a corresponding target feature vector; and finally performing prediction processing on the target feature vector based on a preset intention recognition model to obtain a product purchase intention result corresponding to the target user. According to the application, the voice call data, the call text data and the basic information corresponding to the target user are obtained; emotion dimension features, attitude dimension features, cognitive dimension features and basic information features corresponding to the target user are then respectively extracted from the four dimensions of emotion, attitude, cognition and basic information based on the emotion extraction strategy, the attitude extraction strategy and the use of the large language model; and the target feature vector obtained by integrating the features of the four dimensions is then predicted based on the use of the intention recognition model, so that the product purchase intention result corresponding to the target user can be generated rapidly and accurately, effectively improving both the recognition efficiency and the recognition precision of product purchase intention.
Drawings
In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of an artificial intelligence based intent recognition method in accordance with the present application;
FIG. 3 is a schematic diagram of one embodiment of an artificial intelligence based intent recognition device in accordance with the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms used in the description are for the purpose of describing particular embodiments only and are not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the description, the claims and the above description of the drawings are intended to cover non-exclusive inclusions. The terms "first", "second" and the like in the description, the claims or the above figures are used to distinguish between different objects and not necessarily to describe a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include a terminal device 101, a network 102, and a server 103, where the terminal device 101 may be a notebook 1011, a tablet 1012, or a cell phone 1013. Network 102 is the medium used to provide communication links between terminal device 101 and server 103. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 103 via the network 102 using the terminal device 101 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal device 101.
The terminal device 101 may be any of various electronic devices having a display screen and supporting web browsing; in addition to the notebook 1011, the tablet 1012 or the mobile phone 1013, the terminal device 101 may be an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop portable computer, a desktop computer, and the like.
The server 103 may be a server providing various services, such as a background server providing support for pages displayed on the terminal device 101.
It should be noted that, the artificial intelligence-based intention recognition method provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the artificial intelligence-based intention recognition device is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of an artificial intelligence-based intention recognition method in accordance with the present application is shown. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs. The artificial intelligence-based intention recognition method provided by the embodiment of the application can be applied to any scenario requiring recognition of product purchase intention, for example, recognition of purchase intention for financial products in the field of financial insurance. The artificial intelligence-based intention recognition method comprises the following steps:
step S201, obtaining voice call data corresponding to a target user and call text data corresponding to the voice call data.
In this embodiment, the electronic device (e.g., the server/terminal device shown in fig. 1) on which the artificial intelligence-based intention recognition method operates may acquire the voice call data through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra wideband) connections, and other now known or later developed wireless connections. The execution subject of the present application is an intention recognition system, or simply the system. The application can be applied to conversation scenarios in the product marketing business of financial insurance, such as customer service systems and sales platforms. The voice call data can be call data in voice form collected by customer service in the course of recommending insurance products to the target user. Voice-to-text processing can be performed on the voice call data to obtain the corresponding call text data.
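By way of illustration only (the embodiment does not prescribe a specific speech-to-text tool), the voice-to-text step could be sketched with an open-source ASR model; the model name and the file path below are assumptions:

```python
# Illustrative sketch only: the embodiment does not mandate a specific ASR tool.
# Whisper is used here as one example; "call_recording.wav" is a placeholder path.
import whisper

asr_model = whisper.load_model("base")               # general-purpose speech recognition model
result = asr_model.transcribe("call_recording.wav")  # voice call data -> transcript
call_text_data = result["text"]                      # call text data used in later steps
```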
Step S202, feature extraction is carried out on the voice call data based on a preset emotion extraction strategy, and corresponding emotion dimension features are obtained.
In this embodiment, the specific implementation process of performing feature extraction on the voice call data based on the preset emotion extraction policy to obtain the corresponding emotion dimension features will be described in further detail in the following embodiments and is not repeated here.
And step S203, carrying out feature extraction on the voice call data based on a preset attitude extraction strategy to obtain corresponding attitude dimension features.
In this embodiment, the specific implementation process of performing feature extraction on the voice call data based on the preset attitude extraction policy to obtain the corresponding attitude dimension features will be described in further detail in the following embodiments and is not repeated here.
And step S204, carrying out feature extraction on the call text data based on a preset large language model to obtain corresponding cognitive dimension features.
In this embodiment, the specific implementation process of performing feature extraction on the call text data based on the preset large language model to obtain the corresponding cognitive dimension features will be described in further detail in the following embodiments and is not repeated here.
Step S205, basic information of the target user is obtained, and corresponding basic information features are generated based on the basic information.
In this embodiment, the user database may be queried by using the user identification (e.g., user name) of the target user to obtain the basic information of the target user. The base information may include age, gender, occupation, etc., and the base information may be further characterized to generate corresponding base information features.
And S206, integrating the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature to obtain a corresponding target feature vector.
In this embodiment, the foregoing process of integrating the emotion dimension feature, the attitude dimension feature, the cognitive dimension feature and the basic information feature to obtain the corresponding target feature vector is described in further detail in the following embodiments, which are not described herein.
Step S207, predicting the target feature vector based on a preset intention recognition model to obtain a product purchase intention result corresponding to the target user.
In this embodiment, the target feature vector is input into the intention recognition model, and prediction processing on the purchase intention is performed on the target feature vector through the intention recognition model, and a corresponding prediction result, that is, the product purchase intention result is returned. The product purchase intention result may be a probability distribution or a direct category label (with or without purchase intention) indicating the probability of whether the target user has an intention to purchase the product. The specific construction process of the intent recognition model will be described in further detail in the following specific embodiments, and will not be described herein.
According to the application, the voice call data, the call text data and the basic information corresponding to the target user are obtained; emotion dimension features, attitude dimension features, cognitive dimension features and basic information features corresponding to the target user are then respectively extracted from the four dimensions of emotion, attitude, cognition and basic information based on the emotion extraction strategy, the attitude extraction strategy and the use of the large language model; and the target feature vector obtained by integrating the features of the four dimensions is subjected to prediction processing based on the use of the intention recognition model, so that the product purchase intention result corresponding to the target user can be generated rapidly and accurately, effectively improving both the recognition efficiency and the recognition precision of product purchase intention.
In some alternative implementations, step S202 includes the steps of:
and extracting a first audio file containing the user sound of the target user from the voice call data.
In this embodiment, the above-mentioned implementation process of extracting the first audio file containing the user sound of the target user from the voice call data will be described in further detail in the following embodiments, which will not be described herein.
And splitting the first audio file into corresponding first audio fragments.
In this embodiment, the first audio file may be split into a corresponding plurality of first audio segments, for example, one audio segment every 10 seconds, according to a preset splitting time length.
An average pitch of the first audio segment is calculated, and an average intensity of the first audio segment is calculated.
In this embodiment, the process of calculating the average pitch of the first audio segments includes, for each first audio segment, converting the audio signal from the time domain to the frequency domain by performing a fast Fourier transform on the segment. The audio signal is then cut into time frames, the pitch at each time frame is obtained, and the frame pitches are averaged to obtain the average pitch of the first audio segment. The average pitch of all audio segments containing the user's voice in the voice call data is then obtained as a quantization index of the target user's emotional tendency in the voice call data.
The process of calculating the average sound intensity of the first audio segments includes, for each first audio segment, performing a fast Fourier transform on the segment and slicing it into time frames to obtain the intensity of each time frame. The sound intensity of a frame is calculated by squaring the amplitudes, summing the squares and taking the square root. The intensities of all frames of a first audio segment are then averaged to obtain the average sound intensity of that segment.
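A minimal numpy sketch of the segment splitting and the per-frame pitch/intensity computation described above; the frame length, the 10-second segment length and the peak-frequency pitch estimate are illustrative assumptions (production systems often use a more robust pitch tracker):

```python
import numpy as np

def split_segments(samples: np.ndarray, sr: int, seconds: int = 10):
    """Split a mono waveform into fixed-length first audio segments."""
    step = sr * seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def segment_pitch_intensity(segment: np.ndarray, sr: int, frame_len: int = 2048):
    """Average pitch and average sound intensity of one segment, frame by frame."""
    pitches, intensities = [], []
    for start in range(0, len(segment) - frame_len + 1, frame_len):
        frame = segment[start:start + frame_len]
        spectrum = np.abs(np.fft.rfft(frame))            # time domain -> frequency domain
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        pitches.append(freqs[np.argmax(spectrum)])       # dominant frequency as pitch proxy
        intensities.append(np.sqrt(np.sum(frame ** 2)))  # root of the summed squared amplitudes
    if not pitches:                                      # segment shorter than one frame
        return 0.0, 0.0
    return float(np.mean(pitches)), float(np.mean(intensities))
```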
A corresponding emotion valence level feature is generated based on the average pitch.
In this embodiment, the average pitch of all the audio clips including the user's voice in the voice call data is used to obtain the emotion tendency quantization index of the target user in the voice call data, and the index is used as the corresponding emotion valence level feature.
And generating corresponding emotion arousal level characteristics based on the average sound intensity.
In this embodiment, the median of the average intensity of all the first audio segments is selected as the quantization index of the overall emotional intensity of the user, and is used as the corresponding emotional arousal level feature.
And generating emotion dimension characteristics corresponding to the voice call data based on the emotion valence level characteristics and the emotion arousal level characteristics.
In this embodiment, emotion may be considered from two dimensions: emotional valence (Valence) and arousal level (Arousal). The higher the emotional valence, the more positive and pleasant the individual's mood. The higher the arousal level, the more intense and activated the individual's emotion. Positive emotions tend to indicate that the individual has a higher propensity to purchase, and a higher pitch generally means that a person's emotion is positive. It has been found that high-energy sounds express a stronger level of emotional arousal. In summary, high pitch and high volume often present a strong positive emotion, while low pitch and low volume often imply a relatively negative, flat emotion.
Specifically, using a two-dimensional coordinate system with the emotional valence level as the vertical axis and the emotional arousal level as the horizontal axis, the emotion point of each piece of voice call data is plotted. The corresponding emotion dimension features are obtained by analyzing the distribution of the voice call data in this emotion space (the emotional valence level features and the emotional arousal level features). For example, positive emotions are typically located in the positive direction of the valence axis with moderate or higher arousal, while negative emotions may be located in the negative direction of the valence axis with higher or lower arousal.
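A hedged sketch of reading the two-dimensional emotion space: pitch-derived valence on one axis, intensity-derived arousal on the other. The normalisation baselines below are assumptions and would in practice be calibrated on historical calls:

```python
import numpy as np

def emotion_dimension_feature(avg_pitches, avg_intensities,
                              baseline_pitch=180.0, baseline_intensity=1.0):
    # Assumed baselines: values above them are read as positive / high arousal.
    valence = float(np.mean(avg_pitches)) - baseline_pitch             # > 0: more positive
    arousal = float(np.median(avg_intensities)) - baseline_intensity   # > 0: more intense
    quadrant = ("positive" if valence > 0 else "negative",
                "high" if arousal > 0 else "low")
    return {"valence": valence, "arousal": arousal, "quadrant": quadrant}
```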
The method comprises the steps of extracting a first audio file containing the user sound of a target user from the voice call data, segmenting the first audio file into corresponding first audio segments, calculating the average pitch and the average sound intensity of the first audio segments, generating corresponding emotional valence level features based on the average pitch and corresponding emotional arousal level features based on the average sound intensity, and finally generating the emotion dimension features corresponding to the voice call data based on the emotional valence level features and the emotional arousal level features. The emotion dimension features are thus generated accurately from the emotional valence level and the emotional arousal level, ensuring the feature accuracy of the obtained emotion dimension features.
In some optional implementations of this embodiment, the step of extracting the first audio file containing the user sound of the target user from the voice call data includes the following steps:
and acquiring a preset noise reduction strategy.
In this embodiment, the above-described noise reduction strategy may specifically use a wavelet transform or the like.
And carrying out voice noise reduction processing on the voice call data based on the noise reduction strategy to obtain corresponding initial audio.
In this embodiment, the voice call data is subjected to voice noise reduction processing by using a noise reduction policy, so as to improve accuracy of subsequent processing and obtain corresponding initial audio.
And carrying out voice separation processing on the initial audio based on a preset separation algorithm to obtain corresponding user audio and seat audio.
In this embodiment, the user audio and the agent (seat) audio may be obtained by performing Fourier transform on the initial audio and separating the human voices in the initial audio using spectral analysis, thereby separating the user's voice from the agent's voice.
And taking the user audio as the first audio file.
The method comprises the steps of obtaining a preset noise reduction strategy, performing voice noise reduction processing on the voice call data based on the noise reduction strategy to obtain the corresponding initial audio, performing human voice separation processing on the initial audio based on a preset separation algorithm to obtain the corresponding user audio and agent audio, and taking the user audio as the first audio file. By performing noise reduction on the voice call data using the noise reduction strategy and then separating the human voices in the resulting initial audio using the preset separation algorithm, the first audio file containing the user sound of the target user can be extracted intelligently and accurately, improving the extraction efficiency of the first audio file and ensuring the data accuracy of the obtained first audio file.
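A sketch of the wavelet-based noise-reduction step using PyWavelets as one possible implementation; the wavelet choice, decomposition level and universal threshold are assumptions, and the subsequent spectral human-voice separation is not shown because the embodiment names it only in general terms:

```python
import numpy as np
import pywt

def wavelet_denoise(samples: np.ndarray, wavelet: str = "db4", level: int = 4):
    """Soft-threshold wavelet denoising of the raw voice call waveform (assumed parameters)."""
    coeffs = pywt.wavedec(samples, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # noise estimate from finest detail band
    thresh = sigma * np.sqrt(2 * np.log(len(samples)))  # universal threshold
    denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)              # initial audio for voice separation
```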
In some alternative implementations, step S203 includes the steps of:
and extracting a second audio file containing the user sound of the target user from the voice call data.
In this embodiment, the implementation process of extracting the second audio file including the user sound of the target user from the voice call data may refer to the processing process of extracting the first audio file including the user sound of the target user from the voice call data, which is not described herein in detail.
And splitting the second audio file into corresponding second audio fragments.
In this embodiment, the implementation process of splitting the second audio file into corresponding second audio segments may refer to the processing process of splitting the first audio file into corresponding first audio segments, which is not described in detail herein.
And calculating the average speech speed of the second audio fragment.
In this embodiment, the process of calculating the average speech rate of the second audio segments includes performing speech-to-text conversion on each second audio segment, summing the number of words across all second audio segments, and dividing that total by the total duration of the second audio segments to obtain the average speech rate of the target user.
And generating attitude dimension features corresponding to the voice call data based on the average speech speed.
In this embodiment, the slower the speech rate, the more friendly and pleasant the speaker tends to be perceived by the counterpart. The average speech rate of the target user in the second audio segments can therefore be regarded as a quantization index of the target user's patience level and used as the attitude dimension feature corresponding to the voice call data.
The method comprises the steps of extracting a second audio file containing the user sound of the target user from the voice call data, segmenting the second audio file into corresponding second audio segments, calculating the average speech rate of the second audio segments, and generating the attitude dimension features corresponding to the voice call data based on the average speech rate. The corresponding attitude dimension features can thus be generated accurately from the average speech rate of the second audio segments, ensuring the feature accuracy of the obtained attitude dimension features.
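A minimal sketch of the average speech-rate computation, total transcribed words over total duration; `transcribe` stands in for any ASR call, and for Chinese text the character count would replace the whitespace split:

```python
def average_speech_rate(segments, durations_s, transcribe):
    """Words per second across all second audio segments of the target user."""
    total_words = sum(len(transcribe(seg).split()) for seg in segments)
    total_time = sum(durations_s)                 # total duration in seconds
    return total_words / total_time if total_time else 0.0
```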
In some alternative implementations, step S204 includes the steps of:
and calling the large language model.
In this embodiment, the choice of large language model is not particularly limited; for example, models such as BERT and GPT, which can perform deep analysis on the user's call text data and extract features such as the user's points of interest and questions, may be used.
And encoding the call text data to obtain a corresponding embedded vector.
In this embodiment, the call text data is encoded to obtain the corresponding embedded vector.
And carrying out cognitive dimension feature extraction processing on the embedded vector based on the large language model to obtain a corresponding first appointed feature.
In this embodiment, the embedded vector is input into the above large language model, and then a classifier built in the large language model or a custom NLP task (such as entity recognition, emotion analysis, etc.) is applied to extract the cognitive dimension features, such as the focus point, the question point, etc.
The cognitive dimensional feature is generated based on the first specified feature.
In this embodiment, the first specified feature may be used as the cognitive dimension feature. Or the first designated feature can be converted into a specific quantization index or label according to actual needs and used as the cognitive dimension feature.
A user who shows a more positive attitude is often more likely to ultimately purchase the product. The application constructs three cognitive dimension features, and more cognitive dimension features can be designed for different scenarios or products. "Whether the user questions customer service", "whether the user asks about the price" and "whether the user has risk awareness" are obtained from the call text using the large language model. For example, for "whether the user has risk awareness", the call text data of the target user is input into the large language model; if the user hits intentions under the risk awareness system and the number of hit intentions is greater than a preset number threshold, the user is considered to have risk awareness and is thus characterized as having a preliminarily positive attitude toward the product. The corresponding risk awareness system can be pre-constructed according to actual business requirements; by way of example, it may include inquiring about the impact on the comprehensive premium, inquiring about compulsory traffic insurance, inquiring about traffic accident risk, and so on. In addition, the value of the number threshold is not particularly limited and may be set according to actual requirements.
The method comprises the steps of calling the large language model, encoding the call text data to obtain the corresponding embedded vector, performing cognitive dimension feature extraction on the embedded vector based on the large language model to obtain the corresponding first specified feature, and generating the cognitive dimension features based on the first specified feature. By encoding the call text data into the corresponding embedded vector and performing cognitive dimension feature extraction on the embedded vector using the large language model, the corresponding cognitive dimension features can be extracted rapidly and accurately, improving the extraction efficiency of the cognitive dimension features and ensuring the accuracy of the obtained cognitive dimension features.
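A hedged sketch of the cognitive-dimension extraction: the call text is encoded into an embedded vector with a BERT-style encoder, and three binary heads score the vector for the three features named above. The model name and the (untrained) classification heads are illustrative assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed encoder choice
encoder = AutoModel.from_pretrained("bert-base-chinese")
# One logit each: questions customer service / asks about price / has risk awareness.
# The heads are untrained here; in practice they would be fit on labelled calls.
heads = torch.nn.Linear(768, 3)

def cognitive_features(call_text: str):
    inputs = tokenizer(call_text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        embedded = encoder(**inputs).last_hidden_state[:, 0]    # [CLS] embedding vector
    return (torch.sigmoid(heads(embedded)) > 0.5).squeeze(0).tolist()
```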
In some alternative implementations of the present embodiment, step S206 includes the steps of:
and acquiring a preset feature fusion strategy.
In this embodiment, a feature fusion strategy is designed in advance according to actual processing requirements, so as to combine multiple features into a comprehensive fused feature representation. Feature fusion strategies include, but are not limited to, concatenation, weighted summation, attention mechanisms, principal component analysis (PCA), autoencoders, and the like.
And carrying out fusion processing on the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature based on the feature fusion strategy to obtain a fused second designated feature.
In this embodiment, the emotion dimension feature, the attitude dimension feature, the cognitive dimension feature and the basic information feature may be fused according to a fusion processing step included in the feature fusion policy, so as to obtain a corresponding fused second designated feature.
The second designated feature is taken as the target feature vector.
The method comprises the steps of obtaining a preset feature fusion strategy, fusing the emotion dimension features, the attitude dimension features, the cognitive dimension features and the basic information features based on the feature fusion strategy to obtain the fused second specified feature, and taking the second specified feature as the target feature vector. By fusing the four kinds of features based on the feature fusion strategy, the integration between features can be completed rapidly and accurately, improving the efficiency of the feature fusion processing and ensuring the data accuracy of the obtained target feature vector. This facilitates the subsequent prediction processing of the target feature vector based on the preset intention recognition model, enables accurate generation of the product purchase intention result corresponding to the target user, and further improves the recognition accuracy and reliability of product purchase intention.
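A sketch of the simplest fusion strategy named above, concatenation, with optional per-block weights to show how a weighted variant could look; the input shapes are assumptions:

```python
import numpy as np

def fuse_features(emotion, attitude, cognition, basic, weights=None):
    """Concatenate (optionally weighted) feature blocks into the target feature vector."""
    parts = [np.asarray(p, dtype=float).ravel() for p in (emotion, attitude, cognition, basic)]
    if weights is not None:                       # optional weighted-summation flavour
        parts = [w * p for w, p in zip(weights, parts)]
    return np.concatenate(parts)
```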
In some optional implementations of this embodiment, before step S207, the electronic device may further perform the following steps:
Acquiring initial sample data acquired in advance, wherein the initial sample data comprises basic information of a user, voice call data of the user and corresponding product purchase results.
In this embodiment, the user's basic information, such as age, gender and occupation, may be obtained from a structured data source (the user database) using SQL queries. Voice call data between the agent and the user, together with the corresponding purchase results (whether or not a purchase was concluded), are extracted from the recording system or repository.
And preprocessing the initial sample data to obtain corresponding first sample data.
In this embodiment, the preprocessing may include performing missing-value processing on the initial sample data (e.g., filling with the median), performing one-hot encoding on the categorical data in the initial sample data, and processing outliers in the initial sample data. The quartiles (Q1, Q2, Q3) are calculated to obtain the interquartile range IQR = Q3 - Q1. Values less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR are considered outliers, and the outliers are deleted directly from the initial sample data to obtain the corresponding first sample data.
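A pandas sketch of the preprocessing described above, median imputation, one-hot encoding and IQR-based outlier removal; the column lists are assumptions:

```python
import pandas as pd

def preprocess(df: pd.DataFrame, categorical_cols, numeric_cols) -> pd.DataFrame:
    df = df.copy()
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())  # median fill
    df = pd.get_dummies(df, columns=categorical_cols)                      # one-hot encoding
    for col in numeric_cols:                                               # IQR outlier filter
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        df = df[(df[col] >= q1 - 1.5 * iqr) & (df[col] <= q3 + 1.5 * iqr)]
    return df
```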
And carrying out feature engineering processing on the first sample data based on a preset feature scoring model and the large language model to obtain corresponding second sample data.
In this embodiment, the feature scoring model may specifically be an XGBoost model. Features with scores greater than a preset scoring threshold can be screened out by scoring the importance of the initially extracted first sample data with the XGBoost model, and the features screened in this way serve as the basis of the subsequent feature engineering. In the feature engineering, the screened features are used to further generate the corresponding emotion dimension features, cognitive dimension features and attitude dimension features; these dimensional features are typically obtained by aggregating, converting or encoding the screened features. In addition, the user's basic information is obtained and the basic information features are built; the emotion dimension features, attitude dimension features, cognitive dimension features, basic information features and the user's purchase result are then integrated to obtain the second sample data corresponding to the user.
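A sketch of the XGBoost-based importance screening; the hyperparameters and the score threshold are assumptions:

```python
from xgboost import XGBClassifier

def screen_features(X, y, feature_names, score_threshold=0.01):
    """Keep features whose learned importance exceeds the preset scoring threshold."""
    model = XGBClassifier(n_estimators=200, max_depth=4)
    model.fit(X, y)                               # y: product purchase results
    return [name for name, score in zip(feature_names, model.feature_importances_)
            if score > score_threshold]
```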
And calling a preset initial model.
In this embodiment, the initial model may be a comprehensive model constructed by combining a language model, an XGBoost model and audio processing technology. The model architecture of the initial model includes: (1) an input layer, with text input, in which the text features (cognitive dimension features and basic information features) are encoded by models such as BERT to obtain text embedding vectors, and audio input, in which the audio features (such as the emotion dimension features and attitude dimension features) are input directly as numerical features; (2) hidden layers, a multi-layer neural network (e.g., fully connected layers, convolutional layers) designed to learn the interactions and associations between the text and audio features, where attention mechanisms may be used to enhance the model's capture of key information; and (3) an output layer, which uses a softmax or sigmoid function as its activation function to predict purchase intention as a binary or multi-class classification problem according to business needs.
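A hedged PyTorch sketch of the described architecture: a text-embedding input, a numeric audio-feature input, fully connected hidden layers, and a sigmoid output for the binary purchase-intention case; all layer sizes are assumptions, and the optional attention mechanism is omitted:

```python
import torch
import torch.nn as nn

class IntentionModel(nn.Module):
    def __init__(self, text_dim: int = 768, audio_dim: int = 8, hidden: int = 128):
        super().__init__()
        self.hidden = nn.Sequential(              # hidden layers over the fused input
            nn.Linear(text_dim + audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.out = nn.Linear(hidden, 1)           # sigmoid head for binary intent

    def forward(self, text_emb: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([text_emb, audio_feats], dim=-1)
        return torch.sigmoid(self.out(self.hidden(fused)))
```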
And training and evaluating the initial model based on the second sample data to obtain a specified model conforming to preset construction conditions.
In this embodiment, the training and evaluation of the initial model proceed as follows. (1) Training: the second sample data is divided into a training set and a test set, and the initial model is trained using the training set. A suitable loss function (e.g., cross-entropy loss) and optimizer (e.g., the Adam optimizer) are set. Indicators such as the loss value and accuracy are monitored during training, and early stopping based on a validation set is used to prevent overfitting; hyperparameters such as the learning rate, batch size and number of training epochs are adjusted as needed. (2) Evaluation: indicators such as accuracy, recall and F1 score are evaluated on the test set. The confusion matrix is analyzed to understand the model's performance on different categories, especially misclassifications. ROC curves, PR curves and the like are drawn to display the model performance intuitively, and the model is analyzed and optimized until a specified model whose performance meets the actual model construction conditions is obtained.
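A sketch of the training and evaluation loop named above: binary cross-entropy for the sigmoid output, the Adam optimizer, and held-out metrics. Early stopping and the ROC/PR curves are omitted for brevity, and all tensor shapes are assumptions:

```python
import torch
from sklearn.metrics import accuracy_score, recall_score, f1_score

def train_and_evaluate(model, text_tr, audio_tr, y_tr, text_te, audio_te, y_te,
                       epochs: int = 20, lr: float = 1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCELoss()                  # cross-entropy-style loss for sigmoid output
    for _ in range(epochs):                       # early stopping omitted for brevity
        opt.zero_grad()
        loss = loss_fn(model(text_tr, audio_tr).squeeze(-1), y_tr)
        loss.backward()
        opt.step()
    with torch.no_grad():                         # held-out evaluation
        preds = (model(text_te, audio_te).squeeze(-1) > 0.5).int().numpy()
    return {"accuracy": accuracy_score(y_te, preds),
            "recall": recall_score(y_te, preds),
            "f1": f1_score(y_te, preds)}
```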
The specified model is taken as the intention recognition model.
The method comprises the steps of obtaining pre-collected initial sample data, preprocessing the initial sample data to obtain the corresponding first sample data, performing feature engineering on the first sample data based on the preset feature scoring model and the large language model to obtain the corresponding second sample data, calling the preset initial model, training and evaluating the initial model based on the second sample data to obtain a specified model meeting the preset construction conditions, and finally taking the specified model as the intention recognition model. By preprocessing the pre-collected initial sample data to obtain the first sample data, performing feature engineering on it based on the feature scoring model and the large language model to obtain the second sample data, and then training and evaluating the initial model on the second sample data, the intention recognition model meeting the construction conditions is built quickly, improving the construction efficiency of the intention recognition model and ensuring the predictive performance of the resulting model.
In some alternative implementations, the user information is obtained with the user's consent, and its processing complies with relevant laws, regulations and policies.
In addition, any third-party software tools or components appearing in the embodiments of the present application are presented by way of example only and do not represent actual use.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present invention in any way.
It is emphasized that, in order to further ensure the privacy and security of the product purchase intention result, the product purchase intention result may also be stored in a node of a blockchain.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, where each block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The embodiments of the application may acquire and process the related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by computer readable instructions stored in a computer readable storage medium; when executed, the instructions may include the flows of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; their execution order is likewise not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2 described above, the present application provides an embodiment of an artificial intelligence-based intent recognition device, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 3, the artificial intelligence based intention recognition device 300 according to the present embodiment includes a first acquisition module 301, a first extraction module 302, a second extraction module 303, a third extraction module 304, a generation module 305, an integration module 306, and a prediction module 307.
Wherein:
A first obtaining module 301, configured to obtain voice call data corresponding to a target user, and call text data corresponding to the voice call data;
The first extraction module 302 is configured to perform feature extraction on the voice call data based on a preset emotion extraction policy, so as to obtain corresponding emotion dimension features;
The second extraction module 303 is configured to perform feature extraction on the voice call data based on a preset attitude extraction policy, so as to obtain a corresponding attitude dimension feature;
The third extraction module 304 is configured to perform feature extraction on the call text data based on a preset large language model, so as to obtain corresponding cognitive dimension features;
A generating module 305, configured to obtain basic information of the target user, and generate corresponding basic information features based on the basic information;
The integration module 306 is configured to integrate the emotion dimension feature, the attitude dimension feature, the cognitive dimension feature, and the basic information feature to obtain a corresponding target feature vector;
and the prediction module 307 is configured to perform prediction processing on the target feature vector based on a preset intention recognition model, so as to obtain a product purchase intention result corresponding to the target user.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based intent recognition method in the foregoing embodiment, and are not described herein again.
In some optional implementations of this embodiment, the first extraction module 302 includes:
A first extraction sub-module, configured to extract a first audio file containing user sound of the target user from the voice call data;
The first segmentation sub-module is used for segmenting the first audio file into corresponding first audio segments;
a first computing sub-module for computing an average pitch of the first audio segment and computing an average intensity of the first audio segment;
a first generation sub-module for generating corresponding emotion valence level features based on the average pitch;
The second generation sub-module is used for generating corresponding emotion arousal level features based on the average sound intensity;
And the third generation sub-module is used for generating the emotion dimension features corresponding to the voice call data based on the emotion valence level features and the emotion arousal level features.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based intent recognition method in the foregoing embodiment, and are not described herein again.
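For ease of understanding, a minimal sketch of the average pitch and average intensity computation described above is given below, assuming the librosa library; the valence/arousal thresholds are placeholders, not values disclosed by the embodiment.

```python
# Illustrative sketch of the pitch/intensity computation; thresholds are placeholders.
import librosa
import numpy as np

def emotion_dimension_features(audio_path: str, sr: int = 16000):
    y, sr = librosa.load(audio_path, sr=sr)
    # Average pitch (fundamental frequency) over voiced frames.
    f0, voiced_flag, voiced_probs = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)
    avg_pitch = float(np.nanmean(f0))          # NaN frames are unvoiced
    # Average sound intensity via frame-level RMS energy.
    rms = librosa.feature.rms(y=y)[0]
    avg_intensity = float(np.mean(rms))
    # Map the averages to coarse valence / arousal levels (placeholder thresholds).
    valence_level = "high" if avg_pitch > 180.0 else "low"
    arousal_level = "high" if avg_intensity > 0.05 else "low"
    return {"avg_pitch": avg_pitch, "avg_intensity": avg_intensity,
            "valence": valence_level, "arousal": arousal_level}
```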
In some optional implementations of this embodiment, the first extraction submodule includes:
the acquisition unit is used for acquiring a preset noise reduction strategy;
The processing unit is used for carrying out voice noise reduction processing on the voice call data based on the noise reduction strategy to obtain corresponding initial audio;
The separation unit is used for performing human voice separation processing on the initial audio based on a preset separation algorithm to obtain corresponding user audio and agent audio;
And the determining unit is used for taking the user audio as the first audio file.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based intent recognition method in the foregoing embodiment, and are not described herein again.
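A sketch of the noise reduction and voice separation steps is given below under strong assumptions: the noisereduce library stands in for the preset noise reduction strategy, and a dual-channel call recording (user and agent recorded on separate channels, as on many call platforms) stands in for the preset separation algorithm.

```python
# Sketch under stated assumptions; not the disclosed noise reduction strategy
# or separation algorithm.
import librosa
import noisereduce as nr

def extract_user_audio(call_path: str, sr: int = 16000):
    # Load the call as a two-channel recording (assumed channel layout:
    # channel 0 = user, channel 1 = agent).
    stereo, sr = librosa.load(call_path, sr=sr, mono=False)
    user_ch, agent_ch = stereo[0], stereo[1]
    # Spectral-gating noise reduction per channel.
    user_audio = nr.reduce_noise(y=user_ch, sr=sr)
    agent_audio = nr.reduce_noise(y=agent_ch, sr=sr)
    return user_audio, agent_audio   # user_audio serves as the first audio file
```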
In some alternative implementations of the present embodiment, the second extraction module 303 includes:
A second extraction sub-module, configured to extract a second audio file containing a user sound of the target user from the voice call data;
The second segmentation sub-module is used for segmenting the second audio file into corresponding second audio segments;
A second computing sub-module for computing an average speech rate of the second audio segment;
And the fourth generation sub-module is used for generating attitude dimension features corresponding to the voice call data based on the average speech speed.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based intent recognition method in the foregoing embodiment, and are not described herein again.
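As an illustrative sketch, the average speech rate described above can be approximated as transcribed characters per second of speech; the upstream speech recognition step is assumed to exist, and the helper below is hypothetical.

```python
# Hedged sketch: speech rate as characters per second over the audio segments.
import numpy as np

def average_speech_rate(segments):
    """segments: list of (transcript_text, duration_seconds) per audio segment."""
    rates = [len(text) / dur for text, dur in segments if dur > 0]
    return float(np.mean(rates)) if rates else 0.0

# e.g. average_speech_rate([("我想了解一下这款产品", 3.2), ("好的", 0.8)])
```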
In some alternative implementations of the present embodiment, the third extraction module 304 includes:
A calling sub-module for calling the large language model;
The coding submodule is used for coding the call text data to obtain a corresponding embedded vector;
The third extraction sub-module is used for performing cognitive dimension feature extraction processing on the embedded vector based on the large language model to obtain a corresponding first specified feature;
And a fifth generation sub-module for generating the cognitive dimension feature based on the first specified feature.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based intent recognition method in the foregoing embodiment, and are not described herein again.
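A minimal sketch of the encoding and cognitive dimension feature extraction described above is given below, assuming a Hugging Face transformer encoder as the large language model; the model name and the mean-pooling choice are illustrative assumptions.

```python
# Sketch: encode call text into an embedding vector with a pretrained encoder.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def cognitive_dimension_features(call_text: str) -> torch.Tensor:
    inputs = tokenizer(call_text, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # token embeddings
    return hidden.mean(dim=1).squeeze(0)              # pooled embedding vector
```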
In some alternative implementations of the present embodiment, the integration module 306 includes:
The acquisition sub-module is used for acquiring a preset feature fusion strategy;
The fusion sub-module is used for performing fusion processing on the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature based on the feature fusion strategy to obtain a fused second specified feature;
and the determining submodule is used for taking the second specified feature as the target feature vector.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based intent recognition method in the foregoing embodiment, and are not described herein again.
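As a sketch of one simple feature fusion strategy, the four feature groups may be flattened and concatenated into the target feature vector; the disclosed fusion strategy may of course be more elaborate (e.g., weighted or learned fusion).

```python
# Minimal concatenation-based fusion sketch.
import numpy as np

def fuse_features(emotion, attitude, cognitive, basic):
    parts = [np.asarray(p, dtype=np.float32).ravel()
             for p in (emotion, attitude, cognitive, basic)]
    return np.concatenate(parts)  # the target feature vector
```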
In some optional implementations of the present embodiment, the artificial intelligence based intent recognition device further includes:
the system comprises a first acquisition module, a second acquisition module and a storage module, wherein the first acquisition module is used for acquiring initial sample data acquired in advance, and the initial sample data comprises basic information of a user, voice call data of the user and corresponding product purchase results;
The preprocessing module is used for preprocessing the initial sample data to obtain corresponding first sample data;
The first processing module is used for carrying out feature engineering processing on the first sample data based on a preset feature scoring model and the large language model to obtain corresponding second sample data;
the calling module is used for calling a preset initial model;
The second processing module is used for training and evaluating the initial model based on the second sample data to obtain a specified model conforming to preset construction conditions;
and the determining module is used for taking the specified model as the intention recognition model.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based intent recognition method in the foregoing embodiment, and are not described herein again.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only a computer device 4 having components 41-43 is shown in the figures, but it should be understood that not all of the illustrated components are required to be implemented, and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 4. Of course, the memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store an operating system and various application software installed on the computer device 4, such as the computer readable instructions of the artificial intelligence based intent recognition method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the artificial intelligence based intent recognition method.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the artificial intelligence-based intent recognition method as described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware, although in many cases the former is preferred. Based on such understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods of the embodiments of the present application.
It is apparent that the above-described embodiments are only some embodiments of the present application rather than all of them; the preferred embodiments of the application are shown in the drawings, which do not limit the scope of the claims. This application may be embodied in many different forms; these embodiments are provided so that this disclosure will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their features. All equivalent structures made using the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims (10)

1. An artificial intelligence based intention recognition method is characterized by comprising the following steps:
Acquiring voice call data corresponding to a target user and call text data corresponding to the voice call data;
Performing feature extraction on the voice call data based on a preset emotion extraction strategy to obtain corresponding emotion dimension features;
Performing feature extraction on the voice call data based on a preset attitude extraction strategy to obtain corresponding attitude dimension features;
Performing feature extraction on the call text data based on a preset large language model to obtain corresponding cognitive dimension features;
acquiring basic information of the target user, and generating corresponding basic information features based on the basic information;
Integrating the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature to obtain a corresponding target feature vector;
And predicting the target feature vector based on a preset intention recognition model to obtain a product purchase intention result corresponding to the target user.
2. The artificial intelligence based intention recognition method according to claim 1, wherein the step of extracting features of the voice call data based on a preset emotion extraction policy to obtain corresponding emotion dimension features specifically comprises:
extracting a first audio file containing user sound of the target user from the voice call data;
Segmenting the first audio file into corresponding first audio segments;
calculating an average pitch of the first audio segment, and calculating an average intensity of the first audio segment;
Generating corresponding emotion valence level features based on the average pitch;
Generating corresponding emotion arousal level features based on the average sound intensity;
And generating the emotion dimension features corresponding to the voice call data based on the emotion valence level features and the emotion arousal level features.
3. The artificial intelligence based intent recognition method of claim 2, wherein the step of extracting a first audio file containing user sound of the target user from the voice call data specifically comprises:
acquiring a preset noise reduction strategy;
Performing voice noise reduction processing on the voice call data based on the noise reduction strategy to obtain corresponding initial audio;
performing human voice separation processing on the initial audio based on a preset separation algorithm to obtain corresponding user audio and agent audio;
And taking the user audio as the first audio file.
4. The artificial intelligence based intention recognition method according to claim 1, wherein the step of extracting features of the voice call data based on a preset attitude extraction strategy to obtain corresponding attitude dimension features specifically comprises:
extracting a second audio file containing user sound of the target user from the voice call data;
Segmenting the second audio file into corresponding second audio segments;
Calculating the average speech rate of the second audio segment;
and generating attitude dimension features corresponding to the voice call data based on the average speech speed.
5. The artificial intelligence based intention recognition method according to claim 1, wherein the step of extracting features of the call text data based on a preset large language model to obtain corresponding cognitive dimension features specifically comprises:
invoking the large language model;
Coding the call text data to obtain a corresponding embedded vector;
Performing cognitive dimension feature extraction processing on the embedded vector based on the large language model to obtain a corresponding first specified feature;
The cognitive dimensional feature is generated based on the first specified feature.
6. The artificial intelligence based intention recognition method according to claim 1, wherein the step of integrating the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature to obtain a corresponding target feature vector specifically comprises:
acquiring a preset feature fusion strategy;
Based on the feature fusion strategy, performing fusion processing on the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature to obtain a fused second specified feature;
The second specified feature is taken as the target feature vector.
7. The artificial intelligence based intention recognition method according to claim 1, further comprising, before the step of predicting the target feature vector based on a preset intention recognition model to obtain a product purchase intention result corresponding to the target user:
acquiring initial sample data acquired in advance, wherein the initial sample data comprises basic information of a user, voice call data of the user and corresponding product purchase results;
Preprocessing the initial sample data to obtain corresponding first sample data;
Performing feature engineering processing on the first sample data based on a preset feature scoring model and the large language model to obtain corresponding second sample data;
calling a preset initial model;
training and evaluating the initial model based on the second sample data to obtain a specified model conforming to preset construction conditions;
The specified model is taken as the intention recognition model.
8. An artificial intelligence based intent recognition device, comprising:
The first acquisition module is used for acquiring voice call data corresponding to a target user and call text data corresponding to the voice call data;
The first extraction module is used for carrying out feature extraction on the voice call data based on a preset emotion extraction strategy to obtain corresponding emotion dimension features;
The second extraction module is used for extracting the characteristics of the voice call data based on a preset attitude extraction strategy to obtain corresponding attitude dimension characteristics;
the third extraction module is used for extracting the characteristics of the call text data based on a preset large language model to obtain corresponding cognitive dimension characteristics;
the generation module is used for acquiring the basic information of the target user and generating corresponding basic information characteristics based on the basic information;
The integration module is used for integrating the emotion dimension feature, the attitude dimension feature, the cognition dimension feature and the basic information feature to obtain a corresponding target feature vector;
And the prediction module is used for predicting the target feature vector based on a preset intention recognition model to obtain a product purchase intention result corresponding to the target user.
9. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the artificial intelligence based intent recognition method as claimed in any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the artificial intelligence based intent recognition method as claimed in any of claims 1 to 7.
CN202411501424.6A 2024-10-24 2024-10-24 Intention recognition method, device, computer equipment and medium based on artificial intelligence Pending CN119311836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411501424.6A CN119311836A (en) 2024-10-24 2024-10-24 Intention recognition method, device, computer equipment and medium based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411501424.6A CN119311836A (en) 2024-10-24 2024-10-24 Intention recognition method, device, computer equipment and medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN119311836A true CN119311836A (en) 2025-01-14

Family

ID=94188556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411501424.6A Pending CN119311836A (en) 2024-10-24 2024-10-24 Intention recognition method, device, computer equipment and medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN119311836A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119724607A (en) * 2025-02-21 2025-03-28 福建医科大学附属第一医院 A method for analyzing electronic data in gastroenterology


Similar Documents

Publication Publication Date Title
CN119311836A (en) Intention recognition method, device, computer equipment and medium based on artificial intelligence
CN116703515A (en) Recommendation method and device based on artificial intelligence, computer equipment and storage medium
WO2024021685A1 (en) Reply content processing method and media content interactive content interaction method
CN119577148A (en) A text classification method, device, computer equipment and storage medium
CN119130631A (en) Risk assessment method, device, equipment and storage medium based on user description
US20170351973A1 (en) Quantifying creativity in auditory and visual mediums
CN117314586A (en) Product recommendation method, device, computer equipment and storage medium
CN116402625A (en) Customer evaluation method, apparatus, computer device and storage medium
CN116450943A (en) Artificial intelligence-based speaking recommendation method, device, equipment and storage medium
CN117172632B (en) Enterprise abnormal behavior detection method, device, equipment and storage medium
CN116720692A (en) Customer service dispatching method and device, computer equipment and storage medium
CN119580323A (en) Artificial intelligence-based emotion recognition method, device, computer equipment and medium
CN117909489A (en) Data generation method, device, equipment and storage medium based on artificial intelligence
CN119621960A (en) Abstract generation method, device, computer equipment and storage medium
CN120104882A (en) A content recommendation method, device, equipment and medium
CN119537442A (en) A clue data processing method, device, equipment and medium
CN119513420A (en) Information recommendation method, device, equipment and medium
CN117788051A (en) Customer preference analysis method, device, equipment and medium based on artificial intelligence
CN117876021A (en) Data prediction method, device, equipment and storage medium based on artificial intelligence
CN116756539A (en) Project recommendation method, device, computer equipment and storage medium
CN119107131A (en) Method, device, equipment and storage medium for adjusting delivery strategy of delivery object
CN119068914A (en) Emotion recognition method, device, computer equipment and medium based on artificial intelligence
CN119691108A (en) A speech recommendation method and related equipment for agent question and answer
CN119180714A (en) Vehicle insurance claim processing method, device, equipment and medium based on artificial intelligence
CN118297463A (en) Quality inspection method for hospital return visit agent service flow and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination