WO2022156084A1 - Target object behavior prediction method based on face and interactive text, and related device - Google Patents


Info

Publication number
WO2022156084A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
historical
behavior
real
face
Prior art date
Application number
PCT/CN2021/090147
Other languages
English (en)
French (fr)
Inventor
南海顺
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022156084A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/784 Retrieval characterised by using metadata automatically derived from the content, the detected or recognised objects being people
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a method, device, computer equipment and storage medium for predicting the behavior of a target object based on human face and interactive text.
  • Interview approval is an important part of the credit business.
  • the customer's performance in the interview approval process will be used as a reference for whether or not to approve the loan.
  • in the past, interview approval was performed manually: an approver could relatively easily combine the customer's facial expression with the customer's stated intention to form an approval opinion.
  • with intelligent approval, however, the customer's intention can often only be recognized from the customer's voice and text during the approval process, and abnormal behavior during approval, such as changes in facial expression, cannot be identified.
  • the purpose of the embodiments of the present application is to propose a target object behavior prediction method, device, computer equipment and storage medium based on human face and interactive text, so as to solve the prior-art problem that the accuracy of predictions obtained using face landmarks drops greatly in large-data-set scenarios.
  • the embodiment of the present application provides a method for predicting the behavior of a target object based on human face and interactive text, and adopts the following technical solutions:
  • a target object behavior prediction method based on human face and interactive text comprising the following steps:
  • the historical interactive text is processed to obtain corresponding historical structured data
  • the historical structured data includes the historical behavior of the sample object
  • face pictures are extracted from the historical video and processed, and labels for the processed face pictures are generated based on the historical behavior;
  • the preset first prediction model is trained according to the historical structured data to obtain a first model and a first model output value, and the preset second prediction model is trained according to the processed face pictures and the labels to obtain a second model and a second model output value; wherein the first model output value and the second model output value are, respectively, the probability values with which the corresponding model outputs the historical behavior;
  • an LR model is established to fit the historical behavior, and a trained LR model is obtained;
  • when the real-time interactive text and real-time video of the target object are obtained, corresponding real-time structured data is obtained from the real-time interactive text, real-time face pictures are obtained from the real-time video and processed, the real-time structured data is input into the first model, the processed real-time face pictures are input into the second model, and the outputs of the first model and the second model are simultaneously input into the trained LR model to predict the behavior of the target object.
  • the embodiment of the present application also provides a target object behavior prediction device based on human face and interactive text, which adopts the following technical solutions:
  • the data acquisition module is used to acquire historical interactive texts and historical videos of multiple sample objects
  • the data processing module is used for processing the historical interactive text to obtain corresponding historical structured data, the historical structured data including the historical behavior of the sample object, and for extracting face pictures from the historical videos, processing them, and generating labels for the processed face pictures based on the historical behavior;
  • the model building module is used for training the preset first prediction model according to the historical structured data to obtain the first model and the first model output value, and training the preset second prediction model according to the processed face pictures and the labels to obtain the second model and the second model output value; wherein the first model output value and the second model output value are, respectively, the probability values with which the corresponding model outputs the historical behavior; and for establishing an LR model according to the first model output value and the second model output value to fit the historical behavior, obtaining the trained LR model;
  • the prediction module is used to, when the real-time interactive text and real-time video of the target object are obtained, cause the data processing module to obtain corresponding real-time structured data from the real-time interactive text and to obtain and process real-time face pictures from the real-time video, then input the real-time structured data into the first model, input the processed real-time face pictures into the second model, and simultaneously input the outputs of the first model and the second model into the trained LR model to predict the behavior of the target object.
  • the embodiment of the present application also provides a computer device, which adopts the following technical solutions:
  • a computer device comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor implements the following steps when executing the computer-readable instructions:
  • the historical interactive text is processed to obtain corresponding historical structured data
  • the historical structured data includes the historical behavior of the sample object
  • face pictures are extracted from the historical video and processed, and labels for the processed face pictures are generated based on the historical behavior;
  • the preset first prediction model is trained according to the historical structured data to obtain a first model and a first model output value, and the preset second prediction model is trained according to the processed face pictures and the labels to obtain a second model and a second model output value; wherein the first model output value and the second model output value are, respectively, the probability values with which the corresponding model outputs the historical behavior;
  • an LR model is established to fit the historical behavior, and a trained LR model is obtained;
  • when the real-time interactive text and real-time video of the target object are obtained, corresponding real-time structured data is obtained from the real-time interactive text, real-time face pictures are obtained from the real-time video and processed, the real-time structured data is input into the first model, the processed real-time face pictures are input into the second model, and the outputs of the first model and the second model are simultaneously input into the trained LR model to predict the behavior of the target object.
  • the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:
  • a computer-readable storage medium where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the processor is caused to perform the following steps:
  • the historical interactive text is processed to obtain corresponding historical structured data
  • the historical structured data includes the historical behavior of the sample object
  • face pictures are extracted from the historical video and processed, and labels for the processed face pictures are generated based on the historical behavior;
  • the preset first prediction model is trained according to the historical structured data to obtain a first model and a first model output value, and the preset second prediction model is trained according to the processed face pictures and the labels to obtain a second model and a second model output value; wherein the first model output value and the second model output value are, respectively, the probability values with which the corresponding model outputs the historical behavior;
  • an LR model is established to fit the historical behavior, and the LR model after training is obtained;
  • when the real-time interactive text and real-time video of the target object are obtained, corresponding real-time structured data is obtained from the real-time interactive text, real-time face pictures are obtained from the real-time video and processed, the real-time structured data is input into the first model, the processed real-time face pictures are input into the second model, and the outputs of the first model and the second model are simultaneously input into the trained LR model to predict the behavior of the target object.
  • the method, device, computer equipment and storage medium for target object behavior prediction based on human face and interactive text provided by the embodiments of the present application mainly have the following beneficial effects:
  • this scheme trains corresponding models on acquired face pictures and interactive texts, outputs from each the probability of the historical behavior, establishes an LR model over the joint probabilities, and then predicts with the established LR model, greatly improving the accuracy of prediction results in large-data-set scenarios.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for predicting target object behavior based on human faces and interactive texts according to the present application
  • FIG. 3 is a schematic structural diagram of an embodiment of a target object behavior prediction device based on human face and interactive text according to the present application
  • FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 .
  • the network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 .
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101 , 102 and 103 , such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.
  • the terminal devices 101, 102, and 103 can be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
  • the server 105 may be a server that provides various services, such as a background server that provides support for the pages displayed on the terminal devices 101 , 102 , and 103 .
  • the target object behavior prediction method based on face and interactive text provided by the embodiment of the present application is generally executed by the server, and accordingly, the target object behavior prediction device based on human face and interactive text is generally set in the server.
  • terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
  • FIG. 2 shows a flowchart of an embodiment of a method for predicting behavior of a target object based on human face and interactive text according to the present application.
  • the described target object behavior prediction method based on human face and interactive text comprises the following steps:
  • S203: train a preset first prediction model according to the historical structured data to obtain a first model and a first model output value, and train a preset second prediction model according to the processed face pictures and the labels to obtain a second model and a second model output value; wherein the first model output value and the second model output value are, respectively, the probability values with which the corresponding model outputs the historical behavior;
  • in step S201, in some business scenarios based on terminal interaction, it is necessary to predict the user's behavior and to perform a specific operation according to the predicted behavior.
  • the user here is the sample object and the target object that needs to be predicted.
  • when performing terminal interaction, text interaction, voice interaction or video interaction can be used, such as the text interaction formed by question and answer with a robot customer service, the voice interaction formed by question and answer with an intelligent voice dialogue robot, and the video interaction formed by the face-to-face review of an AI approval robot.
  • different interaction types may be implemented in different terminals, or all the aforementioned interaction types may be implemented in the same terminal.
  • the interactive text can be obtained directly based on the text interaction, and the interactive text can be obtained indirectly by performing speech recognition and conversion on the recorded voice and video based on the voice interaction and video interaction.
  • These directly or indirectly obtained interactive texts form the historical interactive text;
  • for video interaction, the historical video can be obtained directly by recording the interaction process.
  • This embodiment specifically takes the credit business scenario as an example for description.
  • the data generated in the credit business scenario can be divided into pre-loan data, in-loan data and post-loan data.
  • accordingly, the pre-loan data, in-loan data and post-loan data are obtained.
  • the present embodiment processes the historical interactive text to obtain the corresponding historical structured data.
  • the data corresponding to the target fields is extracted from each historical interactive text of the sample object, so that each sample object corresponds to at least one piece of initial structured data with multiple dimensions.
  • processing the historical interactive text to obtain corresponding historical structured data includes: processing the historical interactive text to obtain at least one piece of structured data; when multiple pieces of structured data are obtained, merging the multiple pieces; then performing data processing on the merged structured data to obtain the historical behavior; and adding the historical behavior as a field to the merged structured data to generate the historical structured data.
  • the data processing includes useless-data elimination, data conversion, data calculation and the like; after data processing, the fields and dimensions of the final historical structured data will differ from those of the initially obtained structured data.
  • the sample object is a loaned customer
  • the target fields can be the customer's ID, loan principal (Loan_amount), days overdue/advance (DAY), due date, actual repayment date, etc.
  • each target field represents one dimension of the structured data. If the historical interactive text is obtained from the customer's historical pre-loan data, in-loan data and post-loan data, the following initial data structure is obtained:
  • the subsequent steps use the final historical structured data for modeling to establish a structured-data model, that is, the first model described later. It should be noted that the above data merging and data processing can be performed synchronously.
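The merge-then-derive flow described above can be sketched in Python. The field names `ID`, `Loan_amount` and `DAY` follow the example fields given earlier; the function name and the overdue rule (`DAY > 0` meaning days overdue) are illustrative assumptions, not taken from the patent text.

```python
def build_structured_data(pre_loan, in_loan, post_loan):
    """Merge per-phase records on customer ID, then derive the historical
    behavior ('overdue') as an extra field, as the text describes."""
    merged = {}
    for phase in (pre_loan, in_loan, post_loan):
        for row in phase:
            # Rows from later phases add or overwrite fields for the same ID.
            merged.setdefault(row["ID"], {}).update(row)
    for row in merged.values():
        # Historical behavior added as a field: positive DAY means days overdue.
        row["overdue"] = row.get("DAY", 0) > 0
    return list(merged.values())
```

In practice this would be a pandas merge over the three data sources; the dict version just makes the field-merging and behavior-derivation steps explicit.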
  • the step of extracting and processing the face pictures from the historical video includes: intercepting face pictures of the sample object from the historical video frame by frame and adding time stamps; sorting the face pictures by time stamp; calculating the similarity of adjacent face pictures; and screening the intercepted face pictures based on the obtained similarities to obtain several face pictures to be labeled;
  • performing face key point detection on the face pictures, selecting the eye key points from the face key points, calculating the center coordinates of the two eyes based on the eye key points, and rotating the face pictures based on the center coordinates and the eye key points to align the faces, obtaining the processed face pictures.
  • each group of face pictures yields two similarity values. For example, suppose five pictures are sorted as A, B, C, D and E. Take the first three pictures A, B and C and compute the similarity between A and B and the similarity between B and C. When both similarity values are greater than the preset threshold, the middle picture of the group is eliminated: here B is removed, giving a new ranking A, C, D, E, and the first three pictures of the new ranking, A, C and D, are taken next. If the similarity between A and C or between C and D is less than the preset threshold, the first face picture A is kept, and the window is advanced by one to judge the similarity of C, D and E, and so on until the screening of the remaining face pictures is completed.
  • comparison arrays of the grayscale pixels of the two face pictures to be compared can be obtained; the Hamming distance between the two pictures is computed from the comparison arrays, and the similarity of the two pictures is then calculated from the Hamming distance.
  • other existing methods can also be used to calculate the similarity between the two images.
  • by screening the face pictures in this way, pairs of face pictures with close similarity can still be retained; while similar pictures are eliminated, the training data remains abundant, improving the accuracy of subsequent model training.
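The screening rule in the A..E example above can be sketched as a sliding three-frame window. The average-hash bits and Hamming-distance similarity follow the grayscale comparison-array description; the helper names and the similarity formula `1 - distance/length` are assumptions for illustration.

```python
def average_hash(gray):
    """Flatten a grayscale matrix into bits: 1 if pixel >= mean, else 0."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p >= mean else 0 for p in pixels]

def hamming_similarity(bits_a, bits_b):
    """Similarity in [0, 1] from the Hamming distance of two bit arrays."""
    dist = sum(1 for a, b in zip(bits_a, bits_b) if a != b)
    return 1.0 - dist / len(bits_a)

def screen_frames(frames, similarity, threshold):
    """Slide a 3-frame window over the time-sorted frames; when both adjacent
    similarities exceed the threshold, drop the middle frame and re-test the
    new triple, otherwise keep the first frame and advance by one."""
    frames = list(frames)
    i = 0
    while i + 2 < len(frames):
        s1 = similarity(frames[i], frames[i + 1])
        s2 = similarity(frames[i + 1], frames[i + 2])
        if s1 > threshold and s2 > threshold:
            del frames[i + 1]   # middle frame is redundant (the "B" case)
        else:
            i += 1              # keep frames[i], slide the window
    return frames
```

Note that when a middle frame is dropped, the window stays in place, so the newly adjacent pair is re-checked, exactly as the A, C, D step in the example.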
  • in the above embodiment, the angles of the faces in the several face pictures to be labeled often differ, and most faces are inclined to varying degrees; therefore, before labeling, the face pictures are processed, specifically by data cleaning and face-angle correction: the faces in all face pictures are cropped out, processed into uniform pixel sizes, and the face angles are unified.
  • the specific face correction process is as follows:
  • the key points of the face are detected; for example, 68 key points are extracted in this embodiment.
  • the center coordinates of both eyes can be calculated by (3):
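Equation (3) itself is not reproduced in this text; the standard computation, sketched below under that assumption, takes each eye center as the mean of that eye's keypoints and then derives the roll angle of the line joining the two centers. The helper names are illustrative.

```python
import math

def eye_center(points):
    """Center of one eye as the mean of its keypoint (x, y) coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def roll_angle(left_center, right_center):
    """Angle in degrees of the line joining the two eye centers; rotating
    the picture by this angle levels the eyes (face alignment)."""
    dx = right_center[0] - left_center[0]
    dy = right_center[1] - left_center[1]
    return math.degrees(math.atan2(dy, dx))
```

In an OpenCV pipeline one would then rotate the picture about the midpoint of the two centers by this angle (e.g. `cv2.getRotationMatrix2D` plus `cv2.warpAffine`).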
  • the step of generating the labels of the processed face pictures based on the historical behavior includes: determining the associated sample object according to the historical video corresponding to the face picture, and then reading the corresponding historical behavior from that sample object's historical structured data to generate the label of the face picture; that is, the label of a face picture is determined according to the historical structured data of its sample object.
  • the sample object is a loaned customer.
  • the historical video can be associated with its corresponding historical structured data through customer information (such as the customer ID), and the labels of the corresponding face pictures are then obtained from the fields corresponding to historical behavior in the historical structured data. For example, according to a customer's post-loan historical structured data, face pictures can be assigned labels such as high-risk overdue customer, medium-risk overdue customer, low-risk overdue customer and risk-free customer.
  • training the preset first prediction model according to the historical structured data includes extracting the input variables and output target of the first prediction model from the historical structured data. For example, in the credit business scenario where the sample object is a loaned customer, the input variables can be fields such as "Loan_amount" and "DAY" in the historical structured data, and the output target is the field "whether the current month is overdue"; model training is then performed based on the input variables and the output target.
  • the process of obtaining input variables includes variable screening and reconstruction.
  • the specific process includes: using a decision tree for binning by computing information gain, dividing each single independent variable to generate a decision tree; according to the binning results, calculating each variable's IV value and sorting the variables in descending order; and training the first prediction model starting from the independent variable with the largest IV value, adding one new independent variable each time, until the model's AUC value reaches its maximum and no longer improves; the independent variables at that point are the input variables.
  • for example, training with the first independent variable gives an AUC of 0.8; with the first two, 0.81; with the first m independent variables x1, ..., xm, 0.9; and with the first m+1 independent variables x1, ..., xm+1, 0.8999. The independent variables x1, ..., xm are therefore selected as the input variables.
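The forward-selection loop above can be sketched directly; `train_auc` stands in for a real train-and-validate call, and the variables are assumed to be pre-sorted by IV in descending order.

```python
def select_variables(vars_by_iv, train_auc):
    """Add IV-sorted variables one at a time; stop once the validation AUC
    stops improving and return the best prefix as the input variables."""
    best_auc, best_k = float("-inf"), 0
    for k in range(1, len(vars_by_iv) + 1):
        auc = train_auc(vars_by_iv[:k])
        if auc > best_auc:
            best_auc, best_k = auc, k
        else:
            break  # AUC no longer improves: previous prefix is the answer
    return vars_by_iv[:best_k]
```

With the AUC sequence from the example (0.8, 0.81, 0.9, then 0.8999), this returns the first three variables, matching the x1, ..., xm selection described.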
  • the first prediction model is a classification model, specifically an xgboost model.
  • the data corresponding to the input variables is first divided into a training set and a validation set; the training set is input into the xgboost model for training, and the effect of the trained model is then verified with the validation set. When the model effect reaches the preset condition, model training is complete and the first model is obtained.
  • a sigmoid function converts the output corresponding to the current sample object into a probability value, namely the probability that the model outputs the historical behavior, that is, the first model output value.
  • the second prediction model of this embodiment adopts the Resnet_100 model. Training the preset second prediction model according to the face pictures and the labels includes: dividing the face pictures and the corresponding label data into a training set and a validation set; inputting the training set into the Resnet_100 model; and verifying the trained model's effect with the validation set. When the model effect reaches the preset condition, model training is complete and the second model is obtained; the final output layer of the second model is connected to a softmax to output the corresponding probability value, that is, the second model output value.
  • the first model output value and the second model output value are used as the two dimensions of prediction to construct the model: one dimension is the overdue probability value from the historical structured data, and the other is the overdue probability value from the facial features.
  • an LR model is established to fit whether the customer is overdue. Using a selected ten-fold cross-validation set, the weights of the two probability values are obtained, completing the construction of the LR model. The modeling approach is to fit, through the LR model, the overdue probability from the historical structured data and the overdue probability from the facial features; the corresponding weights w1 and w2 are obtained, and the model construction is complete.
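A minimal sketch of this stacking step: fit a logistic-regression model on the two upstream probabilities to learn the weights w1 and w2 (plus a bias w0). Plain batch gradient descent is an assumption for self-containedness; any LR solver (e.g. scikit-learn's `LogisticRegression`) would serve the same role.

```python
import math

def fit_lr(p1, p2, y, lr=1.0, epochs=500):
    """Fit sigmoid(w0 + w1*p1 + w2*p2) to labels y by batch gradient descent;
    p1/p2 are the first/second model probability outputs per sample."""
    w0 = w1 = w2 = 0.0
    n = len(y)
    for _ in range(epochs):
        g0 = g1 = g2 = 0.0
        for a, b, t in zip(p1, p2, y):
            pred = 1.0 / (1.0 + math.exp(-(w0 + w1 * a + w2 * b)))
            err = pred - t          # gradient of log loss w.r.t. the margin
            g0 += err
            g1 += err * a
            g2 += err * b
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
    return w0, w1, w2

def lr_predict(weights, a, b):
    """Combined overdue probability from the two model outputs."""
    w0, w1, w2 = weights
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * a + w2 * b)))
```

When both upstream probabilities genuinely track the overdue label, the learned w1 and w2 come out positive, which is what makes them usable as the relative weights of the two prediction dimensions.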
  • in step S205, the above steps S201 to S204 constitute the process of constructing the prediction model.
  • in this step, the behavior of the target object is predicted based on the constructed prediction model; the trigger for executing this step is text and video interaction by the target object, for example through the terminal used by the customer during the video approval interview.
  • when the real-time interactive text and real-time video of the target object are obtained, the corresponding real-time structured data is obtained from the real-time interactive text, and real-time face pictures are obtained and processed from the real-time video; this process is the same as the acquisition of historical structured data and face pictures in step S202.
  • the target object is a customer who has already borrowed a loan or a customer applying for a loan.
  • the process of predicting whether the customer applying for a loan will be overdue after the application is successful is as follows:
  • the real-time structured data is obtained from the real-time interactive text of the current pre-loan data, and the face picture data is obtained from screenshots of the real-time video. After the data is processed into the input formats of the first model and the second model, they are input into the first model and the second model respectively to obtain two corresponding overdue probability values; finally, the two probability values are input into the trained LR model to determine whether the customer has a tendency to become overdue later.
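The real-time flow above reduces to a small combining function. `first_model`, `second_model` and the LR weights below are stand-ins for the trained models, and the 0.5 decision threshold is an illustrative assumption.

```python
import math

def predict_overdue(features, face, first_model, second_model, w0, w1, w2):
    """Run both trained models on the real-time inputs and combine their
    probabilities through the trained LR model."""
    p1 = first_model(features)   # overdue probability from structured data
    p2 = second_model(face)      # overdue probability from the face picture
    score = 1.0 / (1.0 + math.exp(-(w0 + w1 * p1 + w2 * p2)))
    return score, score >= 0.5   # combined probability and overdue tendency
```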
  • after predicting the behavior of the target object, the method further includes: causing the trained LR model to output update weights for the first model output value and the second model output value, obtaining preset information based on the update weights, and sending the preset information to the target object, so that the target object performs new real-time text interaction and/or real-time video interaction according to the preset information; by acquiring the new real-time interactive text and/or real-time video, the behavior prediction result of the target object is updated.
  • based on the update weights of the above two probability values, preset information can likewise be obtained and sent to the target object, so that the target object performs new text interaction and video interaction according to the preset information to update the prediction.
  • the two probability values are input into the trained LR model to determine whether the customer has a tendency to become overdue later, and at the same time the update weights of the two probability values are obtained.
  • the method further includes: recording new interactive texts, videos and behaviors of existing objects and newly added objects; updating the historical data according to the new interactive texts, videos and behaviors; and, after processing the updated historical data into the corresponding data formats, training and optimizing the first model, the second model and the trained LR model.
  • this embodiment can optimize the model constructed above.
  • the historical data is updated by recording the new interactive texts, videos and behaviors of existing objects and newly added objects after interview approval; the updated historical data is processed into the corresponding data formats and then used as input data for the optimization algorithm to optimize the models.
  • part of the newly added historical data is structured data of existing objects, such as customers' pre-loan behavior data in the credit business, and the other part is data of incremental objects. The distribution of the probability values output by the first model based on the structured data may therefore change over time: as new data arrives, some old variables may become invalid and new variables may be added. When new data is added to continuously optimize the model, variables must be re-screened and the above model training process repeated.
  • the target object behavior prediction method based on faces and interactive text trains a corresponding model on each of the acquired face pictures and interactive texts, outputs the probabilities of historical behavior derived from the face pictures and the interactive texts, builds an LR model on the joint probabilities, and then predicts with the built LR model, which can greatly improve the accuracy of prediction results in large-dataset scenarios.
  • the blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms.
  • a blockchain is essentially a decentralized database: a chain of data blocks associated through cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the present application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and the like.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, computer-readable instructions, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • the computer-readable instructions can be stored in a computer-readable storage medium.
  • the computer-readable instructions, when executed, may include the processes of the above-mentioned method embodiments.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.
  • the present application provides an embodiment of a target object behavior prediction apparatus based on faces and interactive text, which corresponds to the method embodiment shown in FIG. 2 .
  • the apparatus can be specifically applied to various electronic devices.
  • the apparatus for predicting target object behavior based on face and interactive text described in this embodiment includes: a data acquisition module 301 , a data processing module 302 , a model construction module 303 and a prediction module 304 .
  • the data acquisition module 301 is used to acquire historical interactive texts and historical videos of multiple sample objects.
  • the data processing module 302 is configured to process the historical interactive texts to obtain corresponding historical structured data, where the historical structured data includes the historical behavior of the sample objects, to extract face pictures from the historical videos and process them, and to generate labels for the processed face pictures based on the historical behavior.
  • the model building module 303 is used for training the preset first prediction model according to the historical structured data to obtain the first model and the first model output value, and training the preset second prediction model according to the processed face pictures and the labels to obtain the second model and the second model output value, wherein the first model output value and the second model output value are each the probability that the corresponding model outputs the historical behavior; and for establishing an LR model according to the first model output value and the second model output value to fit the historical behavior, obtaining a trained LR model.
  • the prediction module 304 is configured to, when the real-time interactive text and real-time video of the target object are acquired, cause the data processing module 302 to obtain corresponding real-time structured data according to the real-time interactive text and to obtain and process real-time face pictures according to the real-time video, then input the real-time structured data into the first model while inputting the processed real-time face pictures into the second model.
  • the outputs of the first model and the second model are simultaneously input to the trained LR model to predict the behavior of the target object.
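As a hypothetical illustration of the two-stage scoring in these bullets, the sketch below wires two stand-in scorers into an LR-style fusion step. Every model, parameter value, and field name here is a made-up placeholder, not the patent's actual implementation:

```python
# Two-stage inference sketch: the structured-text model and the face model
# each emit an overdue probability, and a trained logistic-regression (LR)
# layer combines the two probabilities into a final prediction.
import math

def predict_behavior(structured_row, face_image, text_model, face_model, lr_model):
    """Return the fused overdue probability for one target object."""
    p_text = text_model(structured_row)   # probability from structured data
    p_face = face_model(face_image)       # probability from face features
    return lr_model(p_text, p_face)       # LR fuses the two probabilities

# Toy stand-ins so the sketch runs end to end (all values invented).
def toy_text_model(row):
    return 1.0 / (1.0 + math.exp(-(row["loan_amount"] / 100000.0 - 2.0)))

def toy_face_model(img):
    return 0.3  # pretend the face model saw a low-risk expression

def toy_lr_model(p_text, p_face, w1=0.4, w2=0.6, b=-0.5):
    return 1.0 / (1.0 + math.exp(-(w1 * p_text + w2 * p_face + b)))

p = predict_behavior({"loan_amount": 300000.0}, None,
                     toy_text_model, toy_face_model, toy_lr_model)
```

The fusion layer only ever sees the two scalar probabilities, which is why, as the surrounding text explains, its learned weights can later be read as per-channel attributions.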
  • the user here is the sample object, and is also the target object whose behavior needs to be predicted.
  • text interaction, voice interaction or video interaction can be performed, such as text interaction formed by question and answer with robot customer service, voice interaction formed by question and answer of intelligent voice dialogue robot, and video interaction formed by face-to-face review of AI approval robot, etc.
  • different interaction types may be implemented in different terminals, or all the aforementioned interaction types may be implemented in the same terminal.
  • interactive text can be obtained directly from text interaction, and indirectly from voice and video interaction by performing speech recognition and conversion on the recorded voice and video; these directly or indirectly obtained interactive texts form the historical interactive texts. For video interaction, historical videos can be obtained directly by recording the interaction process.
  • the data processing module 302 in this embodiment processes the historical interactive texts to obtain the corresponding historical structured data; specifically, the data corresponding to the target fields is extracted from each historical interactive text of a sample object, so that each sample object corresponds to at least one piece of initial structured data with multiple dimensions.
  • when the data processing module 302 processes the historical interactive texts to obtain corresponding historical structured data, it is specifically configured to: process the historical interactive texts to obtain at least one piece of structured data; when multiple pieces of structured data are obtained, merge the multiple pieces of structured data; then process the merged structured data to obtain the historical behavior; and add the historical behavior as a field to the merged structured data to generate the historical structured data.
  • the data processing includes useless-data elimination, data conversion, data calculation, and the like; after data processing, the fields and dimensions of the final historical structured data will differ from those of the initially obtained historical structured data. For details, reference may be made to the above method embodiments, which are not expanded here.
  • when the data processing module 302 extracts face pictures from the historical videos and processes them, it is specifically configured to: capture face pictures of the sample object from the historical video frame by frame and add timestamps, sort the face pictures by timestamp, calculate the similarity of adjacent face pictures, and filter the captured face pictures based on the obtained similarities to obtain a number of face pictures to be labeled.
  • when the data processing module 302 generates the labels of the processed face pictures based on the historical behavior, it is specifically configured to: determine the associated sample object according to the historical video corresponding to the face picture, and then read the corresponding historical behavior from the corresponding historical structured data according to the associated sample object to generate the label of the face picture; that is, the label of the face picture here is determined according to the historical structured data of the sample object.
  • training the preset first prediction model according to the historical structured data by the model building module 303 includes extracting the input variables and output target of the first prediction model from the historical structured data, and then performing model training based on the input variables and output target.
  • the process of acquiring the input variables includes variable screening and reconstruction.
  • the first prediction model is a classification model, specifically an xgboost model
  • the second prediction model in this embodiment adopts the Resnet_100 model.
  • the trained LR model will output the weights of the first model output value and the second model output value.
  • after the prediction module 304 predicts the behavior of the target object, it is further configured to cause the trained LR model to output update weights for the first model output value and the second model output value, obtain preset information based on the update weights, and send the preset information to the target object, so that the target object performs new real-time text interaction and/or real-time video interaction according to the preset information, and the behavior prediction result of the target object is updated by acquiring the new real-time interactive text and/or real-time video.
  • the model building module 303 is further configured to, after obtaining the trained LR model, record the new interactive texts, videos, and behaviors of existing objects and newly added objects, update the historical data according to the new interactive texts, videos, and behaviors, and, after processing the updated historical data into the corresponding data format, train and optimize the first model, the second model, and the trained LR model.
  • the target object behavior prediction apparatus based on faces and interactive text trains a corresponding model on each of the acquired face pictures and interactive texts, outputs the probabilities of historical behavior derived from the face pictures and the interactive texts, builds an LR model on the joint probabilities, and then predicts with the built LR model, which can greatly improve the accuracy of prediction results in large-dataset scenarios.
  • FIG. 4 is a block diagram of a basic structure of a computer device according to this embodiment.
  • the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that communicate with each other through a system bus.
  • the memory 41 stores computer-readable instructions, and the processor 42, when executing the computer-readable instructions, implements the steps of the target object behavior prediction method based on faces and interactive text described in the above method embodiments, with beneficial effects corresponding to that method, which are not repeated here.
  • the computer device 4 with the memory 41, the processor 42, and the network interface 43 is shown in the figure, but it should be understood that not all of the illustrated components are required to be implemented; more or fewer components may be implemented instead.
  • the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.
  • the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
  • the memory 41 includes at least one type of readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.
  • the readable storage medium includes flash memory, hard disks, multimedia cards, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and the like.
  • the memory 41 may be an internal storage unit of the computer device 4 , such as a hard disk or a memory of the computer device 4 .
  • the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is generally used to store the operating system and various application software installed on the computer device 4, such as the computer-readable instructions corresponding to the above-mentioned method for predicting target object behavior based on faces and interactive text.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip in some embodiments. This processor 42 is typically used to control the overall operation of the computer device 4 . In this embodiment, the processor 42 is configured to execute computer-readable instructions or process data stored in the memory 41 , for example, to execute the computer-readable instructions corresponding to the target object behavior prediction method based on faces and interactive text.
  • the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
  • the present application also provides another implementation, namely a computer-readable storage medium; the computer-readable storage medium may be non-volatile or volatile, and stores computer-readable instructions that can be executed by at least one processor to cause the at least one processor to perform the steps of the above-mentioned method for predicting target object behavior based on faces and interactive text, with the corresponding beneficial effects of that method, which are not expanded here.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solutions of the present application can, in essence or in the part contributing to the prior art, be embodied in the form of a software product.
  • the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the various embodiments of the present application.


Abstract

A target object behavior prediction method based on faces and interactive text, and related device, belonging to the field of artificial intelligence. The method includes: acquiring and processing historical interactive texts and historical videos to obtain historical structured data containing historical behavior, as well as face pictures, and generating face-picture labels based on the historical behavior; obtaining a first model and a first model output value from the historical structured data, and a second model and a second model output value from the face pictures and labels; building an LR model from the first model output value and the second model output value; and, when real-time interactive text and real-time video are acquired, obtaining the corresponding real-time structured data and real-time face pictures, inputting them into the first model and the second model respectively, and inputting both model outputs simultaneously into the LR model for behavior prediction. Blockchain technology is also involved: private data in the historical interactive texts can be stored on a blockchain. The method can improve the accuracy of prediction results in large-dataset scenarios.

Description

Target object behavior prediction method based on faces and interactive text, and related device
This application claims priority to the Chinese patent application filed with the China Patent Office on January 22, 2021, with application number 202110090632.1 and invention title "Target object behavior prediction method based on faces and interactive text, and related device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a method, apparatus, computer device, and storage medium for predicting target object behavior based on faces and interactive text.
Background Art
Interview-based approval is an important step in the credit business: a customer's performance during the approval interview serves as a reference for whether the loan is approved. Approval interviews used to be conducted manually, so an opinion on whether to approve could be formed relatively simply from the customer's facial expressions combined with the customer's stated intent. An AI approval robot, however, can typically identify the customer's intent only from the customer's speech transcript, and cannot recognize abnormal behavior during the approval process, such as changes in facial expression.
The inventors realized that existing schemes that predict individual behavior from facial features rely solely on the customer's facial landmarks. They achieve high accuracy on small sample sets (on the order of thousands), but in large-dataset scenarios with sample sets on the order of hundreds of thousands, the accuracy of landmark-based prediction results drops sharply, making landmark-based behavior prediction unusable.
Summary of the Invention
The purpose of the embodiments of this application is to provide a method, apparatus, computer device, and storage medium for predicting target object behavior based on faces and interactive text, to solve the prior-art problem that the accuracy of prediction results obtained from facial landmarks drops sharply in large-dataset scenarios.
To solve the above technical problem, an embodiment of this application provides a target object behavior prediction method based on faces and interactive text, adopting the following technical solution:
A target object behavior prediction method based on faces and interactive text, comprising the following steps:
acquiring historical interactive texts and historical videos of multiple sample objects;
processing the historical interactive texts to obtain corresponding historical structured data, the historical structured data containing the historical behavior of the sample objects; extracting face pictures from the historical videos and processing them; and generating labels for the processed face pictures based on the historical behavior;
training a preset first prediction model on the historical structured data to obtain a first model and a first model output value, and training a preset second prediction model on the processed face pictures and the labels to obtain a second model and a second model output value, where the first model output value and the second model output value are each the probability that the corresponding model outputs the historical behavior;
building an LR model from the first model output value and the second model output value to fit the historical behavior, obtaining a trained LR model;
when real-time interactive text and real-time video of a target object are acquired, obtaining corresponding real-time structured data from the real-time interactive text, obtaining and processing real-time face pictures from the real-time video, inputting the real-time structured data into the first model while inputting the processed real-time face pictures into the second model, and inputting the outputs of the first model and the second model simultaneously into the trained LR model to predict the behavior of the target object.
To solve the above technical problem, an embodiment of this application further provides a target object behavior prediction apparatus based on faces and interactive text, adopting the following technical solution:
a data acquisition module, configured to acquire historical interactive texts and historical videos of multiple sample objects;
a data processing module, configured to process the historical interactive texts to obtain corresponding historical structured data, the historical structured data containing the historical behavior of the sample objects, extract face pictures from the historical videos and process them, and generate labels for the processed face pictures based on the historical behavior;
a model building module, configured to train a preset first prediction model on the historical structured data to obtain a first model and a first model output value, and train a preset second prediction model on the processed face pictures and the labels to obtain a second model and a second model output value, where the first model output value and the second model output value are each the probability that the corresponding model outputs the historical behavior; and to build an LR model from the first model output value and the second model output value to fit the historical behavior, obtaining a trained LR model;
a prediction module, configured to, when real-time interactive text and real-time video of a target object are acquired, cause the data processing module to obtain corresponding real-time structured data from the real-time interactive text and to obtain and process real-time face pictures from the real-time video, then input the real-time structured data into the first model while inputting the processed real-time face pictures into the second model, and input the outputs of the first model and the second model simultaneously into the trained LR model to predict the behavior of the target object.
To solve the above technical problem, an embodiment of this application further provides a computer device, adopting the following technical solution:
A computer device comprising a memory and a processor, the memory storing computer-readable instructions, the processor implementing the following steps when executing the computer-readable instructions:
acquiring historical interactive texts and historical videos of multiple sample objects;
processing the historical interactive texts to obtain corresponding historical structured data, the historical structured data containing the historical behavior of the sample objects; extracting face pictures from the historical videos and processing them; and generating labels for the processed face pictures based on the historical behavior;
training a preset first prediction model on the historical structured data to obtain a first model and a first model output value, and training a preset second prediction model on the processed face pictures and the labels to obtain a second model and a second model output value, where the first model output value and the second model output value are each the probability that the corresponding model outputs the historical behavior;
building an LR model from the first model output value and the second model output value to fit the historical behavior, obtaining a trained LR model;
when real-time interactive text and real-time video of a target object are acquired, obtaining corresponding real-time structured data from the real-time interactive text, obtaining and processing real-time face pictures from the real-time video, inputting the real-time structured data into the first model while inputting the processed real-time face pictures into the second model, and inputting the outputs of the first model and the second model simultaneously into the trained LR model to predict the behavior of the target object.
To solve the above technical problem, an embodiment of this application further provides a computer-readable storage medium, adopting the following technical solution:
A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, cause the processor to perform the following steps:
acquiring historical interactive texts and historical videos of multiple sample objects;
processing the historical interactive texts to obtain corresponding historical structured data, the historical structured data containing the historical behavior of the sample objects; extracting face pictures from the historical videos and processing them; and generating labels for the processed face pictures based on the historical behavior;
training a preset first prediction model on the historical structured data to obtain a first model and a first model output value, and training a preset second prediction model on the processed face pictures and the labels to obtain a second model and a second model output value, where the first model output value and the second model output value are each the probability that the corresponding model outputs the historical behavior;
building an LR model from the first model output value and the second model output value to fit the historical behavior, obtaining a trained LR model;
when real-time interactive text and real-time video of a target object are acquired, obtaining corresponding real-time structured data from the real-time interactive text, obtaining and processing real-time face pictures from the real-time video, inputting the real-time structured data into the first model while inputting the processed real-time face pictures into the second model, and inputting the outputs of the first model and the second model simultaneously into the trained LR model to predict the behavior of the target object.
Compared with the prior art, the target object behavior prediction method, apparatus, computer device, and storage medium based on faces and interactive text provided by the embodiments of this application mainly have the following beneficial effects:
This solution trains a corresponding model on each of the acquired face pictures and interactive texts, outputs the probabilities of historical behavior derived from the face pictures and the interactive texts, builds an LR model on the joint probabilities, and then predicts with the built LR model, which can greatly improve the accuracy of prediction results in large-dataset scenarios.
Brief Description of the Drawings
To explain the solutions in this application more clearly, the drawings needed for describing the embodiments of this application are briefly introduced below. The drawings described below correspond to some embodiments of this application; those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is an exemplary system architecture diagram to which this application can be applied;
FIG. 2 is a flowchart of one embodiment of the target object behavior prediction method based on faces and interactive text according to this application;
FIG. 3 is a schematic structural diagram of one embodiment of the target object behavior prediction apparatus based on faces and interactive text according to this application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device according to this application.
Detailed Description of the Embodiments
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in the specification are intended only to describe specific embodiments, not to limit this application. The terms "comprising" and "having" and any variations thereof in the specification, claims, and drawing descriptions are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, claims, or drawings are used to distinguish different objects, not to describe a particular order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browsers, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
The server 105 may be a server providing various services, for example a backend server supporting the pages displayed on the terminal devices 101, 102, 103.
It should be noted that the target object behavior prediction method based on faces and interactive text provided by the embodiments of this application is generally executed by the server; accordingly, the target object behavior prediction apparatus based on faces and interactive text is generally disposed in the server.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers as required by the implementation.
Continuing to refer to FIG. 2, a flowchart of one embodiment of the target object behavior prediction method based on faces and interactive text according to this application is shown. The method includes the following steps:
S201: acquiring historical interactive texts and historical videos of multiple sample objects;
S202: processing the historical interactive texts to obtain corresponding historical structured data, the historical structured data containing the historical behavior of the sample objects; extracting face pictures from the historical videos and processing them; and generating labels for the processed face pictures based on the historical behavior;
S203: training a preset first prediction model on the historical structured data to obtain a first model and a first model output value, and training a preset second prediction model on the processed face pictures and the labels to obtain a second model and a second model output value, where the first model output value and the second model output value are each the probability that the corresponding model outputs the historical behavior;
S204: building an LR model from the first model output value and the second model output value to fit the historical behavior, obtaining a trained LR model;
S205: when real-time interactive text and real-time video of a target object are acquired, obtaining corresponding real-time structured data from the real-time interactive text, obtaining and processing real-time face pictures from the real-time video, inputting the real-time structured data into the first model while inputting the processed real-time face pictures into the second model, and inputting the outputs of the first model and the second model simultaneously into the trained LR model to predict the behavior of the target object.
The above steps are explained in detail below.
For step S201: in some terminal-interaction business scenarios, a user's behavior needs to be predicted and specific operations are performed according to the predicted behavior; the user here is the sample object, and is also the target object whose behavior needs to be predicted.
During terminal interaction, text interaction, voice interaction, or video interaction can be performed, such as the text interaction formed by question-and-answer with a robot customer service, the voice interaction formed by question-and-answer with an intelligent voice dialogue robot, and the video interaction formed by the face-to-face review of an AI approval robot. Different interaction types may be implemented on different terminals, or all of the aforementioned interaction types may be implemented on the same terminal. Interactive text can be obtained directly from text interaction; for voice and video interaction, interactive text can be obtained indirectly by performing speech recognition and conversion on the recorded voice and video. These directly or indirectly obtained interactive texts form the historical interactive texts. For video interaction, historical videos can be obtained directly by recording the interaction process.
This embodiment is described using the credit business scenario as an example. Data generated in the credit business scenario can be divided into pre-loan data, in-loan data, and post-loan data. When acquiring the historical interactive texts and historical videos of the sample objects in this step, at least one of the pre-loan, in-loan, and post-loan data may be acquired according to the actual situation; this data contains interactive texts and videos.
For step S202: in this embodiment, processing the historical interactive texts to obtain the corresponding historical structured data specifically means extracting the data corresponding to the target fields from each historical interactive text of a sample object, so that each sample object corresponds to at least one piece of initial structured data with multiple dimensions. In some embodiments, processing the historical interactive texts to obtain the corresponding historical structured data includes: processing the historical interactive texts to obtain at least one piece of structured data; when multiple pieces of structured data are obtained, merging the multiple pieces of structured data; then performing data processing on the merged structured data to obtain the historical behavior; and adding the historical behavior as a field to the merged structured data to generate the historical structured data. The data processing includes useless-data elimination, data conversion, data calculation, and the like; after data processing, the fields and dimensions of the final historical structured data will differ from those of the initially obtained historical structured data.
Taking the credit business scenario as an example, the sample objects are customers who have taken loans. The target fields may be the customer ID, the loan principal (Loan_amount), the number of days overdue/early (DAY), the due date, the actual repayment date, and so on; each target field represents one dimension of the structured data. Suppose the historical interactive texts are obtained from the customers' historical pre-loan, in-loan, and post-loan data, yielding the following initial data structure:

Table 1
ID   Loan_amount                 DAY   Due date   Actual repayment date
1    3*****.00                         Oct. 1     Oct. 3
1
2    "The customer says 2***.0"  1.0   Nov. 11    Nov. 10

As Table 1 shows, the user with ID 1 corresponds to two records. Merging the identical features of these two records gives the merged structured data in Table 2:

Table 2
ID   Loan_amount                 DAY   Due date   Actual repayment date
1    3*****.00                         Oct. 1     Oct. 3
2    "The customer says 2***.0"  1.0   Nov. 11    Nov. 10

The merged structured data in Table 2 is then further processed: for example, "the customer says" is removed, "two" is converted to "2.0", and the customer's historical behavior field "overdue this month" is computed from the "due date" and "actual repayment date" fields, giving the final historical structured data in Table 3:

Table 3
ID   Loan_amount   DAY   Overdue this month
1    3*****.00     2.0   1
2    2*****.00     1.0   0

Subsequent steps use the final historical structured data for modeling, building the structured-data model, i.e., the first model, which is explained later. It should be noted that the data merging and data processing described above can be performed simultaneously.
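The Table 1 to Table 3 pipeline above (merging records that share an ID, stripping filler text from numeric fields, and deriving the "overdue this month" label from the two date fields) can be sketched in plain Python. All field names, the merge policy, and the sample values below are illustrative assumptions:

```python
# Structured-data processing sketch: merge rows per customer ID, clean
# numeric fields, and derive the overdue label from the two date fields.
from datetime import date

def clean_amount(raw):
    """Strip non-numeric filler (e.g. '客户说有200000.0' -> 200000.0)."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit() or ch == ".")
    return float(digits) if digits else None

def merge_records(records):
    """Merge rows sharing an ID, keeping the first non-empty value per field."""
    merged = {}
    for row in records:
        slot = merged.setdefault(row["id"], {})
        for key, val in row.items():
            if val not in (None, "") and key not in slot:
                slot[key] = val
    return merged

def derive_label(due, actual):
    """1 if repayment happened after the due date, else 0."""
    return 1 if actual > due else 0

rows = [
    {"id": 1, "loan_amount": "300000.00", "due": date(2020, 10, 1), "actual": None},
    {"id": 1, "loan_amount": None, "due": None, "actual": date(2020, 10, 3)},
    {"id": 2, "loan_amount": "客户说有200000.0", "due": date(2020, 11, 11),
     "actual": date(2020, 11, 10)},
]
merged = merge_records(rows)
for rec in merged.values():
    rec["loan_amount"] = clean_amount(rec["loan_amount"])
    rec["overdue"] = derive_label(rec["due"], rec["actual"])
```

As in the tables, the two rows for ID 1 collapse into one record labeled overdue, while ID 2, who repaid a day early, is labeled not overdue.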
Further, in this embodiment, the step of extracting face pictures from the historical videos and processing them includes: capturing face pictures of the sample object from the historical video frame by frame and adding timestamps; sorting the face pictures by timestamp; calculating the similarity of adjacent face pictures; filtering the captured face pictures based on the obtained similarities to obtain a number of face pictures to be labeled; performing face keypoint detection on the face pictures; selecting eye keypoints from the face keypoints; calculating the center coordinates of the two eyes based on the eye keypoints; and rotating the face picture based on the center coordinates combined with the eye keypoints to align the face, obtaining the processed face pictures.
Specifically, when removing similar pictures, each face picture is timestamped according to its position on the time axis of the historical video. After sorting and calculating the similarity of adjacent face pictures, the first three pictures in the ordering are taken as a group, from which two similarities are obtained. For example, with five pictures ordered A, B, C, D, E, taking the first three pictures A, B, C gives the similarity of A and B and the similarity of B and C. When both similarity values are greater than a preset threshold, the middle face picture of the group is removed; for example, if both the similarity of A and B and the similarity of B and C are greater than the preset threshold, B is removed, giving the new ordering A, C, D, E. The first three pictures of the new ordering, namely A, C, D, are then taken; if either the similarity of A and C or the similarity of C and D is less than the preset threshold, the first face picture A is retained, and the window moves back one position to the next three pictures C, D, E for similarity judgment, and so on until the remaining face pictures have been screened. When calculating the similarity, the comparison arrays of the grayscale pixels of the two face pictures to be compared can be obtained, the Hamming distance between the two pictures obtained from the comparison arrays, and the similarity of the two pictures then calculated from the Hamming distance. Of course, other existing methods can also be used to calculate the similarity of two pictures. In this embodiment, a face picture is removed only when both obtained similarity values are greater than the preset threshold, so two face pictures with close similarity can be retained; while similar pictures are removed, this ensures richer training data and improves the accuracy of subsequent model training.
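The triple-window filtering rule above can be sketched as follows. The bit-array "comparison array", the similarity threshold, and the toy 2×2 frames are illustrative choices, not values from the patent:

```python
# Adjacent-frame filtering sketch: a dHash-like grayscale comparison array
# gives a Hamming-distance similarity, and in each window of three
# consecutive frames the middle frame is dropped only when it is highly
# similar to BOTH neighbours.

def comparison_bits(gray):
    """Flatten a grayscale grid into bits: 1 where a pixel exceeds the mean."""
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def similarity(bits_a, bits_b):
    """1 - normalized Hamming distance between two bit arrays."""
    dist = sum(a != b for a, b in zip(bits_a, bits_b))
    return 1.0 - dist / len(bits_a)

def filter_frames(frames, threshold=0.9):
    """Drop the middle frame of a triple when it resembles both neighbours."""
    bits = [comparison_bits(f) for f in frames]
    idx = list(range(len(frames)))
    i = 0
    while i + 2 < len(idx):
        if (similarity(bits[idx[i]], bits[idx[i + 1]]) > threshold
                and similarity(bits[idx[i + 1]], bits[idx[i + 2]]) > threshold):
            del idx[i + 1]          # remove near-duplicate middle frame
        else:
            i += 1                  # keep it and slide the window forward
    return [frames[j] for j in idx]

# Three identical frames followed by a very different one: only one of the
# three duplicates should be removed (the middle of the first triple).
frames = [[[0, 0], [10, 10]]] * 3 + [[[10, 10], [0, 0]]]
kept = filter_frames(frames)
```

Note that the rule keeps the two outer frames of each similar triple, matching the text's point that near-duplicate pairs are deliberately retained to keep the training data rich.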
In this embodiment, the faces in the face pictures to be labeled obtained above often have varying angles; most faces are tilted to different degrees. Therefore, before labeling, the face pictures to be labeled must be processed, specifically by data cleaning and correction of the face angle tilt: the faces in all face pictures are cropped out, processed into uniform pixels, and the face angles are unified. The specific face correction process is as follows:
First, detect the face keypoints; for example, this embodiment extracts 68 keypoints.
Second, select the eye keypoints from the face keypoints; the eye keypoints serve as reference points for rotating the picture to align the face.
Record the left-eye keypoint coordinates:
L = (x_{iL}, y_{iL}), \quad i = 1, 2, \ldots, 6,   (1)
and the right-eye keypoint coordinates:
R = (x_{iR}, y_{iR}), \quad i = 1, 2, \ldots, 6.   (2)
From equations (1) and (2), the center coordinates of the left eye and the right eye are computed respectively as:
e_L = (\bar{x}_L, \bar{y}_L) = \left( \frac{1}{6} \sum_{i=1}^{6} x_{iL}, \; \frac{1}{6} \sum_{i=1}^{6} y_{iL} \right), \quad e_R = (\bar{x}_R, \bar{y}_R) = \left( \frac{1}{6} \sum_{i=1}^{6} x_{iR}, \; \frac{1}{6} \sum_{i=1}^{6} y_{iR} \right).   (3)
Then, from the centers computed in equation (3), the angle \theta between the line connecting the two eye centers and the horizontal direction is:
\theta = \arctan \frac{\bar{y}_R - \bar{y}_L}{\bar{x}_R - \bar{x}_L}.   (4)
The center coordinate of the two eyes is likewise obtained from equation (3):
e_{center} = \left( \frac{\bar{x}_L + \bar{x}_R}{2}, \; \frac{\bar{y}_L + \bar{y}_R}{2} \right).
Finally, based on the results of equations (1) to (4), the whole face picture is rotated counterclockwise by \theta about the two-eye center coordinate e_{center}, yielding a pixel-unified face picture with the face angle corrected.
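A minimal numeric sketch of equations (1) through (4), under the assumption of six (x, y) keypoints per eye; the keypoint values below are invented for illustration:

```python
# Eye-based alignment sketch: average each eye's six keypoints to get the
# eye centres, take the angle of the line joining them, and use the
# midpoint of the two centres as the rotation pivot.
import math

def eye_center(points):
    """Mean of the six (x, y) keypoints of one eye (equation (3))."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def alignment_angle(left_pts, right_pts):
    """Angle theta between the eye-centre line and the horizontal (eq. (4))."""
    (xl, yl), (xr, yr) = eye_center(left_pts), eye_center(right_pts)
    return math.atan2(yr - yl, xr - xl)

def rotation_center(left_pts, right_pts):
    """Midpoint e_center of the two eye centres, the pivot for rotation."""
    (xl, yl), (xr, yr) = eye_center(left_pts), eye_center(right_pts)
    return ((xl + xr) / 2.0, (yl + yr) / 2.0)

# Illustrative keypoints: the right eye sits 10 px higher than the left,
# so the face is tilted and theta comes out negative.
left = [(30 + i, 50) for i in range(6)]
right = [(70 + i, 40) for i in range(6)]
theta = alignment_angle(left, right)
center = rotation_center(left, right)
```

Rotating the image by -theta about `center` (for example with an affine-warp routine in an image library) would bring the eye-centre line back to horizontal.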
In a further embodiment, the step of generating labels for the processed face pictures based on the historical behavior includes: determining the associated sample object according to the historical video corresponding to the face picture, and then reading the corresponding historical behavior from the corresponding historical structured data according to the associated sample object to generate the label of the face picture; that is, the label of the face picture here is determined according to the historical structured data of the sample object. Taking the credit business scenario as an example, the sample objects are customers who have taken loans. Based on the approved customer corresponding to the historical video, the historical video can be associated with its corresponding historical structured data through customer information (such as the customer ID), and the label of the corresponding face picture can then be obtained from the field corresponding to the historical behavior in the historical structured data. For example, based on the customer's post-loan historical structured data, labels such as high-risk customer, medium-risk overdue customer, low-risk overdue customer, and no-risk overdue customer can be generated for the face pictures.
For step S203: in this embodiment, models are built separately from the historical structured data and the labeled face pictures obtained in step S202. Training the preset first prediction model on the historical structured data includes extracting the input variables and output target of the first prediction model from the historical structured data. For example, in the credit business scenario where the sample objects are customers who have taken loans, the input variables may be fields such as "Loan_amount" and "DAY" in the historical structured data, and the output target is the field "overdue this month"; model training is then performed based on the input variables and output target.
The process of obtaining the input variables includes variable screening and reconstruction. The specific process is: use decision-tree binning, computing the information gain, to generate a decision-tree partition for each individual independent variable; based on the binning results, sort the variables by IV value in descending order; then train the first prediction model starting from the independent variable with the largest IV value, adding one new independent variable to the training each time, until the model's AUC reaches its maximum and no longer changes, at which point the included independent variables are the input variables. For example, the first time, the independent variable x_1 with the largest IV value is used to train the model, and the AUC is 0.8; the second time, the two independent variables x_1 and x_2 with the largest IV values are used, and the AUC is 0.81; and so on. When the first m independent variables x_1, ..., x_m are used to train the model, the AUC is 0.9; when the first m+1 independent variables x_1, ..., x_{m+1} are used, the AUC is 0.8999; so the independent variables x_1, ..., x_m are selected as the input variables.
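The IV-then-AUC forward selection loop described above can be sketched as follows. `train_and_auc` is a stand-in for "train the first prediction model on this variable prefix and score AUC on the validation set", and the toy AUC curve mirrors the numbers in the example:

```python
# Forward variable selection sketch: candidates arrive pre-sorted by IV
# (assumed precomputed via decision-tree binning), variables are added one
# at a time, and selection stops once AUC no longer improves.

def forward_select(candidates_by_iv, train_and_auc, eps=1e-4):
    """Return the variable prefix at which AUC stops improving, plus its AUC."""
    best_auc, chosen = 0.0, []
    for var in candidates_by_iv:
        auc = train_and_auc(chosen + [var])
        if auc > best_auc + eps:
            best_auc, chosen = auc, chosen + [var]
        else:
            break               # AUC plateaued: keep the previous prefix
    return chosen, best_auc

# Toy AUC curve mimicking the text: AUC rises until four variables are in,
# then dips to 0.8999 when a fifth variable is added.
auc_curve = {1: 0.80, 2: 0.81, 3: 0.85, 4: 0.90, 5: 0.8999}
selected, auc = forward_select(
    ["x1", "x2", "x3", "x4", "x5"],
    lambda vars_: auc_curve[len(vars_)],
)
```

With this curve the loop stops after the fifth candidate fails to improve AUC, retaining the first four variables, exactly as the m / m+1 example describes.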
In this embodiment, the first prediction model is a classification model, specifically an xgboost model. During model training based on the input variables and output target, the data corresponding to the input variables is first split into a training set and a validation set; the training set is input into the xgboost model for training, and the performance of the trained model is verified on the validation set. When the model performance reaches a preset condition, training is complete and the first model is obtained. Through the sigmoid in the final step of the xgboost model, the output corresponding to the current sample object can be converted into a probability value; this probability value is the probability that the model outputs the historical behavior, i.e., the first model output value.
Further, the second prediction model of this embodiment uses the Resnet_100 model. Training the preset second prediction model on the face pictures and labels includes splitting the face-keypoint data of the processed face pictures and the corresponding label data into a training set and a validation set, inputting the training set into the Resnet_100 model, and verifying the performance of the trained model on the validation set. When the model performance reaches a preset condition, training is complete and the second model is obtained; the final output layer of the second model is connected to a softmax that outputs the corresponding probability value, i.e., the second model output value.
For step S204: in this embodiment, the first model output value and the second model output value are used as two prediction dimensions for model building. For example, in the credit business scenario where the sample objects are customers who have taken loans, one dimension is the overdue probability value from the historical structured data, and the other is the overdue probability value from the facial features. An LR model is built on the two probability values to fit the customer's overdue label, and the weights of the two probability values are obtained through a selected ten-fold cross-validation set, completing the construction of the LR model. For example: denote a customer's historical-structured-data overdue probability value as x, the facial-feature overdue probability value as y, and the corresponding overdue label as L. The modeling approach is to use the LR model to find the weight w1 of the historical-structured-data overdue probability value and the weight w2 of the facial-feature overdue probability value such that f(w1*x + w2*y) = L; specifically, the corresponding weights w1 and w2 are obtained by minimizing the loss function of the LR model, completing the model construction.
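A minimal sketch of this fusion step, assuming plain stochastic gradient descent on the log-loss in place of whatever solver and ten-fold cross-validation the actual system uses; the training pairs are invented:

```python
# LR fusion sketch: fit a logistic regression on the two probability
# values (structured-data overdue probability x and face overdue
# probability y) against the overdue label L, learning weights w1 and w2.
import math

def fit_lr(samples, lr=0.5, epochs=2000):
    """samples: list of ((x, y), label). Returns (w1, w2, bias)."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x, y), label in samples:
            p = 1.0 / (1.0 + math.exp(-(w1 * x + w2 * y + b)))
            grad = p - label           # derivative of log-loss wrt the logit
            w1 -= lr * grad * x
            w2 -= lr * grad * y
            b -= lr * grad
    return w1, w2, b

# Toy data: high joint probabilities correspond to overdue customers.
data = [((0.9, 0.8), 1), ((0.7, 0.9), 1), ((0.2, 0.1), 0), ((0.3, 0.2), 0)]
w1, w2, b = fit_lr(data)

def predict(x, y):
    """Fused overdue probability f(w1*x + w2*y) for a new customer."""
    return 1.0 / (1.0 + math.exp(-(w1 * x + w2 * y + b)))
```

Because the model has only two inputs, the learned w1 and w2 directly play the attribution role the later "40% / 60%" example relies on.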
For step S205: the above steps S201 to S204 constitute the process of building the prediction model. In this step, the behavior of the target object is predicted based on the built prediction model. The trigger for executing this step is the target object engaging in text and video interaction, for example during a customer's video approval interview. When the real-time interactive text and real-time video of the target object are acquired, the process of obtaining the corresponding real-time structured data from the real-time interactive text, and of obtaining and processing real-time face pictures from the real-time video, is the same as the acquisition and processing of the historical structured data and face pictures in step S202.
Specifically, taking the credit business scenario as an example, the target object is a customer who has taken a loan or is applying for one. Taking a loan applicant as an example, the process of predicting whether the applicant will exhibit overdue behavior after a successful application is as follows: from the current pre-loan data filled in by the customer when applying (the real-time interactive text) and the real-time video of the interview, face picture data is obtained from real-time video screenshots; after the data is processed into the input formats of the first model and the second model, it is input into the first model and the second model respectively to obtain two corresponding overdue probability values; finally, the two probability values are input into the trained LR model to determine whether the customer has a tendency to become overdue later.
Further, after the trained LR model is obtained, the trained LR model will output the weights of the first model output value and the second model output value. After the behavior of the target object is predicted, the method further includes: causing the trained LR model to output update weights for the first model output value and the second model output value; obtaining preset information based on the update weights; and sending the preset information to the target object, so that the target object performs new real-time text interaction and/or real-time video interaction according to the preset information, and the behavior prediction result of the target object is updated by acquiring the new real-time interactive text and/or real-time video.
Specifically, taking the credit business scenario as an example, where the goal is to predict whether a loan applicant will become overdue after a successful application: after the trained LR model determines whether the customer has a tendency to become overdue, the update weights of the above two probability values can also be output; preset information is obtained based on the update weights and sent to the target object, so that the target object performs new text and video interaction according to the preset information, and the prediction is updated by acquiring the new interactive text and video. Specifically, in the credit business scenario, when the two probability values are input into the trained LR model to determine whether the customer has an overdue tendency, the update weights of the two probability values are also obtained. For example, if the customer is predicted to be at risk of becoming overdue, the AI approval robot gives targeted scripts and dialogue flows (i.e., the preset information) based on that expected risk, and uses the designated scripts to further probe the performance of such customers, such as asking more detailed questions. Suppose the model f(w1*x + w2*y) = L trained on historical data gives w1 = 0.4 and w2 = 0.6, and after inputting the two probability values into the LR model, the result is that the customer will become overdue. It can then be concluded that 40% of the cause of this customer's overdue behavior relates to the data in the interactive text, and 60% relates to the customer's facial features during the face-video approval process. When choosing the script for subsequent approval, since the facial features have a greater influence on subsequent overdue behavior during the approval interview, the subsequent approval flow will select approval script A corresponding to 60%-risk customers, mainly probing the customer with improvised questions. In the other case, when the weight attributed to the interactive text is too large, the cause may lie in the customer's qualifications, so the corresponding approval script B will be selected in the subsequent approval flow. In this way, flexible approval scripts and flows can be realized more precisely according to the customer's multimodal behavior, and the prediction can be updated with higher real-time performance and accuracy.
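The weight-based script routing in this example can be sketched as follows; normalizing the weights into attribution shares and the A/B decision rule are illustrative readings of the text:

```python
# Script-routing sketch: the LR weights are normalized into per-channel
# attribution shares (text vs face), and the follow-up interview script is
# chosen by whichever channel contributed more to the overdue prediction.

def route_script(w_text, w_face):
    """Return (script, text_share, face_share) for one overdue prediction."""
    total = w_text + w_face
    text_share, face_share = w_text / total, w_face / total
    # Script A probes expression-driven risk; script B probes qualifications.
    script = "A" if face_share >= text_share else "B"
    return script, round(text_share, 2), round(face_share, 2)

# Weights from the worked example: w1 = 0.4 (text), w2 = 0.6 (face).
script, text_share, face_share = route_script(0.4, 0.6)
```

With the example's weights the face channel dominates, so script A (the 60%-risk interview script) is selected, matching the narrative above.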
Further, after the step of obtaining the trained LR model, the method further includes: recording the new interactive texts, videos, and behaviors of existing objects and newly added objects; updating the historical data according to the new interactive texts, videos, and behaviors; and, after processing the updated historical data into the corresponding data format, training and optimizing the first model, the second model, and the trained LR model. Specifically, this embodiment can optimize the models built above by recording the new interactive texts, videos, and behaviors of existing and newly added objects after the approval interview to update the historical data, and processing the updated historical data into the corresponding data format as input data for the optimization algorithm. Part of this newly added historical data is structured data of existing objects, such as customers' pre-loan behavior data in the credit business, and the other part is data of incremental objects, so the distribution of probability values output by the first model based on structured data may change over time. As the new data changes, some old variables may become invalid while new variables are added; when new data is added to continuously optimize the models, the variables need to be re-screened to perform the above model training process.
The target object behavior prediction method based on faces and interactive text provided by this application trains a corresponding model on each of the acquired face pictures and interactive texts, outputs the probabilities of historical behavior derived from the face pictures and the interactive texts, builds an LR model on the joint probabilities, and then predicts with the built LR model, which can greatly improve the accuracy of prediction results in large-dataset scenarios.
It should be emphasized that, to further ensure the privacy and security of information, in the step of acquiring the historical interactive texts and historical videos of multiple sample objects, the private data in the historical interactive texts will be read from the nodes of a blockchain. The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated through cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. The blockchain may include the underlying blockchain platform, a platform product service layer, an application service layer, and so on.
This application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and the like. This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, computer-readable instructions, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.
Those of ordinary skill in the art can understand that all or part of the processes of the methods of the above embodiments can be implemented by computer-readable instructions instructing the related hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM), etc.
It should be understood that, although the steps in the flowcharts of the drawings are displayed sequentially as indicated by the arrows, these steps are not necessarily executed sequentially in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, which may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the drawings may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
进一步参考图3,作为对上述图2所示方法的实现,本申请提供了一种基于人脸和交互文本的目标对象行为预测装置的一个实施例,该装置实施例与图2所示的方法实施例相对应,该装置具体可以应用于各种电子设备中。
如图3所示,本实施例所述的基于人脸和交互文本的目标对象行为预测装置包括:数据获取模块301、数据处理模块302、模型构建模块303以及预测模块304。
其中,所述数据获取模块301用于获取多个样本对象的历史交互文本和历史视频。
所述数据处理模块302用于对所述历史交互文本进行处理得到对应的历史结构化数据,所述历史结构化数据包含所述样本对象的历史行为,从所述历史视频中提取人脸图片并处理,基于所述历史行为生成处理后的人脸图片的标签。
所述模型构建模块303用于根据所述历史结构化数据对预设的第一预测模型进行训练,得到第一模型和第一模型输出值,并根据所述处理后的人脸图片和所述标签对预设的第二预测模型进行训练,得到第二模型和第二模型输出值;其中所述第一模型输出值和所述第二模型输出值分别为对应的模型输出所述历史行为的概率值;以及根据所述第一模型输出值和所述第二模型输出值建立LR模型拟合所述历史行为,得到训练后的LR模型。
The prediction module 304 is configured to, when a real-time interaction text and a real-time video of a target object are acquired, cause the data processing module 302 to obtain corresponding real-time structured data from the real-time interaction text and to obtain and process a real-time face image from the real-time video; then to input the real-time structured data into the first model while inputting the processed real-time face image into the second model; and to input the outputs of the first model and the second model together into the trained LR model to predict the behavior of the target object.
In some service scenarios based on terminal interaction, a user's behavior needs to be predicted and specific operations executed according to the predicted behavior; the user here is both a sample object and a target object whose behavior needs to be predicted. Terminal interaction may take the form of text interaction, voice interaction, or video interaction, for example the text interaction formed by question-and-answer with a robot customer service, the voice interaction formed by question-and-answer with an intelligent voice dialogue robot, or the video interaction formed by a face-to-face review with an AI approval robot. Different interaction types may be implemented on different terminals, or all of the foregoing interaction types may be implemented on a single terminal. An interaction text can be obtained directly from text interaction, while for voice and video interaction it can be obtained indirectly through speech recognition and conversion of the recorded voice and video; the interaction texts obtained directly or indirectly constitute the historical interaction texts, and recording the video interaction process directly yields the historical videos.
In this embodiment, the data processing module 302 processing the historical interaction texts to obtain corresponding historical structured data specifically means extracting, for each historical interaction text of a sample object, the data corresponding to target fields, so that each sample object corresponds to at least one piece of initial structured data with multiple dimensions. In some embodiments, when processing the historical interaction texts to obtain the corresponding historical structured data, the data processing module 302 is specifically configured to: process the historical interaction texts to obtain at least one piece of structured data; when multiple pieces of structured data are obtained, merge them; then perform data processing on the merged structured data to obtain the historical behavior; and add the historical behavior as a field to the merged structured data to generate the historical structured data. The data processing includes removal of useless data, data conversion, data calculation, and the like; after this processing, the fields and dimensions of the final historical structured data will differ from those of the initially obtained structured data. For details, reference may be made to the above method embodiment, which is not elaborated here.
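As a hedged illustration of the merge-then-derive step above (multiple structured records per object merged, then the historical behavior appended as a field), the following sketch assumes hypothetical field names and a mean-based merge that the text does not specify:

```python
import pandas as pd

# Two hypothetical structured records extracted from one sample object's
# interaction texts; the field names are illustrative, not from the patent.
records = pd.DataFrame({
    "object_id": ["u1", "u1"],
    "income": [5000.0, 5200.0],
    "overdue_days": [0, 3],
})

# Merge the multiple records per object (here simply averaging numeric fields)...
merged = records.groupby("object_id", as_index=False).mean()

# ...then derive the historical behavior from the merged data and append it
# as a new field, generating the final historical structured data.
merged["behavior"] = (merged["overdue_days"] > 0).astype(int)  # 1 = overdue
```

After the merge and derivation, the record carries one extra field relative to the initial structured data, matching the description that fields and dimensions change during processing.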
Further, in this embodiment, when extracting and processing face images from the historical videos, the data processing module 302 is specifically configured to: capture face images of the sample objects from the historical videos frame by frame and add timestamps; sort the face images by timestamp; compute the similarity between adjacent face images; and filter the captured face images based on the obtained similarities to obtain a number of face images to be labeled. It then performs face keypoint detection on the face images, selects eye keypoints from the face keypoints, computes the center coordinates of the two eyes based on the eye keypoints, and rotates the face images based on the center coordinates in combination with the eye keypoints to align the faces, obtaining the processed face images. For details, reference may be made to the above method embodiment, which is not elaborated here.
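The eye-keypoint alignment described above reduces to computing the midpoint between the two eye centers and the tilt angle of the line through them, then rotating the image by that angle about the midpoint (the warp itself would typically be done with something like cv2.getRotationMatrix2D and cv2.warpAffine). A minimal sketch of the geometric part, with illustrative keypoint coordinates:

```python
import numpy as np

def eye_alignment(left_eye, right_eye):
    """Return the rotation center and angle (in degrees) that level the eyes.

    left_eye / right_eye are (x, y) coordinates of the detected eye
    keypoints. Rotating the image by `angle` about `center` puts both eyes
    on a horizontal line, i.e. aligns the face.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)              # eye midpoint
    angle = float(np.degrees(np.arctan2(ry - ly, rx - lx)))  # eye-line tilt
    return center, angle

# Illustrative keypoints: the right eye sits 10 px lower than the left,
# i.e. the face is tilted by roughly 45 degrees.
center, angle = eye_alignment((0.0, 0.0), (10.0, 10.0))
```

Applying the returned angle as a rotation about the returned center is what "rotating the face image based on the center coordinates in combination with the eye keypoints" amounts to geometrically.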
In a further embodiment, when generating labels for the processed face images based on the historical behaviors, the data processing module 302 is specifically configured to: determine the associated sample object according to the historical video corresponding to a face image, and then read the corresponding historical behavior from the historical structured data of that associated sample object to generate the label of the face image; that is, the label of a face image is determined from the historical structured data of its sample object. For details, reference may be made to the above method embodiment, which is not elaborated here.
In this embodiment, the model construction module 303 training the preset first prediction model according to the historical structured data includes extracting the input variables and output target of the first prediction model from the historical structured data, and then training the model based on the input variables and output target. The process of obtaining the input variables includes variable screening and reconstruction; for details, reference may be made to the above method embodiment, which is not elaborated here.
In this embodiment, the first prediction model is a classification model, specifically an xgboost model, and the second prediction model adopts a Resnet_100 model; for details, reference may be made to the above method embodiment, which is not elaborated here.
Further, after the model construction module 303 obtains the trained LR model, the trained LR model will output the weights of the first model output value and the second model output value. After predicting the behavior of the target object, the prediction module 304 is further configured to cause the trained LR model to output updated weights of the first model output value and the second model output value, obtain preset information based on the updated weights, and send the preset information to the target object, so that the target object conducts a new real-time text interaction and/or real-time video interaction according to the preset information, and the behavior prediction result of the target object is updated by acquiring the new real-time interaction text and/or real-time video. For details, reference may be made to the above method embodiment, which is not elaborated here.
The model construction module 303 is further configured to, after obtaining the trained LR model, record new interaction texts, videos, and behaviors of existing objects and newly added objects; update the historical data with the new interaction texts, videos, and behaviors; and, after processing the updated historical data into the corresponding data formats, train and optimize the first model, the second model, and the trained LR model. For details, reference may be made to the above method embodiment, which is not elaborated here.
With the target object behavior prediction apparatus based on face and interaction text provided by this application, face images and interaction texts are acquired separately to train their corresponding models, which output the probabilities of the historical behaviors derived from the face images and interaction texts; an LR model is then built on the joint probabilities, and predictions are subsequently made with the established LR model, which can greatly improve the accuracy of prediction results in large-dataset scenarios.
To solve the above technical problem, an embodiment of this application further provides a computer device. Referring specifically to FIG. 4, FIG. 4 is a block diagram of the basic structure of the computer device of this embodiment. The computer device 4 includes a memory 41, a processor 42, and a network interface 43 communicatively connected to one another through a system bus. Computer-readable instructions are stored in the memory 41, and when the processor 42 executes the computer-readable instructions, the steps of the target object behavior prediction method based on face and interaction text described in the above method embodiment are implemented, with the beneficial effects corresponding to that method, which are not elaborated here.
It should be pointed out that the figure only shows the computer device 4 with the memory 41, the processor 42, and the network interface 43; however, it should be understood that implementing all of the shown components is not required, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and so on.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device may conduct human-computer interaction with the user via a keyboard, a mouse, a remote control, a touchpad, a voice-control device, or the like.
In this embodiment, the memory 41 includes at least one type of readable storage medium. The computer-readable storage medium may be non-volatile or volatile; specifically, the readable storage medium includes flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as the hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit and the external storage device of the computer device 4. In this embodiment, the memory 41 is generally used to store the operating system and various application software installed on the computer device 4, for example the computer-readable instructions corresponding to the above target object behavior prediction method based on face and interaction text. In addition, the memory 41 may also be used to temporarily store various data that has been output or is to be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is used to run the computer-readable instructions stored in the memory 41 or to process data, for example to run the computer-readable instructions corresponding to the target object behavior prediction method based on face and interaction text.
The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish communication connections between the computer device 4 and other electronic devices.
This application further provides another implementation, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer-readable instructions that can be executed by at least one processor, so that the at least one processor executes the steps of the above target object behavior prediction method based on face and interaction text, with the beneficial effects corresponding to that method, which are not elaborated here.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, or, of course, by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of this application.
Obviously, the embodiments described above are only some of the embodiments of this application rather than all of them. The accompanying drawings show preferred embodiments of this application but do not limit its patent scope. This application may be implemented in many different forms; these embodiments are provided so that the disclosure of this application will be understood thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions recorded in the foregoing specific implementations or make equivalent substitutions for some of the technical features therein. Any equivalent structure made using the contents of the specification and drawings of this application, applied directly or indirectly in other related technical fields, likewise falls within the patent protection scope of this application.

Claims (20)

  1. A target object behavior prediction method based on face and interaction text, comprising the following steps:
    acquiring historical interaction texts and historical videos of a plurality of sample objects;
    processing the historical interaction texts to obtain corresponding historical structured data, the historical structured data containing historical behaviors of the sample objects, extracting face images from the historical videos and processing them, and generating labels for the processed face images based on the historical behaviors;
    training a preset first prediction model according to the historical structured data to obtain a first model and a first model output value, and training a preset second prediction model according to the processed face images and the labels to obtain a second model and a second model output value, wherein the first model output value and the second model output value are respectively the probability values with which the corresponding models output the historical behaviors;
    establishing an LR model based on the first model output value and the second model output value to fit the historical behaviors, obtaining a trained LR model;
    when a real-time interaction text and a real-time video of a target object are acquired, obtaining corresponding real-time structured data from the real-time interaction text, obtaining and processing a real-time face image from the real-time video, inputting the real-time structured data into the first model while inputting the processed real-time face image into the second model, and inputting the outputs of the first model and the second model together into the trained LR model to predict the behavior of the target object.
  2. The target object behavior prediction method based on face and interaction text according to claim 1, wherein the processing the historical interaction texts to obtain corresponding historical structured data comprises:
    processing the historical interaction texts to obtain at least one piece of structured data; when multiple pieces of structured data are obtained, merging the multiple pieces of structured data; then performing data processing on the merged structured data to obtain a historical behavior; and adding the historical behavior as a field to the merged structured data to generate the historical structured data.
  3. The target object behavior prediction method based on face and interaction text according to claim 2, wherein the step of generating labels for the processed face images based on the historical behaviors comprises:
    determining the associated sample object according to the historical video corresponding to a face image, and then reading the corresponding historical behavior from the historical structured data of the associated sample object to generate the label of the face image.
  4. The target object behavior prediction method based on face and interaction text according to claim 3, wherein the step of extracting and processing face images from the historical videos comprises:
    capturing face images of the sample objects from the historical videos frame by frame and adding timestamps, sorting the face images by timestamp, computing the similarity between adjacent face images, and filtering the captured face images based on the obtained similarities to obtain a number of face images to be labeled; and performing face keypoint detection on the face images, selecting eye keypoints from the face keypoints, computing the center coordinates of the two eyes based on the eye keypoints, and then rotating the face images based on the center coordinates in combination with the eye keypoints to align the faces, obtaining the processed face images.
  5. The target object behavior prediction method based on face and interaction text according to any one of claims 1 to 4, wherein after the trained LR model is obtained, the trained LR model will output the weights of the first model output value and the second model output value;
    after the predicting the behavior of the target object, the method further comprises:
    causing the trained LR model to output updated weights of the first model output value and the second model output value, obtaining preset information based on the updated weights, and sending the preset information to the target object, so that the target object conducts a new real-time text interaction and/or real-time video interaction according to the preset information, and the behavior prediction result of the target object is updated by acquiring the new real-time interaction text and/or real-time video.
  6. The target object behavior prediction method based on face and interaction text according to any one of claims 1 to 4, wherein after the step of obtaining the trained LR model, the method further comprises:
    recording new interaction texts, videos, and behaviors of existing objects and newly added objects; updating the historical data with the new interaction texts, videos, and behaviors; and, after processing the updated historical data into the corresponding data formats, training and optimizing the first model, the second model, and the trained LR model.
  7. The target object behavior prediction method based on face and interaction text according to any one of claims 1 to 4, wherein in the step of acquiring the historical interaction texts and historical videos of the plurality of sample objects, the method further comprises:
    reading the private data in the historical interaction texts from a blockchain.
  8. A target object behavior prediction apparatus based on face and interaction text, comprising:
    a data acquisition module, configured to acquire historical interaction texts and historical videos of a plurality of sample objects;
    a data processing module, configured to process the historical interaction texts to obtain corresponding historical structured data, the historical structured data containing historical behaviors of the sample objects, to extract face images from the historical videos and process them, and to generate labels for the processed face images based on the historical behaviors;
    a model construction module, configured to train a preset first prediction model according to the historical structured data to obtain a first model and a first model output value, and to train a preset second prediction model according to the processed face images and the labels to obtain a second model and a second model output value, wherein the first model output value and the second model output value are respectively the probability values with which the corresponding models output the historical behaviors; and to establish an LR model based on the first model output value and the second model output value to fit the historical behaviors, obtaining a trained LR model; and
    a prediction module, configured to, when a real-time interaction text and a real-time video of a target object are acquired, cause the data processing module to obtain corresponding real-time structured data from the real-time interaction text and to obtain and process a real-time face image from the real-time video, then input the real-time structured data into the first model while inputting the processed real-time face image into the second model, and input the outputs of the first model and the second model together into the trained LR model to predict the behavior of the target object.
  9. A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor implements the following steps when executing the computer-readable instructions:
    acquiring historical interaction texts and historical videos of a plurality of sample objects;
    processing the historical interaction texts to obtain corresponding historical structured data, the historical structured data containing historical behaviors of the sample objects, extracting face images from the historical videos and processing them, and generating labels for the processed face images based on the historical behaviors;
    training a preset first prediction model according to the historical structured data to obtain a first model and a first model output value, and training a preset second prediction model according to the processed face images and the labels to obtain a second model and a second model output value, wherein the first model output value and the second model output value are respectively the probability values with which the corresponding models output the historical behaviors;
    establishing an LR model based on the first model output value and the second model output value to fit the historical behaviors, obtaining a trained LR model;
    when a real-time interaction text and a real-time video of a target object are acquired, obtaining corresponding real-time structured data from the real-time interaction text, obtaining and processing a real-time face image from the real-time video, inputting the real-time structured data into the first model while inputting the processed real-time face image into the second model, and inputting the outputs of the first model and the second model together into the trained LR model to predict the behavior of the target object.
  10. The computer device according to claim 9, wherein when executing the computer-readable instructions to implement the step of processing the historical interaction texts to obtain the corresponding historical structured data, the processor specifically implements the following steps:
    processing the historical interaction texts to obtain at least one piece of structured data; when multiple pieces of structured data are obtained, merging the multiple pieces of structured data; then performing data processing on the merged structured data to obtain a historical behavior; and adding the historical behavior as a field to the merged structured data to generate the historical structured data.
  11. The computer device according to claim 10, wherein when executing the computer-readable instructions to implement the step of generating labels for the processed face images based on the historical behaviors, the processor specifically implements the following steps:
    determining the associated sample object according to the historical video corresponding to a face image, and then reading the corresponding historical behavior from the historical structured data of the associated sample object to generate the label of the face image.
  12. The computer device according to claim 11, wherein when executing the computer-readable instructions to implement the step of extracting and processing face images from the historical videos, the processor specifically implements the following steps:
    capturing face images of the sample objects from the historical videos frame by frame and adding timestamps, sorting the face images by timestamp, computing the similarity between adjacent face images, and filtering the captured face images based on the obtained similarities to obtain a number of face images to be labeled; and performing face keypoint detection on the face images, selecting eye keypoints from the face keypoints, computing the center coordinates of the two eyes based on the eye keypoints, and then rotating the face images based on the center coordinates in combination with the eye keypoints to align the faces, obtaining the processed face images.
  13. The computer device according to any one of claims 9 to 12, wherein after the trained LR model is obtained, the trained LR model will output the weights of the first model output value and the second model output value;
    after the predicting the behavior of the target object, the processor further implements the following steps when executing the computer-readable instructions:
    causing the trained LR model to output updated weights of the first model output value and the second model output value, obtaining preset information based on the updated weights, and sending the preset information to the target object, so that the target object conducts a new real-time text interaction and/or real-time video interaction according to the preset information, and the behavior prediction result of the target object is updated by acquiring the new real-time interaction text and/or real-time video.
  14. The computer device according to any one of claims 9 to 12, wherein after executing the computer-readable instructions to implement the step of obtaining the trained LR model, the processor further implements the following steps when executing the computer-readable instructions:
    recording new interaction texts, videos, and behaviors of existing objects and newly added objects; updating the historical data with the new interaction texts, videos, and behaviors; and, after processing the updated historical data into the corresponding data formats, training and optimizing the first model, the second model, and the trained LR model.
  15. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, cause the processor to execute the following steps:
    acquiring historical interaction texts and historical videos of a plurality of sample objects;
    processing the historical interaction texts to obtain corresponding historical structured data, the historical structured data containing historical behaviors of the sample objects, extracting face images from the historical videos and processing them, and generating labels for the processed face images based on the historical behaviors;
    training a preset first prediction model according to the historical structured data to obtain a first model and a first model output value, and training a preset second prediction model according to the processed face images and the labels to obtain a second model and a second model output value, wherein the first model output value and the second model output value are respectively the probability values with which the corresponding models output the historical behaviors;
    establishing an LR model based on the first model output value and the second model output value to fit the historical behaviors, obtaining a trained LR model;
    when a real-time interaction text and a real-time video of a target object are acquired, obtaining corresponding real-time structured data from the real-time interaction text, obtaining and processing a real-time face image from the real-time video, inputting the real-time structured data into the first model while inputting the processed real-time face image into the second model, and inputting the outputs of the first model and the second model together into the trained LR model to predict the behavior of the target object.
  16. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor to perform the step of processing the historical interaction texts to obtain the corresponding historical structured data, cause the processor to specifically execute the following steps:
    processing the historical interaction texts to obtain at least one piece of structured data; when multiple pieces of structured data are obtained, merging the multiple pieces of structured data; then performing data processing on the merged structured data to obtain a historical behavior; and adding the historical behavior as a field to the merged structured data to generate the historical structured data.
  17. The computer-readable storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor to perform the step of generating labels for the processed face images based on the historical behaviors, cause the processor to specifically execute the following steps:
    determining the associated sample object according to the historical video corresponding to a face image, and then reading the corresponding historical behavior from the historical structured data of the associated sample object to generate the label of the face image.
  18. The computer-readable storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor to perform the step of extracting and processing face images from the historical videos, cause the processor to specifically execute the following steps:
    capturing face images of the sample objects from the historical videos frame by frame and adding timestamps, sorting the face images by timestamp, computing the similarity between adjacent face images, and filtering the captured face images based on the obtained similarities to obtain a number of face images to be labeled; and performing face keypoint detection on the face images, selecting eye keypoints from the face keypoints, computing the center coordinates of the two eyes based on the eye keypoints, and then rotating the face images based on the center coordinates in combination with the eye keypoints to align the faces, obtaining the processed face images.
  19. The computer-readable storage medium according to any one of claims 15 to 18, wherein after the trained LR model is obtained, the trained LR model will output the weights of the first model output value and the second model output value;
    after the predicting the behavior of the target object, the computer-readable instructions, when executed by the processor, cause the processor to further execute the following steps:
    causing the trained LR model to output updated weights of the first model output value and the second model output value, obtaining preset information based on the updated weights, and sending the preset information to the target object, so that the target object conducts a new real-time text interaction and/or real-time video interaction according to the preset information, and the behavior prediction result of the target object is updated by acquiring the new real-time interaction text and/or real-time video.
  20. The computer-readable storage medium according to any one of claims 15 to 18, wherein after the step of obtaining the trained LR model, the computer-readable instructions, when executed by the processor, cause the processor to further execute the following steps:
    recording new interaction texts, videos, and behaviors of existing objects and newly added objects; updating the historical data with the new interaction texts, videos, and behaviors; and, after processing the updated historical data into the corresponding data formats, training and optimizing the first model, the second model, and the trained LR model.
PCT/CN2021/090147 2021-01-22 2021-04-27 Target object behavior prediction method based on face and interaction text, and related device WO2022156084A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110090632.1A CN112861662B (zh) 2021-01-22 2021-01-22 Target object behavior prediction method based on face and interaction text, and related device
CN202110090632.1 2021-01-22

Publications (1)

Publication Number Publication Date
WO2022156084A1 true WO2022156084A1 (zh) 2022-07-28

Family

ID=76008080

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090147 WO2022156084A1 (zh) 2021-01-22 2021-04-27 Target object behavior prediction method based on face and interaction text, and related device

Country Status (2)

Country Link
CN (1) CN112861662B (zh)
WO (1) WO2022156084A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359341A (zh) * 2022-08-19 2022-11-18 无锡物联网创新中心有限公司 Model updating method, apparatus, device, and medium

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN113254644B (zh) * 2021-06-07 2021-09-17 成都数之联科技有限公司 Model training method, non-complaint work order processing method, system, apparatus, and medium
CN113435998B (zh) * 2021-06-23 2023-05-02 平安科技(深圳)有限公司 Loan overdue prediction method, apparatus, electronic device, and storage medium
CN113836996B (zh) * 2021-08-10 2024-02-02 中国地质大学(武汉) Partial transfer method and system for hyperspectral remote sensing images

Citations (5)

Publication number Priority date Publication date Assignee Title
CN106886770A (zh) * 2017-03-07 2017-06-23 佛山市融信通企业咨询服务有限公司 Video communication emotion analysis assistance method
CN109787881A (zh) * 2018-12-26 2019-05-21 广州灵聚信息科技有限公司 Dialogue method and apparatus with prediction function
CN110020939A (zh) * 2019-03-01 2019-07-16 平安科技(深圳)有限公司 Apparatus, method, and storage medium for establishing a loss-given-default prediction model
CN110852368A (zh) * 2019-11-05 2020-02-28 南京邮电大学 Sentiment analysis method and system with global and local feature embedding and image-text fusion
CN111612284A (zh) * 2019-02-25 2020-09-01 阿里巴巴集团控股有限公司 Data processing method, apparatus, and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN109543516A (zh) * 2018-10-16 2019-03-29 深圳壹账通智能科技有限公司 Contract-signing intention determination method, apparatus, computer device, and storage medium
CN112182118B (zh) * 2020-09-29 2023-12-05 中国平安人寿保险股份有限公司 Target object prediction method based on multiple data sources, and related device


Cited By (2)

Publication number Priority date Publication date Assignee Title
CN115359341A (zh) * 2022-08-19 2022-11-18 无锡物联网创新中心有限公司 Model updating method, apparatus, device, and medium
CN115359341B (zh) * 2022-08-19 2023-11-17 无锡物联网创新中心有限公司 Model updating method, apparatus, device, and medium

Also Published As

Publication number Publication date
CN112861662A (zh) 2021-05-28
CN112861662B (zh) 2023-09-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21920470

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21920470

Country of ref document: EP

Kind code of ref document: A1