WO2022016977A1 - Online real-time data interaction method and apparatus, electronic device, and storage medium - Google Patents

Online real-time data interaction method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022016977A1
WO2022016977A1 PCT/CN2021/095009 CN2021095009W WO2022016977A1 WO 2022016977 A1 WO2022016977 A1 WO 2022016977A1 CN 2021095009 W CN2021095009 W CN 2021095009W WO 2022016977 A1 WO2022016977 A1 WO 2022016977A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
video data
feature
image
data
Prior art date
Application number
PCT/CN2021/095009
Other languages
English (en)
French (fr)
Inventor
邹洪伟
Original Assignee
平安国际智慧城市科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安国际智慧城市科技股份有限公司
Publication of WO2022016977A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 - Services
    • G06Q 50/20 - Education
    • G06Q 50/205 - Education administration or guidance
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2135 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/243 - Classification techniques relating to the number of classes
    • G06F 18/24323 - Tree-organised classifiers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 - Facial expression recognition
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 5/00 - Electrically-operated educational appliances
    • G09B 5/06 - Electrically-operated educational appliances with both visual and audible presentation of the material to be studied

Definitions

  • the present application relates to big data processing, and in particular, to an online real-time data interaction method, device, electronic device and storage medium.
  • the inventor realized that, with current online learning methods, students learn from course videos online, but teachers can neither know the students' learning environment nor grasp the students' learning status in time; usually the students' learning can only be understood through after-the-fact feedback mechanisms such as questionnaires and tests. The playback strategy of the course video therefore cannot be adjusted in time according to the students' learning situation, which leads to low learning efficiency. An online real-time data interaction method is therefore urgently needed to dynamically adjust the video playback strategy and improve the efficiency of online real-time data interaction.
  • An online real-time data interaction method comprising:
  • the first video data is played, the third video data of the user is acquired in real time, and feature processing is performed on the third video data to obtain the user's feature sequence;
  • the feature sequence is input into an expression recognition model to obtain a target expression category of the user, and a playback strategy of the first video data is dynamically adjusted according to the target expression category.
  • An online real-time data interaction device includes:
  • a request module configured to respond to a data interaction request sent by a user based on a client, parse the request, and obtain an identifier of the first video data corresponding to the request;
  • a judgment module configured to collect the second video data and audio data of the user within a preset time period, and determine whether the user environment where the user is located meets the preset requirements according to the second video data and audio data;
  • a playback module configured to play the first video data when judging that the user environment where the user is located meets the preset requirements, acquire the third video data of the user in real time, and execute a feature on the third video data processing to obtain the feature sequence of the user;
  • An adjustment module configured to input the feature sequence into an expression recognition model to obtain a target expression category of the user, and dynamically adjust the playback strategy of the first video data according to the target expression category.
  • An electronic device comprising:
  • the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the following steps:
  • the first video data is played, the third video data of the user is acquired in real time, and feature processing is performed on the third video data to obtain the user's feature sequence;
  • the feature sequence is input into an expression recognition model to obtain a target expression category of the user, and a playback strategy of the first video data is dynamically adjusted according to the target expression category.
  • the first video data is played, the third video data of the user is acquired in real time, and feature processing is performed on the third video data to obtain the user's feature sequence;
  • the feature sequence is input into an expression recognition model to obtain a target expression category of the user, and a playback strategy of the first video data is dynamically adjusted according to the target expression category.
  • the present application improves the efficiency of online real-time data interaction.
  • FIG. 1 is a schematic flowchart of an online real-time data interaction method provided by an embodiment of the present application
  • FIG. 2 is a schematic block diagram of an online real-time data interaction apparatus provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an electronic device for implementing a method for online real-time data interaction provided by an embodiment of the present application
  • the present application provides an online real-time data interaction method.
  • FIG. 1 a schematic flowchart of an online real-time data interaction method provided by an embodiment of the present application is shown.
  • the method may be performed by an electronic device, which may be implemented by software and/or hardware.
  • the online real-time data interaction method includes:
  • S2 Collect second video data and audio data of the user within a preset time period, and determine whether the user environment where the user is located meets preset requirements according to the second video data and audio data.
  • the data interaction request is an online learning request
  • the request includes an identifier of a course video to be learned.
  • collect the user's audio and video data for 5 to 15 seconds to confirm whether the user's learning environment meets the learning requirements.
  • video data is collected through a camera
  • audio data is collected through a microphone.
  • the determining according to the second video data and audio data whether the user environment where the user is located meets the preset requirements includes:
  • A1. Determine whether the user environment where the user is located has changed according to the second video data
  • A2. Determine whether there is noise in the user environment where the user is located according to the audio data
  • the determining whether the user environment where the user is located according to the second video data has changed includes:
  • the images in the first image sequence are encoded, for example, in ascending order of the natural numbers. Assuming that there are 10 images in the first image sequence, they are encoded 1, 2, 3, ..., 10. Then, starting from the first image in the first image sequence, the two images with adjacent odd and even encodings are regarded as one image group, that is, the two images corresponding to encodings 1 and 2 form one image group and the two images corresponding to encodings 3 and 4 form another image group, and a total of 5 image groups can be obtained.
  • any two adjacent images in the first image sequence can also be used as an image group; for example, the two images corresponding to encodings 1 and 2 form one image group, the two images corresponding to encodings 2 and 3 form another image group, and the two images corresponding to encodings 3 and 4 form a further image group.
  • the image matching algorithm is a SIFT (Scale Invariant Feature Transform) algorithm.
  • SIFT Scale Invariant Feature Transform
  • the SIFT algorithm can detect and describe local features in the image, and has a high tolerance to the effects of light, noise, partial occlusion, and subtle viewing angle changes.
  • the process of calculating the similarity of images by the SIFT algorithm includes the steps of constructing the scale space, key point location, direction assignment, key point feature description, feature vector matching, similarity calculation, etc. Since SIFT is an existing algorithm, it will not be repeated here.
  • the user's location changes, for example, the user is walking or on other vehicles;
  • the location of the user has not changed, but there are many moving objects in the environment where the user is located, for example, the user is in a block, and there are many people and/or vehicles flowing.
  • the determining whether there is noise in the user environment where the user is located according to the audio data includes:
  • the preset sound features include short-term energy, short-term zero-crossing rate, linear prediction cepstral coefficient, logarithmic frequency energy coefficient, subband energy, Mel cepstral coefficient, loudness, spectral flow, resonance frequency, and fundamental frequency.
  • the feature value corresponding to each preset sound feature can be calculated by using an existing sound processing tool (for example, Matlab 6.0).
  • the preset noise set stores pre-collected noise data recorded under different environments.
  • a distance algorithm can be used to calculate the sound similarity between the first feature value and the second feature value corresponding to each preset sound feature of each item of noise data, for example cosine similarity, Manhattan distance, Euclidean distance, or Minkowski distance.
  • the noise similarity value is calculated as Y_i = A_1*B_{i,1} + A_2*B_{i,2} + ... + A_n*B_{i,n}, where Y_i is the noise similarity value between the audio data and the i-th item of noise data; A_1, A_2, ..., A_n are the weight parameters corresponding to the first, second, ..., n-th preset sound features; and B_{i,k} is the sound similarity value between the first feature value corresponding to the k-th preset sound feature of the audio data and the second feature value corresponding to the k-th preset sound feature of the i-th item of noise data.
  • if the noise similarity value is greater than a third threshold, it is determined that there is noise in the user environment where the user is located.
  • the course video requested by the user can be played, and the third video data of the user in the learning process can be acquired in real time, so as to realize the real-time understanding of the user's learning situation.
  • the second video data and the third video data may also be stored in a node of a blockchain.
  • performing feature processing on the third video data to obtain the feature sequence of the user includes:
  • performing size normalization processing on the images in the second image sequence includes:
  • Face correction: rotate the image according to the coordinate values of the left and right eyes to ensure the consistency of the face orientation, where the distance between the two eyes is d and their midpoint is O;
  • Face cropping: determine the rectangular feature region according to the facial feature points and the geometric model; taking O as the reference, crop d on each of the left and right sides, and take rectangular regions of 0.5d and 1.5d in the vertical direction for cropping.
  • the image can be transformed into a uniform size through size normalization, which is beneficial to the extraction of expression features.
  • Y is the gray value of the normalized image
  • X is the gray value of the original image
  • min is the minimum gray value of the original image
  • max is the maximum gray value of the original image.
  • grayscale normalization is to increase the brightness of the image and make the details of the image clearer, so as to reduce the influence of light and light intensity on the image.
  • the feature extraction algorithm is a PCA (Principal Component Analysis, principal component analysis) algorithm.
  • the PCA algorithm is a dimensionality reduction algorithm. When two variables are correlated, it is considered that the two variables have certain overlapping information.
  • starting from all the original variables, the PCA algorithm removes redundant variables among the duplicated (closely related) variables and establishes as few new variables as possible, so that the new variables are pairwise uncorrelated and retain as much of the original information as possible; that is, m-dimensional features are mapped onto n dimensions (n < m), and the resulting n-dimensional features are brand-new orthogonal features called principal components.
  • the dimensionality reduction process of the PCA algorithm includes: finding the average value of each of the m features; finding the matrix after removing the mean; calculating the eigenvalues of the covariance matrix; sorting the eigenvalues and taking the first n features as principal components to obtain the projection matrix; and finding the n-dimensional feature values according to the projection matrix.
  • each feature in the obtained feature sequence is an n-dimensional feature.
  • the size normalization and grayscale normalization processing in this step can make the extraction of face features in the image more convenient, and then the feature data is dimensionally reduced by the PCA algorithm, which makes the feature processing more efficient.
  • the online real-time data interaction method further includes:
  • the data interaction request is rejected, and warning information is sent.
  • the expression recognition model is composed of a recurrent neural network model and a random forest model cascaded, and the output of the recurrent neural network model is the input of the random forest model.
  • the feature sequence is input into the recurrent neural network model for multiple nonlinear transformations and representations to obtain more representative advanced features, and the advanced features are input into the random forest model to obtain the user's target expression category.
  • the dynamic adjustment of the playback strategy of the first video data according to the target expression category includes:
  • the expression categories include happy, angry, bored, surprised, excited, and puzzled.
  • the level data table includes three expression levels: the first-level expressions include angry and bored, the second-level expressions include puzzled and surprised, and the third-level expressions include happy and excited.
  • G2. Determine the target expression level corresponding to the target expression category according to the grade data table
  • G3. Determine a target video playback strategy corresponding to the target expression level according to the predetermined mapping relationship between the expression level and the video playback strategy, and adjust the playback of the first video data according to the target video playback strategy.
  • mapping relationship between the predetermined expression level and the video playback strategy includes:
  • the video playback strategy corresponding to the first-level expression level is to stop playing the first video data and issue a warning message
  • the video playback strategy corresponding to the second-level expression level is to slow down the playback speed of the first video data, or repeatedly play the first video data within a preset time period (for example, the first 5 minutes);
  • the video playback strategy corresponding to the third-level expression level is to speed up the playback speed of the first video data.
  • the online real-time data interaction method further includes:
  • I1. Obtain a screenshot of the client in real time, and judge whether the screenshot is a preset picture;
  • the preset picture is a picture in which only the first video data being played is displayed. Taking online learning as an example, only the playback interface of the course video should be displayed on the screen; the purpose of this is to prevent students from playing games, watching movies, browsing news or engaging in other such behaviors while learning.
  • the online real-time data interaction method proposed by the present application firstly collects the second video data and audio data of the user within a preset time period, and determines whether the user environment where the user is located meets the preset requirements.
  • the purpose of the steps is to improve the interaction efficiency by verifying whether the user environment is good; then, when it is judged that the user environment where the user is located meets the preset requirements, the first video data is played, and the third video data of the user is acquired in real time, and the third video data is obtained in real time.
  • FIG. 2 it is a schematic block diagram of an online real-time data interaction apparatus according to an embodiment of the present application.
  • the online real-time data interaction apparatus 100 described in this application may be installed in an electronic device. According to the implemented functions, the online real-time data interaction apparatus 100 may include a request module 110 , a judgment module 120 , a playback module 130 and an adjustment module 140 .
  • the modules described in this application may also be referred to as units, which refer to a series of computer program segments that can be executed by the processor of an electronic device and can perform fixed functions, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the request module 110 is configured to respond to a data interaction request sent by a user based on the client, parse the request, and obtain an identifier of the first video data corresponding to the request;
  • the determination module 120 is configured to collect second video data and audio data of the user within a preset time period, and determine whether the user environment where the user is located meets preset requirements according to the second video data and audio data.
  • the data interaction request is an online learning request
  • the request includes an identifier of a course video to be learned.
  • collect the user's audio and video data for 5 to 15 seconds to confirm whether the user's learning environment meets the learning requirements.
  • video data is collected through a camera
  • audio data is collected through a microphone.
  • the determining according to the second video data and audio data whether the user environment where the user is located meets the preset requirements includes:
  • A1. Determine whether the user environment where the user is located has changed according to the second video data
  • A2. Determine whether there is noise in the user environment where the user is located according to the audio data
  • the determining whether the user environment where the user is located according to the second video data has changed includes:
  • the images in the first image sequence are encoded, for example, in ascending order of the natural numbers. Assuming that there are 10 images in the first image sequence, they are encoded 1, 2, 3, ..., 10. Then the two images with adjacent odd and even encodings are regarded as one image group, that is, the two images corresponding to encodings 1 and 2 form one image group and the two images corresponding to encodings 3 and 4 form another image group, and a total of 5 image groups can be obtained.
  • any two adjacent images in the first image sequence can also be used as an image group; for example, the two images corresponding to encodings 1 and 2 form one image group, the two images corresponding to encodings 2 and 3 form another image group, and the two images corresponding to encodings 3 and 4 form a further image group.
  • the image matching algorithm is a SIFT (Scale Invariant Feature Transform) algorithm.
  • SIFT Scale Invariant Feature Transform
  • the SIFT algorithm can detect and describe local features in the image, and has a high tolerance to the effects of light, noise, partial occlusion, and subtle viewing angle changes.
  • the process of calculating the similarity of images by the SIFT algorithm includes the steps of constructing the scale space, key point location, direction assignment, key point feature description, feature vector matching, similarity calculation, etc. Since SIFT is an existing algorithm, it will not be repeated here.
  • the user's location changes, for example, the user is walking or on other vehicles;
  • the location of the user has not changed, but there are many moving objects in the environment where the user is located, for example, the user is in a block, and there are many people and/or vehicles flowing.
  • the determining whether there is noise in the user environment where the user is located according to the audio data includes:
  • the preset sound features include short-term energy, short-term zero-crossing rate, linear prediction cepstral coefficient, logarithmic frequency energy coefficient, subband energy, Mel cepstral coefficient, loudness, spectral flow, resonance frequency, and fundamental frequency.
  • the feature value corresponding to each preset sound feature can be calculated by using an existing sound processing tool (for example, Matlab 6.0).
  • the preset noise set stores pre-collected noise data recorded under different environments.
  • a distance algorithm can be used to calculate the sound similarity between the first feature value and the second feature value corresponding to each preset sound feature of each item of noise data, for example cosine similarity, Manhattan distance, Euclidean distance, or Minkowski distance.
  • the noise similarity value is calculated as Y_i = A_1*B_{i,1} + A_2*B_{i,2} + ... + A_n*B_{i,n}, where Y_i is the noise similarity value between the audio data and the i-th item of noise data; A_1, A_2, ..., A_n are the weight parameters corresponding to the first, second, ..., n-th preset sound features; and B_{i,k} is the sound similarity value between the first feature value corresponding to the k-th preset sound feature of the audio data and the second feature value corresponding to the k-th preset sound feature of the i-th item of noise data.
  • if the noise similarity value is greater than a third threshold, it is determined that there is noise in the user environment where the user is located.
  • the playing module 130 is configured to play the first video data when judging that the user environment where the user is located meets the preset requirements, acquire the third video data of the user in real time, and execute the third video data on the third video data. Feature processing to obtain the feature sequence of the user.
  • the course video requested by the user can be played, and the third video data of the user in the learning process can be acquired in real time, so as to realize the real-time understanding of the user's learning situation.
  • the second video data and the third video data may also be stored in a node of a blockchain.
  • performing feature processing on the third video data to obtain the feature sequence of the user includes:
  • performing size normalization processing on the images in the second image sequence includes:
  • Face correction: rotate the image according to the coordinate values of the left and right eyes to ensure the consistency of the face orientation, where the distance between the two eyes is d and their midpoint is O;
  • Face cropping: determine the rectangular feature region according to the facial feature points and the geometric model; taking O as the reference, crop d on each of the left and right sides, and take rectangular regions of 0.5d and 1.5d in the vertical direction for cropping.
  • the image can be transformed into a uniform size through size normalization, which is beneficial to the extraction of expression features.
  • Y is the gray value of the normalized image
  • X is the gray value of the original image
  • min is the minimum gray value of the original image
  • max is the maximum gray value of the original image.
  • grayscale normalization is to increase the brightness of the image and make the details of the image clearer, so as to reduce the influence of light and light intensity on the image.
  • the feature extraction algorithm is a PCA (Principal Component Analysis, principal component analysis) algorithm.
  • the PCA algorithm is a dimensionality reduction algorithm. When two variables are correlated, it is considered that the two variables have certain overlapping information.
  • starting from all the original variables, the PCA algorithm removes redundant variables among the duplicated (closely related) variables and establishes as few new variables as possible, so that the new variables are pairwise uncorrelated and retain as much of the original information as possible; that is, m-dimensional features are mapped onto n dimensions (n < m), and the resulting n-dimensional features are brand-new orthogonal features called principal components.
  • the dimensionality reduction process of the PCA algorithm includes: finding the average value of each of the m features; finding the matrix after removing the mean; calculating the eigenvalues of the covariance matrix; sorting the eigenvalues and taking the first n features as principal components to obtain the projection matrix; and finding the n-dimensional feature values according to the projection matrix.
  • each feature in the obtained feature sequence is an n-dimensional feature.
  • the size normalization and grayscale normalization processing in this step can make the extraction of face features in the image more convenient, and then the feature data is dimensionally reduced by the PCA algorithm, which makes the feature processing more efficient.
  • the judging module 120 is further configured to:
  • the data interaction request is rejected, and warning information is sent.
  • the adjustment module 140 is configured to input the feature sequence into an expression recognition model to obtain the target expression category of the user, and dynamically adjust the playback strategy of the first video data according to the target expression category.
  • the expression recognition model is composed of a recurrent neural network model and a random forest model cascaded, and the output of the recurrent neural network model is the input of the random forest model.
  • the feature sequence is input into the recurrent neural network model for multiple nonlinear transformations and representations to obtain more representative advanced features, and the advanced features are input into the random forest model to obtain the user's target expression category.
  • the dynamic adjustment of the playback strategy of the first video data according to the target expression category includes:
  • the expression categories include happy, angry, bored, surprised, excited, and puzzled.
  • the level data table includes three expression levels: the first-level expressions include angry and bored, the second-level expressions include puzzled and surprised, and the third-level expressions include happy and excited.
  • G2. Determine the target expression level corresponding to the target expression category according to the grade data table
  • G3. Determine a target video playback strategy corresponding to the target expression level according to the predetermined mapping relationship between the expression level and the video playback strategy, and adjust the playback of the first video data according to the target video playback strategy.
  • mapping relationship between the predetermined expression level and the video playback strategy includes:
  • the video playback strategy corresponding to the first-level expression level is to stop playing the first video data and issue a warning message
  • the video playback strategy corresponding to the second-level expression level is to slow down the playback speed of the first video data, or repeatedly play the first video data within a preset time period (for example, the first 5 minutes);
  • the video playback strategy corresponding to the third-level expression level is to speed up the playback speed of the first video data.
  • the adjustment module 140 is further configured to:
  • I1. Obtain a screenshot of the client in real time, and judge whether the screenshot is a preset picture;
  • the preset picture is a picture in which only the first video data being played is displayed. Taking online learning as an example, only the playback interface of the course video should be displayed on the screen; the purpose of this is to prevent students from playing games, watching movies, browsing news or engaging in other such behaviors while learning.
  • FIG. 3 a schematic structural diagram of an electronic device for implementing a method for online real-time data interaction provided by an embodiment of the present application.
  • the electronic device 1 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions.
  • the electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • the electronic device 1 includes, but is not limited to, a memory 11 , a processor 12 , and a network interface 13 that can be communicatively connected to each other through a system bus.
  • the memory 11 stores an online real-time data interaction program 10.
  • the online real-time data interaction program 10 can be executed by the processor 12.
  • FIG. 1 only shows the electronic device 1 having the components 11-13 and the online real-time data interaction program 10. Those skilled in the art can understand that the structure shown in FIG. 1 does not constitute a limitation on the electronic device 1, and that fewer or more components than shown may be included, or some components may be combined, or the components may be arranged differently.
  • the memory 11 includes a memory and at least one type of readable storage medium.
  • the memory provides a cache for the operation of the electronic device 1;
  • the readable storage medium can be, for example, flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM) ), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. non-volatile storage media.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the electronic device 1, for example a pluggable hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 1.
  • the readable storage medium of the memory 11 is generally used to store the operating system and various application software installed in the electronic device 1 , for example, to store the code of the online real-time data interaction program 10 in an embodiment of the present application.
  • the memory 11 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 12 is generally used to control the overall operation of the electronic device 1, such as performing control and processing related to data interaction or communication with other devices.
  • the processor 12 is configured to run the program code or process data stored in the memory 11, for example, run the online real-time data interaction program 10 and the like.
  • the network interface 13 may include a wireless network interface or a wired network interface, and the network interface 13 is used to establish a communication connection between the electronic device 1 and a client (not shown in the figure).
  • the electronic device 1 may further include a user interface, and the user interface may include a display (Display), an input unit such as a keyboard (Keyboard), and an optional user interface may also include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the online real-time data interaction program 10 stored in the memory 11 in the electronic device 1 is a combination of multiple instructions. When running in the processor 12, it can realize:
  • the first video data is played, the third video data of the user is acquired in real time, and feature processing is performed on the third video data to obtain the user's feature sequence;
  • the feature sequence is input into an expression recognition model to obtain a target expression category of the user, and a playback strategy of the first video data is dynamically adjusted according to the target expression category.
  • the second video data and the third video data may also be stored in a node of a blockchain.
  • if the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium, which may be volatile or non-volatile.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
  • An online real-time data interaction program is stored on the computer-readable storage medium, and the online real-time data interaction program can be executed by one or more processors to realize the following steps:
  • the first video data is played, the third video data of the user is acquired in real time, and feature processing is performed on the third video data to obtain the user's feature sequence;
  • the feature sequence is input into an expression recognition model to obtain a target expression category of the user, and a playback strategy of the first video data is dynamically adjusted according to the target expression category.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain, essentially a decentralized database, is a chain of data blocks generated in association with one another using cryptographic methods; each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of its information and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Educational Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Educational Administration (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

An online real-time data interaction method, comprising: in response to a data interaction request sent by a user via a client, parsing the request to obtain an identifier of first video data corresponding to the request (S1); collecting second video data and audio data of the user within a preset time period, and determining, according to the second video data and the audio data, whether the user environment in which the user is located meets preset requirements (S2); when it is determined that the user environment in which the user is located meets the preset requirements, playing the first video data corresponding to the user's request, acquiring third video data of the user in real time, and performing feature processing on the third video data to obtain a feature sequence of the user (S3); and inputting the feature sequence into an expression recognition model to obtain a target expression category of the user, and dynamically adjusting the playback strategy of the first video data according to the target expression category (S4). The technical solution also relates to blockchain technology: the second and third video data are stored in a blockchain, and the efficiency of online real-time data interaction can be improved.

Description

Online real-time data interaction method and apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on July 19, 2020, with application number CN202010695107.8 and the invention title "Online real-time data interaction method and apparatus, electronic device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to big data processing, and in particular to an online real-time data interaction method and apparatus, an electronic device, and a storage medium.
Background
With the development of Internet technology, online data interaction is being applied more and more widely; for example, online learning, with advantages such as flexible learning time and no restriction on learning location, has rapidly become part of people's lives.
The inventor realized that with current online learning methods, students learn from course videos online, but teachers can neither know the students' learning environment nor grasp the students' learning status in time; usually the students' learning can only be understood through after-the-fact feedback mechanisms such as questionnaires and tests. As a result, the playback strategy of the course video cannot be adjusted in time according to the students' learning situation, which leads to low learning efficiency. Therefore, an online real-time data interaction method is urgently needed to dynamically adjust the video playback strategy and improve the efficiency of online real-time data interaction.
Summary
An online real-time data interaction method, comprising:
in response to a data interaction request sent by a user via a client, parsing the request to obtain an identifier of first video data corresponding to the request;
collecting second video data and audio data of the user within a preset time period, and determining, according to the second video data and the audio data, whether the user environment in which the user is located meets preset requirements;
when it is determined that the user environment in which the user is located meets the preset requirements, playing the first video data, acquiring third video data of the user in real time, and performing feature processing on the third video data to obtain a feature sequence of the user;
inputting the feature sequence into an expression recognition model to obtain a target expression category of the user, and dynamically adjusting a playback strategy of the first video data according to the target expression category.
An online real-time data interaction apparatus, the apparatus comprising:
a request module, configured to respond to a data interaction request sent by a user via a client, parse the request, and obtain an identifier of first video data corresponding to the request;
a judgment module, configured to collect second video data and audio data of the user within a preset time period, and determine, according to the second video data and the audio data, whether the user environment in which the user is located meets preset requirements;
a playback module, configured to play the first video data when it is determined that the user environment in which the user is located meets the preset requirements, acquire third video data of the user in real time, and perform feature processing on the third video data to obtain a feature sequence of the user;
an adjustment module, configured to input the feature sequence into an expression recognition model to obtain a target expression category of the user, and dynamically adjust a playback strategy of the first video data according to the target expression category.
An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following steps:
in response to a data interaction request sent by a user via a client, parsing the request to obtain an identifier of first video data corresponding to the request;
collecting second video data and audio data of the user within a preset time period, and determining, according to the second video data and the audio data, whether the user environment in which the user is located meets preset requirements;
when it is determined that the user environment in which the user is located meets the preset requirements, playing the first video data, acquiring third video data of the user in real time, and performing feature processing on the third video data to obtain a feature sequence of the user;
inputting the feature sequence into an expression recognition model to obtain a target expression category of the user, and dynamically adjusting a playback strategy of the first video data according to the target expression category.
A computer-readable storage medium, on which an online real-time data interaction program is stored, the online real-time data interaction program being executable by one or more processors to implement the following steps:
in response to a data interaction request sent by a user via a client, parsing the request to obtain an identifier of first video data corresponding to the request;
collecting second video data and audio data of the user within a preset time period, and determining, according to the second video data and the audio data, whether the user environment in which the user is located meets preset requirements;
when it is determined that the user environment in which the user is located meets the preset requirements, playing the first video data, acquiring third video data of the user in real time, and performing feature processing on the third video data to obtain a feature sequence of the user;
inputting the feature sequence into an expression recognition model to obtain a target expression category of the user, and dynamically adjusting a playback strategy of the first video data according to the target expression category.
The present application improves the efficiency of online real-time data interaction.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an online real-time data interaction method provided by an embodiment of the present application;
FIG. 2 is a schematic block diagram of an online real-time data interaction apparatus provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device implementing the online real-time data interaction method provided by an embodiment of the present application;
The realization of the objectives, functional features and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description of the Embodiments
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
It should be noted that descriptions involving "first", "second" and the like in the present application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated; thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, provided that the combination can be realized by a person of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, it should be considered that such a combination does not exist and is not within the scope of protection claimed by the present application.
The present application provides an online real-time data interaction method. Referring to FIG. 1, a schematic flowchart of an online real-time data interaction method provided by an embodiment of the present application is shown. The method may be performed by an electronic device, which may be implemented by software and/or hardware.
In this embodiment, the online real-time data interaction method includes:
S1. In response to a data interaction request sent by a user via a client, parse the request to obtain an identifier of first video data corresponding to the request;
S2. Collect second video data and audio data of the user within a preset time period, and determine, according to the second video data and the audio data, whether the user environment in which the user is located meets preset requirements.
This embodiment is described by taking the case where the data interaction request is an online learning request as an example, the request including an identifier of a course video to be learned. After the user's online learning request is received, 5 to 15 seconds of the user's audio and video data are collected to confirm whether the user's learning environment meets the learning requirements. In this embodiment, the video data are collected through a camera and the audio data through a microphone.
Determining, according to the second video data and the audio data, whether the user environment in which the user is located meets the preset requirements includes:
A1. Determine, according to the second video data, whether the user environment in which the user is located has changed;
A2. Determine, according to the audio data, whether there is noise in the user environment in which the user is located;
A3. If it is determined that the user environment has not changed and the user environment is free of noise, determine that the user environment in which the user is located meets the preset requirements.
In this embodiment, determining, according to the second video data, whether the user environment in which the user is located has changed includes:
B1. Split the second video data into frames to obtain a first image sequence;
B2. Take two adjacent images in the first image sequence as one image group to obtain a plurality of image groups;
In this embodiment, the images in the first image sequence are numbered, for example in ascending order of the natural numbers. Assuming that there are 10 images in the first image sequence, they are numbered 1, 2, 3, ..., 10. Then, starting from the first image in the first image sequence, every two images with adjacent odd and even numbers are taken as one image group, that is, the two images corresponding to numbers 1 and 2 form one image group and the two images corresponding to numbers 3 and 4 form another image group, so that 5 image groups are obtained in total.
In other embodiments, any two adjacent images in the first image sequence may also be taken as an image group; for example, the two images corresponding to numbers 1 and 2 form one image group, the two images corresponding to numbers 2 and 3 form another image group, and the two images corresponding to numbers 3 and 4 form a further image group.
B3. Calculate an image similarity value for each of the plurality of image groups based on an image matching algorithm;
B4. If the number of image groups whose image similarity value is greater than a first threshold among the plurality of image groups is greater than a second threshold, determine that the user environment in which the user is located has changed.
In this embodiment, the image matching algorithm is the SIFT (Scale Invariant Feature Transform) algorithm. The SIFT algorithm can detect and describe local features in an image, and has a very high tolerance to the effects of lighting, noise, partial occlusion and slight changes of viewing angle. The process by which the SIFT algorithm calculates image similarity includes steps such as constructing the scale space, key point localization, orientation assignment, key point feature description, feature vector matching and similarity calculation; since SIFT is an existing algorithm, it is not described in detail here.
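As an illustration of steps B1 to B4, the following is a minimal Python sketch. It assumes OpenCV's SIFT implementation and uses the fraction of ratio-test matches as the image similarity value; the frame pairing follows the odd/even grouping described above, and the two thresholds are placeholder values rather than values given in this application.

    import cv2

    def sift_similarity(img_a, img_b):
        # Rough similarity: fraction of the first image's SIFT keypoints with a good ratio-test match.
        gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
        gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
        sift = cv2.SIFT_create()
        kp_a, des_a = sift.detectAndCompute(gray_a, None)
        kp_b, des_b = sift.detectAndCompute(gray_b, None)
        if des_a is None or des_b is None:
            return 0.0
        matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
        good = [m for m, *rest in matches if rest and m.distance < 0.75 * rest[0].distance]
        return len(good) / max(len(kp_a), 1)

    def environment_changed(frames, first_threshold=0.4, second_threshold=2):
        # B2: group the frames as (1, 2), (3, 4), ...; B3-B4: count groups whose similarity exceeds the first threshold.
        groups = [(frames[i], frames[i + 1]) for i in range(0, len(frames) - 1, 2)]
        exceeding = sum(1 for a, b in groups if sift_similarity(a, b) > first_threshold)
        return exceeding > second_threshold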
Through steps B1 to B4, environmental changes in the following two situations can be identified:
C1. The user's location is changing, for example the user is walking or travelling in a vehicle;
C2. The user's location has not changed, but there are many moving objects in the environment in which the user is located, for example the user is on a street with many moving people and/or vehicles.
Both of the above situations cause considerable interference to the user, and neither is suitable for learning.
In this embodiment, determining, according to the audio data, whether there is noise in the user environment in which the user is located includes:
D1. Calculate a first feature value corresponding to each of a plurality of preset sound features of the audio data;
The preset sound features include short-time energy, short-time zero-crossing rate, linear prediction cepstral coefficients, log-frequency energy coefficients, subband energy, Mel cepstral coefficients, loudness, spectral flux, resonance frequency, and fundamental frequency.
In this embodiment, after the audio data are divided into frames (for example, with a frame length of 512 and a frame shift of 256), the feature value corresponding to each preset sound feature can be calculated with an existing sound processing tool (for example, Matlab 6.0).
D2. Calculate a second feature value corresponding to each preset sound feature of each item of noise data in a preset noise set;
The preset noise set stores pre-collected noise data recorded in different environments.
D3. Calculate, for each preset sound feature, the sound similarity value between the first feature value and the second feature value corresponding to that preset sound feature of each item of noise data;
In this embodiment, a distance algorithm, for example cosine similarity, Manhattan distance, Euclidean distance or Minkowski distance, may be used to calculate the sound similarity between the first feature value and the second feature value corresponding to each preset sound feature of each item of noise data.
D4. Calculate the noise similarity value between the audio data and each item of noise data according to the sound similarity values and predetermined weight parameters corresponding to the preset sound features;
The noise similarity value is calculated as:
Y_i = A_1*B_{i,1} + A_2*B_{i,2} + ... + A_n*B_{i,n}
where Y_i is the noise similarity value between the audio data and the i-th item of noise data; A_1, A_2, ..., A_n are the weight parameters corresponding to the first, second, ..., n-th preset sound features; and B_{i,k} is the sound similarity value between the first feature value corresponding to the k-th preset sound feature of the audio data and the second feature value corresponding to the k-th preset sound feature of the i-th item of noise data.
D5. If the noise similarity value is greater than a third threshold, determine that there is noise in the user environment in which the user is located.
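As an illustration of steps D1 to D5, the following is a minimal Python sketch. It frames the audio as described above (frame length 512, frame shift 256), uses only two of the listed preset sound features (short-time energy and short-time zero-crossing rate) for brevity, scores the per-feature sound similarity with a simple inverse-distance measure, and combines the scores with the weighted sum of step D4; the weights and the third threshold are placeholder values, not values taken from this application.

    import numpy as np

    def frame_signal(x, frame_len=512, hop=256):
        # Split a 1-D audio signal into frames (frame length 512, frame shift 256).
        n = 1 + max(0, len(x) - frame_len) // hop
        return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

    def sound_feature_values(x):
        # D1/D2: one feature value per preset sound feature (here: mean short-time energy and zero-crossing rate).
        frames = frame_signal(np.asarray(x, dtype=np.float64))
        energy = np.mean(np.sum(frames ** 2, axis=1))
        zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0)
        return np.array([energy, zcr])

    def noise_similarity(first_values, second_values, weights):
        # D3-D4: per-feature sound similarity (inverse-distance score) combined into Y_i = sum(A_k * B_{i,k}).
        sims = 1.0 / (1.0 + np.abs(first_values - second_values))
        return float(np.dot(weights, sims))

    def environment_has_noise(audio, noise_set, weights=(0.5, 0.5), third_threshold=0.8):
        # D5: noise is present if the audio is similar enough to any pre-collected noise recording.
        first_values = sound_feature_values(audio)
        return any(noise_similarity(first_values, sound_feature_values(noise), np.asarray(weights)) > third_threshold
                   for noise in noise_set)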
Taking a request for online learning as an example, whether the user's current learning environment meets the learning requirements is judged from whether the user environment is changing and whether there is noise in the user environment, so that learning efficiency is improved through a good learning environment.
S3. When it is determined that the user environment in which the user is located meets the preset requirements, play the first video data, acquire third video data of the user in real time, and perform feature processing on the third video data to obtain a feature sequence of the user.
Taking a request for online learning as an example, when the user's learning environment meets the learning requirements, the course video requested by the user can be played, and the third video data of the user during the learning process can be acquired in real time, so that the user's learning situation is known in real time.
To further ensure the privacy and security of the second video data and the third video data, the second video data and the third video data may also be stored in a node of a blockchain.
In this embodiment, performing feature processing on the third video data to obtain the feature sequence of the user includes:
E1. Split the third video data into frames to obtain a second image sequence;
E2. Perform size normalization on each image in the second image sequence to obtain a third image sequence;
In another embodiment of the present application, performing size normalization on the images in the second image sequence includes:
F1. Calibrate feature points: calibrate three feature points, the two eyes and the nose, using the [x,y]=ginput(3) function, and obtain the coordinate values of the three feature points;
F2. Face correction: rotate the image according to the coordinate values of the left and right eyes to ensure the consistency of the face orientation, where the distance between the two eyes is d and their midpoint is O;
F3. Face cropping: determine the rectangular feature region according to the facial feature points and the geometric model; taking O as the reference, crop d on each of the left and right sides, and take rectangular regions of 0.5d and 1.5d in the vertical direction for cropping.
Through size normalization, the images can be transformed to a uniform size, which facilitates the extraction of expression features.
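A rough sketch of steps F2 and F3 is given below, assuming the eye coordinates have already been obtained through the calibration of step F1; it uses OpenCV for the rotation and follows the d, 0.5d and 1.5d proportions described above. It is only illustrative and not a definitive implementation of the size normalization in this application.

    import cv2
    import numpy as np

    def align_and_crop_face(img, left_eye, right_eye):
        # F2: rotate so that the eyes lie on a horizontal line; F3: crop a rectangle around the eye midpoint O.
        (lx, ly), (rx, ry) = left_eye, right_eye
        d = float(np.hypot(rx - lx, ry - ly))           # inter-ocular distance d
        ox, oy = (lx + rx) / 2.0, (ly + ry) / 2.0       # midpoint O
        angle = np.degrees(np.arctan2(ry - ly, rx - lx))
        rot = cv2.getRotationMatrix2D((ox, oy), angle, 1.0)
        rotated = cv2.warpAffine(img, rot, (img.shape[1], img.shape[0]))
        x0, x1 = int(ox - d), int(ox + d)               # d on each side of O
        y0, y1 = int(oy - 0.5 * d), int(oy + 1.5 * d)   # 0.5d above and 1.5d below O
        return rotated[max(y0, 0):y1, max(x0, 0):x1]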
E3. Perform gray-scale normalization on each image in the third image sequence to obtain a fourth image sequence;
The gray-scale normalization formula is:
Y = (X - min) / (max - min)
where Y is the gray value of the normalized image, X is the gray value of the original image, min is the minimum gray value of the original image, and max is the maximum gray value of the original image.
The purpose of gray-scale normalization is to increase the brightness of the image and make the details of the image clearer, so as to reduce the influence of lighting and illumination intensity on the image.
E4. Perform feature extraction on each image in the fourth image sequence based on a feature extraction algorithm to obtain the feature sequence of the user.
In this embodiment, the feature extraction algorithm is the PCA (Principal Component Analysis) algorithm. The PCA algorithm is a dimensionality reduction algorithm: when two variables are correlated, they are considered to carry a certain amount of overlapping information. Starting from all the original variables, the PCA algorithm removes redundant variables among the duplicated (closely related) variables and establishes as few new variables as possible, such that the new variables are pairwise uncorrelated and retain as much of the original information as possible; that is, m-dimensional features are mapped onto n dimensions (n < m), and the resulting n-dimensional features are brand-new orthogonal features called principal components.
The dimensionality reduction process of the PCA algorithm includes: finding the average value of each of the m features; finding the mean-removed matrix; calculating the eigenvalues of the covariance matrix; sorting the eigenvalues and taking the first n features as principal components to obtain the projection matrix; and finding the n-dimensional feature values according to the projection matrix.
In this embodiment, after the images in the fourth image sequence are processed with the PCA algorithm, each feature in the resulting feature sequence is an n-dimensional feature.
The size normalization and gray-scale normalization in this step make it easier to extract the facial features in the images, and the feature data are then reduced in dimension by the PCA algorithm, which makes the feature processing more efficient.
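The PCA steps listed above (mean removal, covariance matrix, eigenvalue sorting, projection) can be sketched with numpy as follows; n is a free parameter here, not a value fixed by this application.

    import numpy as np

    def pca_reduce(features, n):
        # Map m-dimensional feature vectors to n dimensions (n < m) following the steps described above.
        X = np.asarray(features, dtype=np.float64)       # shape: (num_samples, m)
        X_centered = X - X.mean(axis=0)                  # remove the mean of each feature
        cov = np.cov(X_centered, rowvar=False)           # covariance matrix of the features
        eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues and eigenvectors
        order = np.argsort(eigvals)[::-1][:n]            # sort eigenvalues, keep the top n
        projection = eigvecs[:, order]                   # projection matrix (m x n)
        return X_centered @ projection                   # n-dimensional features (principal components)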
In this embodiment, after determining, according to the second video data and the audio data, whether the user environment in which the user is located meets the preset requirements, the online real-time data interaction method further includes:
if it is determined that the user environment in which the user is located does not meet the preset requirements, rejecting the data interaction request and sending warning information.
S4. Input the feature sequence into an expression recognition model to obtain the target expression category of the user, and dynamically adjust the playback strategy of the first video data according to the target expression category.
In this embodiment, the expression recognition model is composed of a recurrent neural network model and a random forest model connected in cascade, with the output of the recurrent neural network model serving as the input of the random forest model. The feature sequence is input into the recurrent neural network model for multiple nonlinear transformations and representations to obtain more representative high-level features, and the high-level features are input into the random forest model to obtain the target expression category of the user.
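A rough sketch of such a cascade is shown below, assuming a PyTorch GRU as the recurrent part and scikit-learn's RandomForestClassifier as the classifier; the layer sizes and estimator count are placeholders, and training of both parts is omitted.

    import torch
    import torch.nn as nn
    from sklearn.ensemble import RandomForestClassifier

    class RecurrentFeatureExtractor(nn.Module):
        # Recurrent part of the cascade: turns a feature sequence into one high-level feature vector.
        def __init__(self, feature_dim, hidden_dim=64):
            super().__init__()
            self.gru = nn.GRU(feature_dim, hidden_dim, batch_first=True)

        def forward(self, sequences):                     # sequences: (batch, seq_len, feature_dim)
            _, last_hidden = self.gru(sequences)
            return last_hidden[-1]                        # (batch, hidden_dim) high-level features

    extractor = RecurrentFeatureExtractor(feature_dim=32)
    forest = RandomForestClassifier(n_estimators=100)     # must be fitted on extracted features beforehand

    def predict_expression(feature_sequence):
        # Cascade inference: the recurrent network output becomes the random forest input.
        with torch.no_grad():
            high_level = extractor(feature_sequence.unsqueeze(0)).numpy()
        return forest.predict(high_level)[0]              # target expression category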
In this embodiment, dynamically adjusting the playback strategy of the first video data according to the target expression category includes:
G1. Add each expression category to a pre-configured level data table;
The expression categories include happy, angry, bored, surprised, excited and puzzled.
In this embodiment, the level data table contains three expression levels: the first-level expressions include angry and bored, the second-level expressions include puzzled and surprised, and the third-level expressions include happy and excited.
G2. Determine, according to the level data table, the target expression level corresponding to the target expression category;
G3. Determine, according to a predetermined mapping relationship between expression levels and video playback strategies, a target video playback strategy corresponding to the target expression level, and adjust the playback of the first video data according to the target video playback strategy.
In this embodiment, the predetermined mapping relationship between expression levels and video playback strategies includes:
H1. The video playback strategy corresponding to the first expression level is to stop playing the first video data and send out warning information;
H2. The video playback strategy corresponding to the second expression level is to slow down the playback speed of the first video data, or to replay the first video data of a preset time period (for example, the previous 5 minutes);
H3. The video playback strategy corresponding to the third expression level is to speed up the playback speed of the first video data.
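Steps G1 to G3 and the mapping H1 to H3 can be sketched with two small lookup tables; the strategy labels below are illustrative names, not terms defined in this application.

    EXPRESSION_LEVELS = {           # G1: level data table
        "angry": 1, "bored": 1,
        "puzzled": 2, "surprised": 2,
        "happy": 3, "excited": 3,
    }

    PLAYBACK_STRATEGIES = {         # H1-H3: expression level -> video playback strategy
        1: "stop_and_warn",
        2: "slow_down_or_replay_last_5_minutes",
        3: "speed_up",
    }

    def adjust_playback(target_expression_category):
        # G2-G3: look up the target expression level, then the corresponding playback strategy.
        level = EXPRESSION_LEVELS[target_expression_category]
        return PLAYBACK_STRATEGIES[level]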
In another embodiment of the present application, after the playback strategy of the first video data is dynamically adjusted according to the target expression category, the online real-time data interaction method further includes:
I1. Acquire a screenshot of the client in real time, and judge whether the screenshot is a preset picture;
I2. If it is judged that the screenshot is not the preset picture, stop playing the first video data and send warning information to the client.
The preset picture is a picture in which only the first video data being played is displayed. Taking online learning as an example, only the playback interface of the course video should be displayed on the screen; the purpose of this is to prevent students from playing games, watching movies, browsing news or engaging in other such behaviour while learning.
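A minimal sketch of steps I1 and I2 is given below, assuming Pillow's ImageGrab for the screenshot and a crude pixel comparison against the preset picture; a real implementation would need a more tolerant comparison and the actual playback and warning hooks of the client.

    from PIL import ImageGrab, ImageChops

    def screenshot_is_preset(preset_image):
        # I1: grab the client screen and compare it with the preset playback-only picture.
        current = ImageGrab.grab().convert("RGB").resize(preset_image.size)
        return ImageChops.difference(current, preset_image.convert("RGB")).getbbox() is None

    def enforce_preset_picture(preset_image, stop_playback, send_warning):
        # I2: if the screenshot is not the preset picture, stop playing the first video data and warn the client.
        if not screenshot_is_preset(preset_image):
            stop_playback()
            send_warning("Please keep only the course video playback interface on screen.")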
It can be seen from the above embodiments that, in the online real-time data interaction method proposed by the present application, the second video data and audio data of the user within a preset time period are first collected, and it is determined whether the user environment in which the user is located meets the preset requirements; the purpose of this step is to improve interaction efficiency by verifying that the user environment is good. Next, when it is determined that the user environment in which the user is located meets the preset requirements, the first video data are played, the third video data of the user are acquired in real time, and feature processing is performed on the third video data to obtain the feature sequence of the user; through this step the user's current situation can be learned in real time. Finally, the feature sequence is input into the expression recognition model to obtain the target expression category of the user, and the playback strategy of the first video data is dynamically adjusted according to the target expression category; this step links the playback strategy to the user's expression, making real-time interaction more efficient. The present application therefore improves the efficiency of online real-time data interaction.
As shown in FIG. 2, a schematic block diagram of an online real-time data interaction apparatus provided by an embodiment of the present application is shown.
The online real-time data interaction apparatus 100 described in the present application may be installed in an electronic device. According to the functions implemented, the online real-time data interaction apparatus 100 may include a request module 110, a judgment module 120, a playback module 130 and an adjustment module 140. The modules described in the present application may also be referred to as units, and refer to a series of computer program segments that can be executed by the processor of the electronic device and can perform fixed functions, and that are stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:
The request module 110 is configured to respond to a data interaction request sent by a user via a client, parse the request, and obtain an identifier of first video data corresponding to the request.
The judgment module 120 is configured to collect second video data and audio data of the user within a preset time period, and determine, according to the second video data and the audio data, whether the user environment in which the user is located meets preset requirements.
This embodiment is described by taking the case where the data interaction request is an online learning request as an example, the request including an identifier of a course video to be learned. After the user's online learning request is received, 5 to 15 seconds of the user's audio and video data are collected to confirm whether the user's learning environment meets the learning requirements. In this embodiment, the video data are collected through a camera and the audio data through a microphone.
Determining, according to the second video data and the audio data, whether the user environment in which the user is located meets the preset requirements includes:
A1. Determine, according to the second video data, whether the user environment in which the user is located has changed;
A2. Determine, according to the audio data, whether there is noise in the user environment in which the user is located;
A3. If it is determined that the user environment has not changed and the user environment is free of noise, determine that the user environment in which the user is located meets the preset requirements.
In this embodiment, determining, according to the second video data, whether the user environment in which the user is located has changed includes:
B1. Split the second video data into frames to obtain a first image sequence;
B2. Take two adjacent images in the first image sequence as one image group to obtain a plurality of image groups;
In this embodiment, the images in the first image sequence are numbered, for example in ascending order of the natural numbers. Assuming that there are 10 images in the first image sequence, they are numbered 1, 2, 3, ..., 10. Then every two images with adjacent odd and even numbers are taken as one image group, that is, the two images corresponding to numbers 1 and 2 form one image group and the two images corresponding to numbers 3 and 4 form another image group, so that 5 image groups are obtained in total.
In other embodiments, any two adjacent images in the first image sequence may also be taken as an image group; for example, the two images corresponding to numbers 1 and 2 form one image group, the two images corresponding to numbers 2 and 3 form another image group, and the two images corresponding to numbers 3 and 4 form a further image group.
B3. Calculate an image similarity value for each of the plurality of image groups based on an image matching algorithm;
B4. If the number of image groups whose image similarity value is greater than a first threshold among the plurality of image groups is greater than a second threshold, determine that the user environment in which the user is located has changed.
In this embodiment, the image matching algorithm is the SIFT (Scale Invariant Feature Transform) algorithm. The SIFT algorithm can detect and describe local features in an image, and has a very high tolerance to the effects of lighting, noise, partial occlusion and slight changes of viewing angle. The process by which the SIFT algorithm calculates image similarity includes steps such as constructing the scale space, key point localization, orientation assignment, key point feature description, feature vector matching and similarity calculation; since SIFT is an existing algorithm, it is not described in detail here.
Through steps B1 to B4, environmental changes in the following two situations can be identified:
C1. The user's location is changing, for example the user is walking or travelling in a vehicle;
C2. The user's location has not changed, but there are many moving objects in the environment in which the user is located, for example the user is on a street with many moving people and/or vehicles.
Both of the above situations cause considerable interference to the user, and neither is suitable for learning.
In this embodiment, determining, according to the audio data, whether there is noise in the user environment in which the user is located includes:
D1. Calculate a first feature value corresponding to each of a plurality of preset sound features of the audio data;
The preset sound features include short-time energy, short-time zero-crossing rate, linear prediction cepstral coefficients, log-frequency energy coefficients, subband energy, Mel cepstral coefficients, loudness, spectral flux, resonance frequency, and fundamental frequency.
In this embodiment, after the audio data are divided into frames (for example, with a frame length of 512 and a frame shift of 256), the feature value corresponding to each preset sound feature can be calculated with an existing sound processing tool (for example, Matlab 6.0).
D2. Calculate a second feature value corresponding to each preset sound feature of each item of noise data in a preset noise set;
The preset noise set stores pre-collected noise data recorded in different environments.
D3. Calculate, for each preset sound feature, the sound similarity value between the first feature value and the second feature value corresponding to that preset sound feature of each item of noise data;
In this embodiment, a distance algorithm, for example cosine similarity, Manhattan distance, Euclidean distance or Minkowski distance, may be used to calculate the sound similarity between the first feature value and the second feature value corresponding to each preset sound feature of each item of noise data.
D4. Calculate the noise similarity value between the audio data and each item of noise data according to the sound similarity values and predetermined weight parameters corresponding to the preset sound features;
The noise similarity value is calculated as:
Y_i = A_1*B_{i,1} + A_2*B_{i,2} + ... + A_n*B_{i,n}
where Y_i is the noise similarity value between the audio data and the i-th item of noise data; A_1, A_2, ..., A_n are the weight parameters corresponding to the first, second, ..., n-th preset sound features; and B_{i,k} is the sound similarity value between the first feature value corresponding to the k-th preset sound feature of the audio data and the second feature value corresponding to the k-th preset sound feature of the i-th item of noise data.
D5. If the noise similarity value is greater than a third threshold, determine that there is noise in the user environment in which the user is located.
Taking a request for online learning as an example, whether the user's current learning environment meets the learning requirements is judged from whether the user environment is changing and whether there is noise in the user environment, so that learning efficiency is improved through a good learning environment.
The playback module 130 is configured to play the first video data when it is determined that the user environment in which the user is located meets the preset requirements, acquire third video data of the user in real time, and perform feature processing on the third video data to obtain a feature sequence of the user.
Taking a request for online learning as an example, when the user's learning environment meets the learning requirements, the course video requested by the user can be played, and the third video data of the user during the learning process can be acquired in real time, so that the user's learning situation is known in real time.
To further ensure the privacy and security of the second video data and the third video data, the second video data and the third video data may also be stored in a node of a blockchain.
In this embodiment, performing feature processing on the third video data to obtain the feature sequence of the user includes:
E1. Split the third video data into frames to obtain a second image sequence;
E2. Perform size normalization on each image in the second image sequence to obtain a third image sequence;
In another embodiment of the present application, performing size normalization on the images in the second image sequence includes:
F1. Calibrate feature points: calibrate three feature points, the two eyes and the nose, using the [x,y]=ginput(3) function, and obtain the coordinate values of the three feature points;
F2. Face correction: rotate the image according to the coordinate values of the left and right eyes to ensure the consistency of the face orientation, where the distance between the two eyes is d and their midpoint is O;
F3. Face cropping: determine the rectangular feature region according to the facial feature points and the geometric model; taking O as the reference, crop d on each of the left and right sides, and take rectangular regions of 0.5d and 1.5d in the vertical direction for cropping.
Through size normalization, the images can be transformed to a uniform size, which facilitates the extraction of expression features.
E3. Perform gray-scale normalization on each image in the third image sequence to obtain a fourth image sequence;
The gray-scale normalization formula is:
Y = (X - min) / (max - min)
where Y is the gray value of the normalized image, X is the gray value of the original image, min is the minimum gray value of the original image, and max is the maximum gray value of the original image.
The purpose of gray-scale normalization is to increase the brightness of the image and make the details of the image clearer, so as to reduce the influence of lighting and illumination intensity on the image.
E4. Perform feature extraction on each image in the fourth image sequence based on a feature extraction algorithm to obtain the feature sequence of the user.
In this embodiment, the feature extraction algorithm is the PCA (Principal Component Analysis) algorithm. The PCA algorithm is a dimensionality reduction algorithm: when two variables are correlated, they are considered to carry a certain amount of overlapping information. Starting from all the original variables, the PCA algorithm removes redundant variables among the duplicated (closely related) variables and establishes as few new variables as possible, such that the new variables are pairwise uncorrelated and retain as much of the original information as possible; that is, m-dimensional features are mapped onto n dimensions (n < m), and the resulting n-dimensional features are brand-new orthogonal features called principal components.
The dimensionality reduction process of the PCA algorithm includes: finding the average value of each of the m features; finding the mean-removed matrix; calculating the eigenvalues of the covariance matrix; sorting the eigenvalues and taking the first n features as principal components to obtain the projection matrix; and finding the n-dimensional feature values according to the projection matrix.
In this embodiment, after the images in the fourth image sequence are processed with the PCA algorithm, each feature in the resulting feature sequence is an n-dimensional feature.
The size normalization and gray-scale normalization in this step make it easier to extract the facial features in the images, and the feature data are then reduced in dimension by the PCA algorithm, which makes the feature processing more efficient.
In this embodiment, after determining, according to the second video data and the audio data, whether the user environment in which the user is located meets the preset requirements, the judgment module 120 is further configured to:
reject the data interaction request and send warning information if it is determined that the user environment in which the user is located does not meet the preset requirements.
The adjustment module 140 is configured to input the feature sequence into an expression recognition model to obtain the target expression category of the user, and dynamically adjust the playback strategy of the first video data according to the target expression category.
In this embodiment, the expression recognition model is composed of a recurrent neural network model and a random forest model connected in cascade, with the output of the recurrent neural network model serving as the input of the random forest model. The feature sequence is input into the recurrent neural network model for multiple nonlinear transformations and representations to obtain more representative high-level features, and the high-level features are input into the random forest model to obtain the target expression category of the user.
In this embodiment, dynamically adjusting the playback strategy of the first video data according to the target expression category includes:
G1. Add each expression category to a pre-configured level data table;
The expression categories include happy, angry, bored, surprised, excited and puzzled.
In this embodiment, the level data table contains three expression levels: the first-level expressions include angry and bored, the second-level expressions include puzzled and surprised, and the third-level expressions include happy and excited.
G2. Determine, according to the level data table, the target expression level corresponding to the target expression category;
G3. Determine, according to a predetermined mapping relationship between expression levels and video playback strategies, a target video playback strategy corresponding to the target expression level, and adjust the playback of the first video data according to the target video playback strategy.
In this embodiment, the predetermined mapping relationship between expression levels and video playback strategies includes:
H1. The video playback strategy corresponding to the first expression level is to stop playing the first video data and send out warning information;
H2. The video playback strategy corresponding to the second expression level is to slow down the playback speed of the first video data, or to replay the first video data of a preset time period (for example, the previous 5 minutes);
H3. The video playback strategy corresponding to the third expression level is to speed up the playback speed of the first video data.
In another embodiment of the present application, after the playback strategy of the first video data is dynamically adjusted according to the target expression category, the adjustment module 140 is further configured to:
I1. acquire a screenshot of the client in real time, and judge whether the screenshot is a preset picture;
I2. if it is judged that the screenshot is not the preset picture, stop playing the first video data and send warning information to the client.
The preset picture is a picture in which only the first video data being played is displayed. Taking online learning as an example, only the playback interface of the course video should be displayed on the screen; the purpose of this is to prevent students from playing games, watching movies, browsing news or engaging in other such behaviour while learning.
As shown in FIG. 3, a schematic structural diagram of an electronic device implementing the online real-time data interaction method provided by an embodiment of the present application is shown.
The electronic device 1 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the electronic device 1 includes, but is not limited to, a memory 11, a processor 12 and a network interface 13 that can be communicatively connected to one another via a system bus, the memory 11 storing an online real-time data interaction program 10 that can be executed by the processor 12. FIG. 1 only shows the electronic device 1 with the components 11-13 and the online real-time data interaction program 10; those skilled in the art will understand that the structure shown in FIG. 1 does not constitute a limitation on the electronic device 1, and that fewer or more components than shown may be included, or some components may be combined, or the components may be arranged differently.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk or an optical disk. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, for example a hard disk of the electronic device 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the electronic device 1, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the electronic device 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the various kinds of application software installed in the electronic device 1, for example to store the code of the online real-time data interaction program 10 in an embodiment of the present application. In addition, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 12 is generally used to control the overall operation of the electronic device 1, for example to perform control and processing related to data interaction or communication with other devices. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example to run the online real-time data interaction program 10.
The network interface 13 may include a wireless network interface or a wired network interface, and is used to establish a communication connection between the electronic device 1 and a client (not shown in the figure).
Optionally, the electronic device 1 may further include a user interface, which may include a display and an input unit such as a keyboard; the optional user interface may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also suitably be referred to as a display screen or display unit, and is used for displaying the information processed in the electronic device 1 and for displaying a visualized user interface.
It should be understood that these embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
The online real-time data interaction program 10 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run in the processor 12, can implement:
in response to a data interaction request sent by a user via a client, parsing the request to obtain an identifier of first video data corresponding to the request;
collecting second video data and audio data of the user within a preset time period, and determining, according to the second video data and the audio data, whether the user environment in which the user is located meets preset requirements;
when it is determined that the user environment in which the user is located meets the preset requirements, playing the first video data, acquiring third video data of the user in real time, and performing feature processing on the third video data to obtain a feature sequence of the user;
inputting the feature sequence into an expression recognition model to obtain a target expression category of the user, and dynamically adjusting a playback strategy of the first video data according to the target expression category.
Specifically, for the specific implementation of the above instructions by the processor 12, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here. It should be emphasized that, to further ensure the privacy and security of the second video data and the third video data, the second video data and the third video data may also be stored in a node of a blockchain.
Further, if the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium, which may be volatile or non-volatile. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM). An online real-time data interaction program is stored on the computer-readable storage medium, and the online real-time data interaction program can be executed by one or more processors to implement the following steps:
in response to a data interaction request sent by a user via a client, parsing the request to obtain an identifier of first video data corresponding to the request;
collecting second video data and audio data of the user within a preset time period, and determining, according to the second video data and the audio data, whether the user environment in which the user is located meets preset requirements;
when it is determined that the user environment in which the user is located meets the preset requirements, playing the first video data, acquiring third video data of the user in real time, and performing feature processing on the third video data to obtain a feature sequence of the user;
inputting the feature sequence into an expression recognition model to obtain a target expression category of the user, and dynamically adjusting a playback strategy of the first video data according to the target expression category.
In the several embodiments provided in the present application, it should be understood that the disclosed device, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division into modules is only a division by logical function, and there may be other ways of dividing them in actual implementation.
The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The above integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application.
Therefore, the embodiments should be regarded as exemplary and non-limiting in every respect, and the scope of the present application is defined by the appended claims rather than by the above description; all changes falling within the meaning and scope of the equivalent elements of the claims are therefore intended to be included in the present application. Any reference sign in the claims shall not be construed as limiting the claim concerned.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association with one another using cryptographic methods; each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of its information and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer and the like.
In addition, it is obvious that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in the system claims may also be implemented by one unit or device through software or hardware. Words such as "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and not to limit them; although the present application has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present application may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of the present application.

Claims (20)

  1. 一种线上实时数据交互方法,其中,所述方法包括:
    响应用户基于客户端发出的数据交互请求,解析所述请求,得到所述请求对应的第一视频数据的标识;
    采集所述用户在预设时间段内的第二视频数据及音频数据,根据所述第二视频数据及音频数据判断所述用户所处的用户环境是否满足预设要求;
    当判断所述用户所处的用户环境满足预设要求时,播放所述第一视频数据,并实时获取所述用户的第三视频数据,对所述第三视频数据执行特征处理,得到所述用户的特征序列;
    将所述特征序列输入表情识别模型,得到所述用户的目标表情类别,根据所述目标表情类别动态调整所述第一视频数据的播放策略。
  2. 如权利要求1所述的线上实时数据交互方法,其中,所述根据所述第二视频数据及音频数据判断所述用户所处的用户环境是否满足预设要求包括:
    根据所述第二视频数据判断所述用户所处的用户环境是否发生变化;
    根据所述音频数据判断所述用户所处的用户环境是否存在噪声;
    若判断所述用户环境未发生变化,且所述用户环境无噪声,则判断所述用户所处的用户环境满足预设要求。
  3. 如权利要求2所述的线上实时数据交互方法,其中,所述根据所述第二视频数据判断所述用户所处的用户环境是否发生变化包括:
    对所述第二视频数据进行分帧,得到第一图像序列;
    将所述第一图像序列中相邻两张图像作为一个图像组,得到多个图像组;
    基于图像匹配算法计算所述多个图像组中每个图像组的图像相似度值;
    若所述多个图像组中图像相似度值大于第一阈值的图像组的数量大于第二阈值,则判断所述用户所处的用户环境发生变化。
  4. The online real-time data interaction method according to claim 2, wherein the judging, according to the audio data, whether there is noise in the user environment in which the user is located comprises:
    calculating a first feature value corresponding to each of a plurality of preset sound features of the audio data;
    calculating, for each noise data item in a preset noise set, a second feature value corresponding to each preset sound feature;
    calculating, respectively, a sound similarity value between the first feature value and the second feature value corresponding to each preset sound feature of each noise data item;
    calculating a noise similarity value between the audio data and each noise data item according to the sound similarity values and predetermined weight parameters corresponding to the preset sound features;
    if the noise similarity value is greater than a third threshold, judging that there is noise in the user environment in which the user is located.
  5. The online real-time data interaction method according to claim 1, wherein the performing feature processing on the third video data to obtain a feature sequence of the user comprises:
    splitting the third video data into frames to obtain a second image sequence;
    performing size normalization on each image in the second image sequence to obtain a third image sequence;
    performing grayscale normalization on each image in the third image sequence to obtain a fourth image sequence;
    performing feature extraction on each image in the fourth image sequence based on a feature extraction algorithm to obtain the feature sequence of the user.
  6. The online real-time data interaction method according to claim 5, wherein the dynamically adjusting the playback strategy of the first video data according to the target expression category comprises:
    adding each expression category to a pre-configured level data table;
    determining, according to the level data table, a target expression level corresponding to the target expression category;
    determining, according to a predetermined mapping relationship between expression levels and video playback strategies, a target video playback strategy corresponding to the target expression level, and adjusting the playback of the first video data according to the target video playback strategy.
  7. The online real-time data interaction method according to any one of claims 1 to 6, wherein, after the dynamically adjusting the playback strategy of the first video data according to the target expression category, the method further comprises:
    acquiring a screenshot of the client in real time, and judging whether the screenshot is a preset picture;
    if it is judged that the screenshot is not the preset picture, stopping playing the first video data and sending a warning message to the client.
  8. An online real-time data interaction apparatus, wherein the apparatus comprises:
    a request module, configured to respond to a data interaction request issued by a user via a client, and to parse the request to obtain an identifier of first video data corresponding to the request;
    a judgment module, configured to collect second video data and audio data of the user within a preset time period, and to judge, according to the second video data and the audio data, whether the user environment in which the user is located meets a preset requirement;
    a playback module, configured to play the first video data when it is judged that the user environment in which the user is located meets the preset requirement, to acquire third video data of the user in real time, and to perform feature processing on the third video data to obtain a feature sequence of the user;
    an adjustment module, configured to input the feature sequence into an expression recognition model to obtain a target expression category of the user, and to dynamically adjust the playback strategy of the first video data according to the target expression category.
  9. An electronic device, wherein the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following steps:
    in response to a data interaction request issued by a user via a client, parsing the request to obtain an identifier of first video data corresponding to the request;
    collecting second video data and audio data of the user within a preset time period, and judging, according to the second video data and the audio data, whether the user environment in which the user is located meets a preset requirement;
    when it is judged that the user environment in which the user is located meets the preset requirement, playing the first video data, acquiring third video data of the user in real time, and performing feature processing on the third video data to obtain a feature sequence of the user;
    inputting the feature sequence into an expression recognition model to obtain a target expression category of the user, and dynamically adjusting the playback strategy of the first video data according to the target expression category.
  10. The electronic device according to claim 9, wherein the judging, according to the second video data and the audio data, whether the user environment in which the user is located meets a preset requirement comprises:
    judging, according to the second video data, whether the user environment in which the user is located has changed;
    judging, according to the audio data, whether there is noise in the user environment in which the user is located;
    if it is judged that the user environment has not changed and that there is no noise in the user environment, judging that the user environment in which the user is located meets the preset requirement.
  11. The electronic device according to claim 10, wherein the judging, according to the second video data, whether the user environment in which the user is located has changed comprises:
    splitting the second video data into frames to obtain a first image sequence;
    taking every two adjacent images in the first image sequence as an image group to obtain a plurality of image groups;
    calculating an image similarity value of each of the plurality of image groups based on an image matching algorithm;
    if the number of image groups, among the plurality of image groups, whose image similarity value is greater than a first threshold is greater than a second threshold, judging that the user environment in which the user is located has changed.
  12. The electronic device according to claim 10, wherein the judging, according to the audio data, whether there is noise in the user environment in which the user is located comprises:
    calculating a first feature value corresponding to each of a plurality of preset sound features of the audio data;
    calculating, for each noise data item in a preset noise set, a second feature value corresponding to each preset sound feature;
    calculating, respectively, a sound similarity value between the first feature value and the second feature value corresponding to each preset sound feature of each noise data item;
    calculating a noise similarity value between the audio data and each noise data item according to the sound similarity values and predetermined weight parameters corresponding to the preset sound features;
    if the noise similarity value is greater than a third threshold, judging that there is noise in the user environment in which the user is located.
  13. The electronic device according to claim 9, wherein the performing feature processing on the third video data to obtain a feature sequence of the user comprises:
    splitting the third video data into frames to obtain a second image sequence;
    performing size normalization on each image in the second image sequence to obtain a third image sequence;
    performing grayscale normalization on each image in the third image sequence to obtain a fourth image sequence;
    performing feature extraction on each image in the fourth image sequence based on a feature extraction algorithm to obtain the feature sequence of the user.
  14. The electronic device according to claim 13, wherein the dynamically adjusting the playback strategy of the first video data according to the target expression category comprises:
    adding each expression category to a pre-configured level data table;
    determining, according to the level data table, a target expression level corresponding to the target expression category;
    determining, according to a predetermined mapping relationship between expression levels and video playback strategies, a target video playback strategy corresponding to the target expression level, and adjusting the playback of the first video data according to the target video playback strategy.
  15. The electronic device according to any one of claims 9 to 14, wherein, after the dynamically adjusting the playback strategy of the first video data according to the target expression category, the at least one processor further performs the following steps:
    acquiring a screenshot of the client in real time, and judging whether the screenshot is a preset picture;
    if it is judged that the screenshot is not the preset picture, stopping playing the first video data and sending a warning message to the client.
  16. A computer-readable storage medium, wherein the computer-readable storage medium stores an online real-time data interaction program which can be executed by one or more processors to implement the following steps:
    in response to a data interaction request issued by a user via a client, parsing the request to obtain an identifier of first video data corresponding to the request;
    collecting second video data and audio data of the user within a preset time period, and judging, according to the second video data and the audio data, whether the user environment in which the user is located meets a preset requirement;
    when it is judged that the user environment in which the user is located meets the preset requirement, playing the first video data, acquiring third video data of the user in real time, and performing feature processing on the third video data to obtain a feature sequence of the user;
    inputting the feature sequence into an expression recognition model to obtain a target expression category of the user, and dynamically adjusting the playback strategy of the first video data according to the target expression category.
  17. The computer-readable storage medium according to claim 16, wherein the judging, according to the second video data and the audio data, whether the user environment in which the user is located meets a preset requirement comprises:
    judging, according to the second video data, whether the user environment in which the user is located has changed;
    judging, according to the audio data, whether there is noise in the user environment in which the user is located;
    if it is judged that the user environment has not changed and that there is no noise in the user environment, judging that the user environment in which the user is located meets the preset requirement.
  18. The computer-readable storage medium according to claim 17, wherein the judging, according to the second video data, whether the user environment in which the user is located has changed comprises:
    splitting the second video data into frames to obtain a first image sequence;
    taking every two adjacent images in the first image sequence as an image group to obtain a plurality of image groups;
    calculating an image similarity value of each of the plurality of image groups based on an image matching algorithm;
    if the number of image groups, among the plurality of image groups, whose image similarity value is greater than a first threshold is greater than a second threshold, judging that the user environment in which the user is located has changed.
  19. The computer-readable storage medium according to claim 17, wherein the judging, according to the audio data, whether there is noise in the user environment in which the user is located comprises:
    calculating a first feature value corresponding to each of a plurality of preset sound features of the audio data;
    calculating, for each noise data item in a preset noise set, a second feature value corresponding to each preset sound feature;
    calculating, respectively, a sound similarity value between the first feature value and the second feature value corresponding to each preset sound feature of each noise data item;
    calculating a noise similarity value between the audio data and each noise data item according to the sound similarity values and predetermined weight parameters corresponding to the preset sound features;
    if the noise similarity value is greater than a third threshold, judging that there is noise in the user environment in which the user is located.
  20. The computer-readable storage medium according to claim 16, wherein the performing feature processing on the third video data to obtain a feature sequence of the user comprises:
    splitting the third video data into frames to obtain a second image sequence;
    performing size normalization on each image in the second image sequence to obtain a third image sequence;
    performing grayscale normalization on each image in the third image sequence to obtain a fourth image sequence;
    performing feature extraction on each image in the fourth image sequence based on a feature extraction algorithm to obtain the feature sequence of the user.
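The adjacent-frame comparison recited in claims 3, 11, and 18 can be pictured with the following minimal Python sketch. The histogram-overlap similarity and both threshold values are illustrative assumptions; the claims leave the image matching algorithm and the thresholds open, and the sketch simply mirrors the claimed grouping and counting logic.

import numpy as np

def image_similarity(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Illustrative similarity in [0, 1] based on normalized grayscale histograms."""
    hist_a, _ = np.histogram(img_a, bins=64, range=(0, 256))
    hist_b, _ = np.histogram(img_b, bins=64, range=(0, 256))
    hist_a = hist_a / max(hist_a.sum(), 1)
    hist_b = hist_b / max(hist_b.sum(), 1)
    return float(np.minimum(hist_a, hist_b).sum())

def environment_changed(first_image_sequence: list,
                        first_threshold: float = 0.8,
                        second_threshold: int = 10) -> bool:
    # Every two adjacent images form one image group.
    groups = list(zip(first_image_sequence, first_image_sequence[1:]))
    # Count the groups whose image similarity value exceeds the first threshold.
    count = sum(1 for a, b in groups if image_similarity(a, b) > first_threshold)
    # Per the claim wording, the environment is judged to have changed when that
    # count exceeds the second threshold.
    return count > second_threshold

# Example with random 8-bit frames standing in for the decoded second video data.
frames = [np.random.randint(0, 256, (120, 160), dtype=np.uint8) for _ in range(30)]
print(environment_changed(frames))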
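The weighted sound-feature comparison recited in claims 4, 12, and 19 can be sketched in the same spirit. The two preset sound features used here (RMS energy and zero-crossing rate), the per-feature similarity formula, the weights, and the threshold are all assumptions; only the overall structure of per-feature similarities combined by predetermined weights and compared with a threshold follows the claims.

import numpy as np

def sound_features(samples: np.ndarray) -> dict:
    # Two illustrative preset sound features of an audio buffer.
    rms = float(np.sqrt(np.mean(samples.astype(np.float64) ** 2)))
    zcr = float(np.mean(np.abs(np.diff(np.sign(samples))) > 0))
    return {"rms": rms, "zcr": zcr}

def feature_similarity(a: float, b: float) -> float:
    """Illustrative per-feature similarity in (0, 1]; identical values give 1."""
    return 1.0 / (1.0 + abs(a - b))

def environment_noisy(audio: np.ndarray, preset_noise_set: list,
                      weights: dict, third_threshold: float = 0.9) -> bool:
    first_values = sound_features(audio)                      # first feature values
    for noise in preset_noise_set:
        second_values = sound_features(noise)                 # second feature values
        # Weighted noise similarity between the audio data and this noise data item.
        noise_similarity = sum(weights[name] * feature_similarity(first_values[name], second_values[name])
                               for name in weights)
        if noise_similarity > third_threshold:
            return True
    return False

# Example: a quiet recording compared against one synthetic noise item.
quiet = np.zeros(16000)
hiss = np.random.randn(16000) * 0.3
print(environment_noisy(quiet, [hiss], weights={"rms": 0.5, "zcr": 0.5}))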
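Finally, the preprocessing pipeline of claims 5, 13, and 20 and the level-table lookup of claims 6 and 14 can be sketched as follows. The 48x48 target size, the min-max grayscale normalization, the flattened-pixel stand-in for feature extraction, and the example level data table and playback strategies are all assumptions; the claims fix only the sequence of operations and the table-based mapping.

import numpy as np
from PIL import Image

def feature_processing(second_image_sequence: list, size=(48, 48)) -> list:
    feature_sequence = []
    for frame in second_image_sequence:
        # Size normalization, then conversion to a single grayscale channel.
        img = Image.fromarray(frame).resize(size).convert("L")
        arr = np.asarray(img, dtype=np.float32)
        # Grayscale normalization to the [0, 1] range.
        arr = (arr - arr.min()) / max(float(arr.max() - arr.min()), 1e-6)
        # Stand-in for the feature extraction algorithm.
        feature_sequence.append(arr.flatten())
    return feature_sequence

# Hypothetical level data table and strategy mapping; the claims leave both unspecified.
EXPRESSION_LEVELS = {"happy": 1, "neutral": 2, "confused": 3, "tired": 4}
LEVEL_TO_STRATEGY = {
    1: "keep normal playback",
    2: "keep normal playback",
    3: "slow down and replay the current segment",
    4: "pause playback and prompt the user",
}

def playback_strategy(target_expression_category: str) -> str:
    target_level = EXPRESSION_LEVELS.get(target_expression_category, 2)
    return LEVEL_TO_STRATEGY[target_level]

# Example: preprocess two synthetic color frames and pick a strategy.
frames = [np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8) for _ in range(2)]
print(len(feature_processing(frames)), playback_strategy("confused"))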
PCT/CN2021/095009 2020-07-19 2021-05-21 Online real-time data interaction method and apparatus, electronic device, and storage medium WO2022016977A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010695107.8 2020-07-19
CN202010695107.8A CN111738887B (zh) 2020-07-19 2020-07-19 Online real-time data interaction method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022016977A1 true WO2022016977A1 (zh) 2022-01-27

Family

ID=72656037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095009 WO2022016977A1 (zh) 2020-07-19 2021-05-21 Online real-time data interaction method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111738887B (zh)
WO (1) WO2022016977A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738887B (zh) 2020-07-19 2020-12-04 平安国际智慧城市科技股份有限公司 Online real-time data interaction method and apparatus, electronic device, and storage medium
CN113099305A (zh) 2021-04-15 2021-07-09 上海哔哩哔哩科技有限公司 Playback control method and apparatus

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103078599B (zh) 2011-12-16 2016-04-06 深圳Tcl新技术有限公司 Audio/video playback device and volume control method
CN104427083B (zh) 2013-08-19 2019-06-28 腾讯科技(深圳)有限公司 Method and apparatus for adjusting volume
CN104135705B (zh) 2014-06-24 2018-05-08 惠州Tcl移动通信有限公司 Method and system for automatically adjusting multimedia volume according to different scene modes
CN106358029B (zh) 2016-10-18 2019-05-03 北京字节跳动科技有限公司 Video image processing method and apparatus
CN106875767B (zh) 2017-03-10 2019-03-15 重庆智绘点途科技有限公司 Online learning system and method
CN107801097A (zh) 2017-10-31 2018-03-13 上海高顿教育培训有限公司 A user-interaction-based video course playback method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104469239A (zh) 2014-12-05 2015-03-25 宁波菊风系统软件有限公司 An immersive video presentation method for an intelligent mobile terminal
US20190313014A1 (en) 2015-06-25 2019-10-10 Amazon Technologies, Inc. User identification based on voice and face
CN107801096A (zh) 2017-10-30 2018-03-13 广东欧珀移动通信有限公司 Video playback control method and apparatus, terminal device, and storage medium
CN107886950A (zh) 2017-12-06 2018-04-06 安徽省科普产品工程研究中心有限责任公司 A speech-recognition-based video teaching method for children
CN108377422A (zh) 2018-02-24 2018-08-07 腾讯科技(深圳)有限公司 Multimedia content playback control method and apparatus, and storage medium
CN111738887A (zh) 2020-07-19 2020-10-02 平安国际智慧城市科技股份有限公司 Online real-time data interaction method and apparatus, electronic device, and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036127A (zh) 2023-09-28 2023-11-10 南京诚勤教育科技有限公司 An educational resource sharing method based on an education big data platform
CN117036127B (zh) 2023-09-28 2023-12-15 南京诚勤教育科技有限公司 An educational resource sharing method based on an education big data platform

Also Published As

Publication number Publication date
CN111738887B (zh) 2020-12-04
CN111738887A (zh) 2020-10-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21847070

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21.04.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21847070

Country of ref document: EP

Kind code of ref document: A1