CN115132027A - Intelligent programming learning system and method based on multi-mode deep learning

Info

Publication number
CN115132027A
Authority
CN
China
Prior art keywords
learning, programming, learner, target, programmed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210759154.3A
Other languages
Chinese (zh)
Inventor
马平川
付婉莹
徐方勤
钱志伟
周萍
栾世杰
王琳
徐一鸣
张达意
张晨旭
肖旸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Publishing and Printing College
Original Assignee
Shanghai Publishing and Printing College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Publishing and Printing College filed Critical Shanghai Publishing and Printing College
Priority to CN202210759154.3A
Publication of CN115132027A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 19/00 - Teaching not covered by other main groups of this subclass
    • G09B 19/0053 - Computers, e.g. programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computer Hardware Design (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an intelligent programming learning system based on multi-modal deep learning, an intelligent programming learning method, and a smart classroom. In a smart learning environment, learning information of a target programming learner is collected during the learning process and analyzed to produce a learning evaluation. Multi-modal features of the learner, including facial-expression features and learning-behavior features, are also acquired, and the learner's learning state is derived from them. Finally, a teaching evaluation result is obtained from the learning evaluation together with the learning state, and a corresponding programming learning strategy is determined. Compared with the prior art, the application adopts a multi-modal evaluation approach: the learner's real learning state is derived from facial expressions and learning behaviors, an accurate and complete programming teaching evaluation system is established by combining the learning evaluation with the learning state, and a targeted programming learning strategy is formulated. This enables personalized programming guidance, supports precise teaching by the teacher, and effectively improves the learner's programming ability.

Description

Intelligent programming learning system and method based on multi-mode deep learning
Technical Field
The application relates to the field of programming teaching, in particular to an intelligent programming learning method based on multi-mode deep learning, an intelligent programming learning system, computer equipment, a storage medium and an intelligent classroom.
Background
With the continued advance of information-based teaching, concepts and tools such as the smart classroom, the smart lecture room, and information-based teaching methods are being steadily popularized.
The smart classroom is the concrete embodiment of a typical intelligent learning environment and the high-end form of the multimedia and network classroom. It is a new kind of classroom built on the Internet of Things, cloud computing, intelligent technologies, and the like, comprising both a tangible physical space and an intangible digital space. Through various intelligent devices it assists the presentation of teaching content, eases access to learning resources, and promotes classroom interaction, while providing context awareness and environment management. The smart classroom aims to provide a humanized, intelligent interaction space for teaching activities. By combining the physical space with the digital space, and the local space with remote spaces, it improves the relationship between people and the learning environment and enables natural interaction between them, making the classroom simple, efficient, and intelligent, and fostering students' independent thinking and learning ability.
Given these characteristics, the smart classroom has been widely applied in many kinds of teaching practice, such as subject courses (e.g., mathematics and English), hands-on courses (e.g., Lego), and programming courses. Programming learning may cover not only conventional programming-language learning but also robot programming and similar subjects.
Taking programming learning as an example, the whole programming teaching process must attend to the interaction and communication among students, between students and teachers, and between students and intelligent devices. Within this process, learning information such as students' learning states, learning abilities, and learning-effect evaluations is of central importance.
In the related art, the programming learning ability of a learner is sometimes judged from learning results and the like: a long learning time and/or mediocre results are taken to indicate average ability, while a short learning time and/or excellent results are taken to indicate strong ability. In other cases, a student's learning state is judged from classroom performance, but this usually means a teacher or evaluator manually reviewing classroom video, which is time-consuming and laborious. In general, the reference information used for teaching evaluation is one-dimensional and cannot accurately reflect the learning situation of a programming learner; the information is gathered in simplistic ways, often still requires manual intervention, is inefficient and error-prone, and does not help improve learning outcomes.
Disclosure of Invention
In view of the above shortcomings of the related art, the present application aims to disclose an intelligent programming learning method and system based on multi-modal deep learning, a storage medium, and a smart classroom, to solve the problem in the related art that teaching evaluation mostly relies on a single modality, which biases the evaluation result and prevents the learning state and teaching effect of a programming learner from being accurately reflected.
To achieve the above and other related objects, the present application discloses an intelligent programming learning method based on multi-modal deep learning, comprising the following steps: acquiring learning information of a target programming learner in programming learning, and processing and analyzing the learning information to obtain learning evaluation of the target programming learner; acquiring image information containing a target programming learner, processing and analyzing the image information to obtain learning state characteristic data representing facial expressions and learning behaviors of the target programming learner; analyzing the learning state characteristic data according to a preset multi-mode learning state model to obtain the learning state of the target programming learner; and evaluating the learning evaluation and learning state of the target programming learner according to a preset teaching characteristic evaluation model to obtain a teaching evaluation result related to the target programming learner and determine a corresponding programming learning strategy.
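As a concrete illustration of steps S31 through S37, the following Python sketch chains the four stages. It is a minimal sketch under assumed interfaces: every name in it (analyze_learning_info, MultimodalStateModel, TeachingEvaluationModel, and so on) is a hypothetical placeholder standing in for the preset models described above, not an identifier disclosed in this application.

    from dataclasses import dataclass

    def analyze_learning_info(info: dict) -> dict:
        """S31: turn screen-recorded learning information into a learning evaluation."""
        return {"score": info.get("score", 0), "mastery": info.get("mastery", {})}

    def extract_state_features(frames: list) -> dict:
        """S33: placeholder for facial-expression and learning-behavior extraction."""
        return {"expression": "neutral", "behavior": "typing"}

    class MultimodalStateModel:
        """S35: placeholder fusion model mapping multimodal features to a state."""
        def predict(self, features: dict) -> str:
            return "focused" if features["behavior"] == "typing" else "distracted"

    class TeachingEvaluationModel:
        """S37: placeholder model combining evaluation and state into a strategy."""
        def recommend(self, evaluation: dict, state: str) -> str:
            if evaluation["score"] < 60 and state == "distracted":
                return "assign guided exercises with instructor follow-up"
            return "continue the current learning plan"

    @dataclass
    class TeachingResult:
        evaluation: dict
        state: str
        strategy: str

    def evaluate_learner(info: dict, frames: list) -> TeachingResult:
        evaluation = analyze_learning_info(info)                           # S31
        features = extract_state_features(frames)                          # S33
        state = MultimodalStateModel().predict(features)                   # S35
        strategy = TeachingEvaluationModel().recommend(evaluation, state)  # S37
        return TeachingResult(evaluation, state, strategy)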
Optionally, the method for obtaining the learning information of the target programming learner in the programming learning, and processing and analyzing the learning information to obtain the learning evaluation of the target programming learner comprises the following steps: acquiring learning information of a target programming learner by means of screen recording or screen capturing, wherein the learning information comprises: the operation content and one or more of operation time related to the operation content, a programming response area corresponding to the operation and an operation track, programming learning content and programming learning content source and programming learning time related to the programming learning content; the operation content comprises writing code and debugging code of a target programming learner in a development environment; and processing and analyzing the learning information to obtain the mastery condition of the target programming learner on the programming specific knowledge point, the programming learning achievement and the reference index related to the programming learning achievement.
Optionally, acquiring image information including the target programmed learner, processing and analyzing the image information to obtain learning state feature data representing facial expressions and learning behaviors of the target programmed learner, comprising the following steps: the method comprises the steps that a first camera device arranged at a position close to a programmed learning terminal is used for shooting a close-range image of a target programmed learner corresponding to the programmed learning terminal during learning; carrying out face recognition processing on the shot close-range image through a face recognition model to determine a face image of the target programming learner; performing expression recognition processing on the facial image of the target programming learner through an expression recognition model to obtain learning state feature data of the facial expression corresponding to the target programming learner; and performing learning behavior identification processing on the close-range image shot by the first camera device through a learning behavior model to obtain learning state characteristic data of the learning behavior corresponding to the target programming learner.
Optionally, acquiring image information including the target programmed learner, processing and analyzing the image information to obtain learning state feature data representing facial expressions and learning behaviors of the target programmed learner, comprising the following steps: the method comprises the steps that a first camera device arranged at a position close to a programmed learning terminal is used for shooting a close-range image of a target programmed learner corresponding to the programmed learning terminal during learning; carrying out face recognition processing on the shot close-range image through a face recognition model to determine a face image of the target programming learner; performing expression recognition processing on the facial image of the target programming learner through an expression recognition model to obtain learning state feature data of a facial expression corresponding to the target programming learner; shooting a medium and distant view image of the target programming learner during learning through a second camera device which is arranged at a position far away from the programming learning terminal; carrying out face recognition processing on the shot medium and long-range view image through a face recognition model or carrying out programming learning terminal positioning on the shot medium and long-range view image to determine a target programming learner; performing learning behavior recognition processing on the shot medium and long-range view image through a learning behavior model to obtain learning state characteristic data of a learning behavior corresponding to the target programming learner; wherein the learning behaviors corresponding to the target programming learner include the behaviors of the target programming learner and the interactive behaviors of the target programming learner and other related programming learners.
Optionally, the expression recognition processing is performed on the facial image of the target programming learner through an expression recognition model, and the method includes the following steps: extracting facial key point characteristics of the target programming learner from the facial image of the target programming learner; mapping the facial key point features into facial expression features; and identifying and matching facial expression characteristics through an expression identification model to obtain the facial expression of the target programming learner.
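The key-point-to-feature mapping in these steps can be illustrated with a small geometric sketch. The 68-point landmark indexing used below (jaw 0-16, eyebrows 17-26, eyes 36-47, mouth 48-67) follows the widely used dlib convention and is an assumption for illustration; the application does not prescribe a particular landmark scheme or feature set.

    import numpy as np

    def expression_features(landmarks: np.ndarray) -> dict:
        """landmarks: (68, 2) array of facial key-point coordinates."""
        def dist(a, b):
            return float(np.linalg.norm(landmarks[a] - landmarks[b]))

        face_width = dist(0, 16)  # normalize by face width for scale invariance
        return {
            # inner-lip opening, large for surprise, small for pressed lips
            "mouth_open": dist(62, 66) / face_width,
            # mouth-corner spread, larger when smiling
            "mouth_width": dist(48, 54) / face_width,
            # eyebrow-to-eye distance, larger when eyebrows are raised
            "brow_raise": (dist(19, 37) + dist(24, 44)) / (2 * face_width),
            # eye opening, smaller when squinting or drowsy
            "eye_open": (dist(37, 41) + dist(43, 47)) / (2 * face_width),
        }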
The application further discloses an intelligent programming learning system based on multi-modal deep learning, comprising: a learning information acquisition module for acquiring learning information of a target programming learner in programming learning; a learning evaluation module for processing and analyzing the learning information to obtain the learning evaluation of the target programming learner; an image acquisition module for acquiring image information containing the target programming learner; a facial expression recognition module for processing and analyzing the image information to obtain learning state feature data representing the facial expression of the target programming learner; a learning behavior identification module for processing and analyzing the image information to obtain learning state feature data representing the learning behavior of the target programming learner; a learning state evaluation module for analyzing the learning state feature data representing the facial expression and the learning behavior to obtain the learning state of the target programming learner; a teaching evaluation module for evaluating the learning evaluation and learning state of the target programming learner to obtain a teaching evaluation result related to the learner; and a learning strategy module for determining a corresponding programming learning strategy according to the obtained teaching evaluation result.
The application further discloses a computer device, including: a memory for storing at least one program; and the processor is connected with the memory and is used for executing and realizing the intelligent programming learning method based on the multi-mode deep learning when the at least one program is run.
The present application further discloses a non-transitory storage medium of a computer device storing at least one program which, when executed by a processor, executes and implements the intelligent programmed learning method based on multimodal deep learning as described above.
The application still further discloses a smart classroom for programming learning, comprising: a classroom master device connected with the intelligent programming teaching platform through a network; a plurality of programming learning terminals connected with the classroom master device through a network, each programming learning terminal being configured with programming learning software; a plurality of first camera devices, respectively configured on or beside the corresponding programming learning terminals, each first camera device being used for shooting a close-range image of the target programming learner corresponding to its programming learning terminal during learning; and one or more second camera devices, configured at positions away from the programming learning terminals in the smart classroom, for capturing medium- and long-range images of the target programming learner, or of a plurality of programming learners including the target programming learner, during learning.
The application also discloses an intelligent programming learning system, comprising: an intelligent programming teaching platform; and at least one smart classroom connected with the intelligent programming teaching platform through a network.
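To make the disclosed topology concrete, the following is a hypothetical configuration sketch: one teaching platform, one classroom master device, a first camera per programming learning terminal, and one second camera per learning group. All hostnames, identifiers, counts, and field names are illustrative assumptions, not values from the application.

    classroom_config = {
        "teaching_platform": {"url": "https://platform.example.edu", "cloud": "private"},
        "classroom_master": {"host": "192.168.1.1", "role": "router/host"},
        "groups": [
            {
                "group_id": "G1",
                # one second camera per group, mounted away from the terminals
                "second_camera": {"id": "cam-far-1", "mount": "wall", "pan_tilt": True},
                "terminals": [
                    {"terminal_id": "T101", "first_camera": "built-in-front"},
                    {"terminal_id": "T102", "first_camera": "clip-on-usb"},
                ],
            },
            {
                "group_id": "G2",
                "second_camera": {"id": "cam-far-2", "mount": "column", "pan_tilt": True},
                "terminals": [
                    {"terminal_id": "T201", "first_camera": "built-in-front"},
                ],
            },
        ],
    }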
The intelligent programming learning method and system based on multi-modal deep learning, the storage medium, and the smart classroom disclosed in the present application not only obtain a learning evaluation of a target programming learner from the learner's learning information, but also obtain, from image information containing the learner, learning state feature data representing the learner's facial expressions and learning behaviors, from which the learner's learning state is derived. A teaching evaluation of the target programming learner is then obtained from the learning evaluation and the learning state together. This reflects the learning state and teaching effect in programming learning more accurately, allows a targeted programming learning strategy to be formulated for the learner, realizes personalized programming guidance, supports precise teaching by teachers, and effectively improves the programming ability of programming learners.
Drawings
The specific features of the invention to which this application relates are set forth in the appended claims. The features and advantages of the invention will be better understood from the exemplary embodiments described in detail below and the accompanying drawings, which are briefly described as follows:
FIG. 1 is a schematic diagram of an intelligent programming learning system according to an embodiment of the present disclosure.
Fig. 2 is a block diagram of an intelligent classroom disclosed herein in one embodiment.
Fig. 3 is a flowchart illustrating an intelligent programming learning method based on multi-modal deep learning according to an embodiment of the present invention.
FIG. 4 is a simplified structural entity diagram of the intelligent programming learning system based on multi-modal deep learning according to an embodiment of the present invention.
FIG. 5 is a simplified block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application is provided for illustrative purposes, and other advantages and capabilities of the present application will become apparent to those skilled in the art from the present disclosure.
In the following description, reference is made to the accompanying drawings that describe several embodiments of the application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present disclosure. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
Also, as used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions, steps, or operations is inherently mutually exclusive in some way.
With the development of network technology and intelligent equipment and the deep advance of informatization teaching, intelligent teaching taking an intelligent classroom as a core is greatly popularized and applied. The intelligent teaching is an intelligent, digital and personalized modern education system which is established on the basis of emerging technologies such as Internet of things, cloud computing, big data processing and wireless broadband networks and on the basis of intelligent equipment and Internet technologies. For example, in the case of programming learning, many programming learners can intelligently and personally learn programming in an intelligent classroom or in a small group, so that the classroom becomes simple, efficient and intelligent, and the development of the ability of the programming learners to think and learn autonomously is facilitated.
However, in intelligent teaching, how to accurately and effectively evaluate learning effect and teaching quality remains a challenge. In the prior art, either the teaching evaluation mode is single (for example, evaluation relies only on learning time and learning results, with no attention to the learning process or learning state), or the information collection involved in teaching evaluation is simplistic and inefficient (for example, a student's learning state is judged by manually reviewing classroom video).
In view of this, the present application discloses an intelligent programming learning method and system based on multi-modal deep learning, a storage medium, and a smart classroom, which address the problem in the related art that single-modality teaching evaluation biases the evaluation result and cannot accurately reflect the learning state and teaching effect of a programming learner, and which effectively improve the learner's programming learning.
The intelligent programming learning method based on multi-modal deep learning is applied to intelligent teaching, for example intelligent programming learning. The place where intelligent programming teaching is applied may be a dedicated smart classroom, but is not limited to this; it may also be, for example, a room in an individual's home.
In the following description, we will use an intelligent classroom as an example for explanation.
Please refer to fig. 1, which is a schematic diagram of an intelligent programming learning system according to an embodiment of the present disclosure. As shown in fig. 1, the intelligent programming learning system disclosed in the present application may include: the intelligent programming teaching platform 2 and at least one intelligent classroom 1, wherein the at least one intelligent classroom 1 can establish a communication connection with the intelligent programming teaching platform 2 through a network, and the network can include but is not limited to a wired communication network (e.g., a fiber optic communication network, etc.), a wireless communication network (e.g., a Wi-Fi network, etc.), or a mobile communication network (e.g., a 4G/5G mobile network, etc.). The intelligent classroom 1 is configured with at least a programmed learning terminal for a programmed learner to learn, which may include components such as memory, memory controllers, one or more processing units (CPU), peripheral interfaces, RF circuitry, audio circuitry, speakers, microphones, input/output (I/O) subsystems, display screens, other output or control devices, and external ports, which communicate via one or more communication buses or signal lines. The programming learning terminal includes, but is not limited to, a desktop computer (e.g., PC computer, Mac computer, etc.), a laptop computer, a tablet computer, or other smart devices (e.g., smart phone, combination of smart phone and display, etc.).
The intelligent programming teaching platform 2 is used as a management center of the intelligent programming learning system, for example, the intelligent programming teaching platform 2 can be used to uniformly manage the programming learning terminals in each intelligent classroom 1.
In some embodiments, the intelligent programming teaching platform 2 may be deployed in the cloud and include a server, and the server may be distributed over one or more physical servers according to function, load, and other factors. When distributed over multiple physical servers, the server may be composed of cloud-architecture-based servers. For example, a cloud-based server includes a public cloud (Public Cloud) server and a private cloud (Private Cloud) server, where the public or private cloud server provides Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS). Examples of such cloud computing service platforms include the Meituan, Alibaba, Amazon, Baidu, and Tencent cloud computing platforms. The server may also be formed by a distributed or centralized server cluster. For example, the server cluster is composed of at least one physical server, each of which is provided with several virtual servers; each virtual server runs at least one functional module of the intelligent programming learning system based on multi-modal deep learning, and the virtual servers communicate with each other through a network.
In some embodiments, the intelligent programming teaching platform 2 further comprises a plurality of function modules, for example, the intelligent programming teaching platform 2 may comprise a database, an authentication management module, and a learning management module.
The database is used for storing programming course data, information related to programming learners (such as student identity information, programming courses and learning time learned, class/group in which the learner is located, programming learning operation content, programming learning achievement, programming skill and the like), and information related to programming lecturers (such as lecturer identity information, teaching courses and teaching time, class/group taught, programming learning modification content, teaching ability and the like).
The authentication management module can be used for authenticating a programming learner and a programming lecturer, for example, authenticating the programming learner, and the authentication management module can authenticate the student data (such as student identity information, the learned programming course and the like) of the programming learner, the content of the completed programming operation and the like.
The learning management module can be used for managing various aspects involved in programming learning, such as managing various basic information of a programming learner and a programming instructor, recording the performance of the programming learner in the programming learning (such as operation content, learning state including facial expressions and learning behaviors, learning achievement and the like), recording the performance of the programming instructor in the programming explanation, managing programming courses (such as adjusting the programming content according to the performance of the programming learner and the like), and storing the recording content and the management result in a database.
The intelligent classroom 1 is a novel classroom constructed by means of internet of things technology, cloud computing technology, intelligent technology and the like, and comprises a tangible physical space and an intangible digital space, and is generally configured with intelligent equipment related to intelligent learning, for example, in the case of programming learning, the intelligent classroom 1 can be configured with a programming learning terminal 11, and the programming learning terminal 11 is configured with programming learning software, so that a programming learner can perform intelligent programming learning in the intelligent classroom 1 through the programming learning terminal 11.
It can be seen that the smart classroom 1 not only connects people in different regions, overcoming regional limitations and differences in teaching level so that learners can receive the same intelligent programming learning content at the same time or under the same conditions, but also enables interactive learning, including interaction and communication among programming learners and with the programming lecturer. Taking interaction among programming learners as an example, communication between individual learners and learning groups within the same smart classroom, and between those in different smart classrooms, greatly improves learning efficiency, an effect that is especially pronounced in programming learning.
Referring to fig. 2, a block diagram of an embodiment of a smart classroom is shown. As shown in fig. 2, the smart classroom disclosed in the present application may include: a classroom master device 10, a programming learning terminal 11, a first imaging device 12, and a second imaging device 13.
The classroom master device 10 is connected with the intelligent programming teaching platform through a network. The classroom master device 10 can be used as a center of a smart classroom for managing each programming learning terminal in the smart classroom and ensuring information transmission between each programming learning terminal and the smart programming teaching platform. In some embodiments, the classroom master device 10 can be, for example, a host device or a server. In some embodiments, the classroom master device 10 can be, for example, a routing device.
The programming learning terminal 11 is network-connected to the classroom master 10. The number of the programming learning terminals 11 is not limited, and may be adjusted according to the space size of the intelligent classroom, the number of the programmed learners, or the like. In some embodiments, a plurality of programming learning terminals 11 may each be networked with the classroom master device 10. In some embodiments, a plurality of the programmed learning terminals 11 can also be combined into a group and connected with the classroom master 10 through a network, so that each programmed learner in the group can be used as a programmed learning group.
Programming learning terminal 11 is configured with programming learning software. In certain embodiments, programming learning terminal 11 may include components such as memory, memory controllers, one or more processing units (CPU), peripheral interfaces, RF circuits, audio circuits, speakers, microphones, input/output (I/O) subsystems, display screens, other output or control devices, and external ports, which communicate via one or more communication buses or signal lines. In practical applications, the programming learning terminal includes, but is not limited to, a desktop computer (e.g., PC computer, Mac computer, etc.), a notebook computer, a tablet computer, or other smart devices (e.g., smart phone, combination of smart phone and display, etc.).
The first camera device 12 is used for taking a close-up image of the programmed learner. In some embodiments, the number of the first camera devices 12 may be the same as the number of the programmed learning terminals 11, that is, each programmed learning terminal 11 is provided with a corresponding first camera device 12.
In some embodiments, the first camera device 12 is disposed on the corresponding programming learning terminal 11. In some embodiments, the first camera device 12 is built into the programming learning terminal 11; for example, it may be the front camera of a laptop, tablet, or smartphone, or the camera on the display of a desktop computer. In some embodiments, the first camera device 12 is a stand-alone, separately sold device attached to the programming learning terminal 11 by some connection means (e.g., clip, adhesive, or tool-fastened lock).
In some embodiments, the first camera device 12 is disposed beside the corresponding programming learning terminal 11. For example, the first camera device 12 is a stand-alone, separately sold device fixed by some connection means (e.g., clip, adhesive, or tool-fastened lock) beside the desk on which the programming learning terminal 11 is placed, or on a railing or the like.
The second camera device 13 is used for capturing medium- and long-range images of programming learners. The number of second camera devices 13 may be one or more, adjusted according to the size of the smart classroom or the number of learners: if the classroom is large or the learners are many, the number may be increased; if the classroom is small or the learners are few, the number may be reduced, even to one. In some embodiments, when several programming learning terminals in a smart classroom are combined to form a number of groups, the number of second camera devices 13 may match the number of groups, so that each second camera device 13 can capture medium- and long-range images of the members (i.e., programming learners) of its corresponding group.
To capture medium- and long-range images of programming learners, the second camera device 13 is generally disposed away from any single programming learning terminal 11. In some embodiments, the second camera device 13 may be mounted on a wall of the smart classroom, for example at each upper corner. In some embodiments, it may be mounted on a stand or column of sufficient height to obtain the desired field of view.
In addition, the second camera device 13 can be adjustable. For example, the second camera 13 may be disposed through a pan/tilt head with which the angle of the field of view of the second camera 13 is adjustable, including but not limited to a horizontal rotation pan/tilt head that can rotate left and right or an omni-directional pan/tilt head that can rotate both left and right and up and down. For another example, the second camera device 13 is disposed on a support or a column, the second camera device 13 can move up and down along the support or the column, and the support or the column can move in a horizontal plane through a rail or a pulley.
Please refer to fig. 3, which is a flowchart illustrating an intelligent programming learning method based on multi-modal deep learning according to an embodiment of the present invention. As shown in fig. 3, the intelligent programming learning method based on multi-modal deep learning in the embodiment includes the following steps:
Step S31, acquiring the learning information of the target programming learner in programming learning, processing and analyzing the learning information, and obtaining the learning evaluation of the target programming learner.
Step S33, acquiring image information including the target programmed learner, processing and analyzing the image information, and obtaining learning state characteristic data representing the facial expression and learning behavior of the target programmed learner.
Step S35, analyzing the learning state characteristic data according to a preset multi-mode learning state model to obtain the learning state of the target programming learner.
Step S37, evaluating the learning evaluation and learning state of the target programming learner according to a preset teaching characteristic evaluation model to obtain a teaching evaluation result related to the target programming learner, and determining a corresponding programming learning strategy.
In the embodiment, the target programming learner performs programming learning through the programming learning terminal 11 in the smart classroom and enters information according to the requirements of the programming study (e.g., programming exercises or programming exam questions). For example, in some embodiments, for an ordinary programming exercise or exam question, the learner can input content through a combination of keyboard and mouse, or keyboard and touch screen. In some embodiments, exercises or exam questions of the drag-and-drop kind (e.g., Python, Scratch, Delphi, etc.) may be answered by dragging, where dragging may be implemented by any of mouse, touch screen, and keyboard.
In step S31, the learning information of the target programming learner in programming learning is uploaded to the intelligent programming teaching platform by the corresponding programming learning terminal 11 in the smart classroom, and the platform performs the corresponding processing and analysis to obtain the learning evaluation.
Specifically, the obtaining of the learning information of the target learner in the programming learning, the processing and the analysis of the learning information, and the obtaining of the learning evaluation of the target learner may further include the following steps:
Step S311, acquiring learning information of the target programming learner by means of screen recording or screen capturing.
The learning information may include operational content including writing code and debugging code of a target programming learner in a development environment.
Regarding the screen recording manner, taking the programming problem or the programming examination problem as an example, in some embodiments, the screen recording manner may employ a method including: taking pages of the programming exercises or the programming examination questions as recording objects; when detecting that a target programming learner starts inputting contents on the programming problem or the programming test question to answer, recording the display contents of the page of the programming problem or the programming test question of the recording object and related information (such as an audio source or a video source and the like) to form a screen recording file. In some embodiments, the method adopted by the screen recording mode may include: taking pages of the programming exercises or the programming examination questions as recording objects; providing a screen recording button when detecting that a page of the programming problem or the programming test question is opened or detecting that a target programming learner inputs contents on the programming problem or the programming test question to answer the question; and then, when the screen recording button is detected to be triggered, recording the display content of the page of the programming problem or the programming test question of the recording object and related information (such as an audio source or a video source and the like) of the page to form a screen recording file.
Regarding the screen capture mode, taking programming exercises or programming exam questions as an example, in some embodiments the screen capture mode may employ a method including: taking the page of the programming exercise or exam question as the screen capture object; and, whenever the target programming learner is detected entering new content on the page, capturing the displayed content of the page to form a screen capture image, so that every input the learner makes on the exercise or exam question is captured as an image until the learner finishes, the resulting set of images finally serving as the screen capture file for the object. In some embodiments, the screen capture mode may employ a method including: taking the page of the programming exercise or exam question as the screen capture object; capturing the displayed content of the page to form a screen capture image when the learner is detected to start entering content to answer, and thereafter capturing it again at preset time intervals until the learner finishes the exercise or exam question, the resulting images finally serving as the screen capture file for the object.
In some embodiments, the screen capture mode may employ a method comprising: taking pages of the programming exercises or the programming examination questions as screen capture objects; providing a screen capture button when detecting that a page of the programming problem or the programming test question is opened or detecting that a target programming learner inputs contents on the programming problem or the programming test question to answer the question; then, when the screen capture button is detected to be triggered, screen capture is carried out on the display content of the page of the programming problem or the programming examination question of the screen capture object so as to form a screen capture image, then, each time the screen capture button is detected to be triggered, screen capture is carried out on the display content of the page of the programming problem or the programming examination question of the screen capture object so as to form a screen capture image until the target programming learner finishes the programming problem or the programming examination question, and finally, a plurality of screen capture images are used as screen capture files for the screen capture object.
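The interval-based capture variant described above can be sketched as follows, using Pillow's ImageGrab as an assumed capture backend (the application does not name a library). The event-triggered and button-triggered variants would invoke the same capture routine from their respective handlers; the done callback, directory layout, and timing values are illustrative assumptions.

    import time
    from pathlib import Path
    from PIL import ImageGrab

    def capture_answer_session(out_dir: str, interval_s: float = 5.0,
                               max_shots: int = 120, done=lambda: False) -> list:
        """Capture the exercise page periodically until the learner finishes."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        shots = []
        for i in range(max_shots):
            if done():                      # e.g. the learner submitted the answer
                break
            img = ImageGrab.grab()          # full-screen shot of the displayed page
            path = out / f"shot_{i:04d}.png"
            img.save(path)                  # one image per capture event
            shots.append(path)
            time.sleep(interval_s)
        return shots                        # the screen-capture file set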
Learning information of the target programming learner can thus be acquired through screen recording or screen capture. In some embodiments, the learning information includes, but is not limited to, the learner's operation content on the page of the programming exercise or exam question, together with one or more of the operation time, the programming answer region corresponding to the operation, and the operation track related to the operation content. The operation content includes not only the content remaining on the page in the final saved version, but also content the learner input during answering that was later deleted or replaced. For drag-and-drop exercises or exam questions (e.g., Python, Scratch, Delphi, etc.), all operation tracks involved in answering are recorded, including effective tracks and tracks deleted or modified along the way.
Further, the learning information may include programmed learning content.
Regarding the screen recording manner, in some embodiments, the screen recording manner may employ a method including: taking a page of the programmed learning content as a recording object; when it is detected that a target programming learner opens programming learning content stored in a local or cloud terminal or retrieves and opens a webpage of the programming learning content through a network, recording display content of the webpage of the programming learning content of the recording object and related information (such as a sound source or a video source) of the webpage to form a screen recording file. In some embodiments, the method adopted by the screen recording mode may include: taking a page of the programmed learning content as a recording object; providing a screen recording button when it is detected that a page of the programmed learning content is opened or it is detected that a target programmed learner inputs content on the page of the programmed learning content to take notes; and then, when the screen recording button is detected to be triggered, recording the display content of the page of the programmed learning content of the recording object and related information (such as an audio source or a video source) of the page to form a screen recording file.
Regarding the screen capture mode, in some embodiments, the screen capture mode may employ a method including: taking a page of the programming learning content as a screen capture object; and each time the target programming learner is detected to refresh or scroll the page of the programming learning content or input new content on the page of the programming learning content, the display content of the page of the programming learning content of the screen capture object is screen captured to form a screen capture image. In some embodiments, the screen capture mode may include: taking a page of the programming learning content as a screen capture object; and each time the target programming learner is detected to refresh or scroll the page of the programming learning content or input new content on the page of the programming learning content, the display content of the page of the programming learning content of the screen capture object is screen captured to form a screen capture image, and thereafter, the display content of the page of the programming learning content of the screen capture object is screen captured to form a screen capture image at intervals of a preset time.
In some embodiments, the screen capture mode may employ a method comprising: taking a page of the programming learning content as a screen capture object; providing a screen capture button when it is detected that the target programmed learner refreshes or scrolls the page of the programmed learning content or inputs new content on the page of the programmed learning content; then, when the screen capture button is detected to be triggered, the display content of the page of the programming learning content of the screen capture object is captured to form a screen capture image, and then, every time the screen capture button is detected to be triggered, the display content of the page of the programming learning content of the screen capture object is captured to form a screen capture image.
Learning information of the target programming learner can be acquired through screen recording or screen capturing, and in some embodiments, the learning information includes, but is not limited to, programmed learning content used by the target programming learner for learning, and a programmed learning content source and a programmed learning time related to the programmed learning content.
Step S312, processing and analyzing the learning information to obtain the target programming learner's mastery of specific programming knowledge points, programming learning results, and reference indices related to those results.
Taking the programmed learning achievement as an example, in some embodiments, the processing and analyzing the learning information to obtain the programmed learning achievement of the target programmed learner may include the following steps:
First, learning information of the target programming learner on the learning interface is obtained by screen recording or screen capture, i.e., an image of the learner's operation content on the page of the programming exercise or exam question is obtained.
Next, a programming answer region for each topic is determined in the image of the programming problem or programming question. In some embodiments, the programming answering area for each topic determined in the image of the programming problem or programming exam problem can be determined based on the identification of the topic (e.g., the topic number of the topic, the topic stem keyword of the topic, etc.). In some embodiments, the programming response region for each topic determined in the image of the programming problem or programming question may be determined according to characteristic information (e.g., size of the programming response region, etc.) of the programming response region for the topic. In some embodiments, the programming response region for each topic in the image of the programming problem or programming question can be determined according to other preset identifications (e.g., the beginning identification of the topic and the ending identification of the topic, etc.).
Then, image recognition is carried out on the image of the programming answer region of each programming exercise or exam question to obtain the image features of each question, and the image features are processed to obtain the answer content of each question. In some embodiments, this step can be realized by a deep learning network model trained on images containing various types of programming exercises or exam questions. In some embodiments, the step of processing the image features to obtain the answer content can be realized by, for example, a convolutional neural network model trained on such images.
Finally, the answering content of the programming exercises or the programming examination questions is compared with the standard answers of the corresponding programming exercises or the corresponding programming examination questions, and corresponding programming learning scores are obtained according to the comparison result. In some embodiments, the step of obtaining the corresponding programming learning achievement according to the comparison result may also be implemented by a deep learning network model or a convolutional neural network model, where the deep learning network model or the convolutional neural network model is obtained by training an image including various types of programming problems or programming problems according to the comparison analysis of the answering content of the programming problems or the programming problems and the standard answers of the corresponding programming problems or programming problems.
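A minimal sketch of this scoring flow is given below: crop each programming answer region, recognize its content with a trained model (stubbed here), and compare it against the standard answer. The region coordinates, the stubbed recognizer, the assumption that page_image is a PIL image, and the similarity-based partial-credit rule are all illustrative; the application leaves the comparison itself to a trained deep learning or convolutional model.

    from difflib import SequenceMatcher

    def recognize_answer(region_image) -> str:
        """Stub for the trained recognition model described above."""
        return ""  # a real model would return the transcribed answer content

    def score_exam(page_image, regions: dict, standard_answers: dict,
                   points: dict) -> float:
        """regions maps topic_id -> (left, top, right, bottom) answer-region box."""
        total = 0.0
        for topic_id, box in regions.items():
            answer = recognize_answer(page_image.crop(box))  # assumes a PIL image
            # similarity ratio gives partial credit instead of exact matching
            sim = SequenceMatcher(None, answer, standard_answers[topic_id]).ratio()
            total += points[topic_id] * sim
        return total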
In addition, step S312 further includes processing and analyzing the learning information to obtain reference indices related to the programming learning results. In some embodiments, the reference indices include, but are not limited to, answering time, the number of deletions and modifications, answering order, and the like.
Besides the target programming learner's results, the learner's mastery of specific programming knowledge points may also be obtained from the programmed learning content. For example, from the specific chapter of the programming learning material the learner is currently viewing, or from the specific programming knowledge point the learner retrieves over the network, one can derive the learner's mastery of that knowledge point, such as learning habits, subjective initiative, preferences for certain programming knowledge points, or the learning trajectory across the whole course of programming knowledge learning.
The learning information is also processed and analyzed to obtain reference indices related to the programmed learning content. In certain embodiments, these reference indices include, but are not limited to, the source of the programming learning content, the programming learning time, and the like.
Regarding step S33, the first camera device, or the combination of the first and second camera devices, in the smart classroom uploads image information of the target programming learner during programming learning to the intelligent programming teaching platform, either through the classroom master device or directly, and the platform performs the corresponding processing and analysis to obtain learning state feature data representing the learner's facial expressions and learning behaviors.
Specifically, the acquiring of the image information including the target programmed learner, the processing and the analyzing of the image information, and the obtaining of the learning state feature data representing the facial expression and the learning behavior of the target programmed learner may further include the steps of: acquiring image information including a target programming learner, processing and analyzing the image information to obtain learning state feature data representing the facial expression of the target programming learner; and acquiring image information including the target programming learner, processing and analyzing the image information to obtain learning state characteristic data representing the learning behavior of the target programming learner.
Obtaining image information containing the target programming learner, and processing and analyzing the image information to obtain learning state feature data representing the learner's facial expression, comprises the following steps:
Step S331: capturing, by the first camera device, a close-range image of the target programming learner while learning at the corresponding programming learning terminal.
As mentioned above, the first camera device is disposed near the programming learning terminal, for example, on or beside the programming learning terminal, and is used for capturing a close-range image of the target programming learner, where the close-range image may include the face of the target programming learner.
Step S333: performing face recognition processing on the captured close-range image through a face recognition model to determine the facial image of the target programming learner.
Step S335: performing expression recognition processing on the facial image of the target programming learner through an expression recognition model to obtain learning state feature data of the facial expression corresponding to the target programming learner.
Generally, images containing the face of the target programming learner are first selected from the close-range images captured by the first camera device; face recognition processing is then performed on the selected images to determine the facial image of the target programming learner; and expression recognition processing is then performed on that facial image through the expression recognition model to obtain the learning state feature data of the facial expression corresponding to the target programming learner.
In some embodiments, during processing, the facial image of the target programming learner may be recognized, and corresponding facial features and facial expression features may be extracted, where the extracted facial features may be used for face recognition, and the extracted facial expression features may be used for expression recognition. Here, the facial features may be the overall features of the face, and the facial expressive features may be the features of prominent regions of the face (also referred to as facial key-point features), wherein the facial key-point features include, but are not limited to, mouth, eyes, nose, eyebrows, etc.
The face recognition processing of the captured close-range image may be performed by a face recognition model, which may be implemented by a deep learning network model trained on various types of face images. The face recognition algorithms adopted by the face recognition model may include, but are not limited to: the LBP method (Local Binary Patterns), the Eigenface method based on Principal Component Analysis (PCA), the Fisherface method based on Linear Discriminant Analysis (LDA), and the like. Of course, the face recognition model may also be implemented by a convolutional neural network model or the like.
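By way of non-limiting illustration, the LBP-based recognition branch might be sketched with OpenCV's LBPH recognizer; the use of the opencv-contrib package and the pre-cropped grayscale input are assumptions about the deployment environment, not requirements of the application.

```python
# Sketch of LBP-based face recognition using OpenCV's LBPH recognizer
# (the cv2.face module ships with opencv-contrib-python, an assumed
# dependency).
import cv2
import numpy as np

recognizer = cv2.face.LBPHFaceRecognizer_create()

def train_faces(faces, labels):
    # faces: list of equally sized grayscale face crops (uint8 arrays)
    # labels: one integer learner ID per face crop
    recognizer.train(faces, np.array(labels))

def identify_face(face_img):
    # Returns (learner_id, confidence); lower confidence means a closer match
    return recognizer.predict(face_img)
```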
The expression recognition processing of the facial image of the target programming learner may be performed by an expression recognition model, which may be implemented by a deep learning network model (or, for example, a convolutional neural network model) trained on various types of facial images. The expression recognition algorithms adopted by the expression recognition model may include, but are not limited to, the following families. Holistic recognition methods treat the face as a whole; representative methods include Principal Component Analysis (PCA) based on eigenfaces, Independent Component Analysis (ICA), Fisher's Linear Discriminant (FLD), Local Feature Analysis (LFA), the Fisherface method, Hidden Markov Models (HMM), and cluster analysis. Local recognition methods process each feature part of the face separately; representative methods include Facial Action Coding System (FACS) analysis, local principal component analysis (Local PCA), Gabor wavelets, and neural networks. Deformation extraction methods recognize the deformation of each facial part under various expressions; representative methods include PCA, Gabor wavelets, Active Shape Models (ASM), and Point Distribution Models (PDM). Motion-based methods rely on the principle that specific facial parts move in characteristic ways when specific expressions are made; representative methods include Optical Flow and the Face Animation Parameters (FAP) of MPEG-4. Geometric feature methods, based on Action Units (AU), extract feature vectors from the shapes and positions of facial parts (including the mouth, eyes, eyebrows, and nose); these feature vectors represent the geometric features of the face, and different expressions are recognized from them.
As mentioned above, the expression recognition can be performed according to the extracted facial expression features, which can be facial key point features, including but not limited to mouth, eyes, nose, eyebrows, etc. Namely, the expression recognition processing is carried out on the facial image of the target programming learner through an expression recognition model, and the expression recognition processing comprises the following steps: extracting facial key point characteristics of the target programming learner from the facial image of the target programming learner; mapping the facial key point features into facial expression features; and identifying and matching facial expression characteristics through an expression identification model to obtain the facial expression of the target programming learner.
The expression recognition algorithm adopted by the expression recognition model can extract LBP and Gabor fusion characteristics of an expression salient region, and combines an Artificial Neural Network (ANN) or Support Vector Machine (SVM) classifier for recognition, and can also locate the characteristics of key points of the face, analyze the relation among the characteristics and classify the input expression into the corresponding expression category.
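By way of non-limiting illustration, the LBP and Gabor fusion features with an SVM classifier named above might be sketched as follows; the 48x48 input size, histogram binning, and pooling grid are illustrative assumptions.

```python
# Sketch of fused LBP + Gabor features classified by an SVM; assumes
# 48x48 grayscale face crops (an illustrative choice, not mandated here).
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.filters import gabor
from sklearn.svm import SVC

def fused_features(face: np.ndarray) -> np.ndarray:
    # LBP histogram over the face (uniform patterns give values 0..9)
    lbp = local_binary_pattern(face, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    # Gabor response magnitude, average-pooled from 48x48 down to 8x8
    real, imag = gabor(face, frequency=0.3)
    mag = np.hypot(real, imag)
    pooled = mag.reshape(8, 6, 8, 6).mean(axis=(1, 3)).ravel()
    return np.concatenate([hist, pooled])

clf = SVC(kernel="rbf")  # fit on (fused_features(face), expression_label) pairs
```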
The expression categories may be classified in a variety of ways, and in some embodiments, the expression categories may include six basic emotions: anger, happiness, sadness, surprise, disgust, and fear. In some embodiments, the expression categories can be expanded and extended on the basis of basic emotions, such as happiness, excitement, anger, coolness, tiredness, anxiety, depression, frustration, shame, chagrin, and the like.
For example, the expression category of a learner may be analyzed from different combinations of the state of the eyes and the opening or closing of the mouth (possibly in combination with changes of the eyebrows). If the target programming learner's eyes stay open (with the eyebrows raised and relaxed) at a normal blinking frequency, and the mouth is open or its corners are raised, the learner's expression is judged to be happy or excited. If the learner's eyes are open (with the eyebrows furrowed) but the blinking frequency is reduced, and the corners of the mouth are pulled down or the learner is pouting, the expression is judged to be depressed. If the learner's blinking frequency is too high or too low and the learner yawns widely, the expression is judged to be tired.
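By way of non-limiting illustration, the eye/mouth/eyebrow rules above might be encoded as follows; the cue names and blink-rate thresholds are assumptions, not values fixed by the application.

```python
# Illustrative rule-based mapping from facial cues to the expression
# states described above; thresholds are assumptions, not claimed values.

def classify_expression(eyes_open: bool, blink_rate: float,
                        brows_raised: bool, mouth: str,
                        yawning: bool = False) -> str:
    """mouth: 'open', 'corners_raised', 'pulled_down', 'pouting', or
    'neutral'; blink_rate is in blinks per second."""
    if yawning or blink_rate < 0.05 or blink_rate > 1.0:
        return "tired"
    if eyes_open and brows_raised and mouth in ("open", "corners_raised"):
        return "happy_or_excited"
    if eyes_open and not brows_raised and blink_rate < 0.2 \
            and mouth in ("pulled_down", "pouting"):
        return "depressed"
    return "neutral"
```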
Regarding the expression recognition model, it can be obtained through training. In some embodiments, the training method may comprise: reading a picture sample in a picture data set; identifying a picture sample containing a valid face; extracting facial expression features in the picture sample containing the effective face; and identifying the extracted facial expression features to obtain an expression identification result. Alternatively, in some embodiments, the training method may comprise: reading a picture sample in a face picture data set; extracting facial expression features in the picture sample; and identifying the extracted facial expression features to obtain an expression identification result.
Obtaining image information including the target programming learner and processing and analyzing the image information to obtain learning state feature data representing the learning behavior of the target programming learner includes the following steps:
Step S332: capturing, by the first camera device, a close-range image of the target programming learner while learning at the corresponding programming learning terminal.
As mentioned above, the first camera device is disposed near the programming learning terminal, for example, on or beside it, and is used for capturing a close-range image of the target programming learner, where the close-range image may include the actions of the target programming learner.
Step S334: performing face recognition processing on the close-range image captured by the first camera device through the face recognition model to determine the target programming learner.
For the method of face recognition, reference is made to the foregoing description, and details are not repeated here.
Step S336, performing learning behavior recognition processing on the close-range image captured by the first camera device through a learning behavior model to obtain learning state feature data of the learning behavior corresponding to the target programmed learner. The learning behavior model may adopt a deep learning network model (or, for example, a convolutional neural network model, etc.), and the deep learning network model (or, for example, a convolutional neural network model, etc.) is obtained by training various types of learning behaviors.
In step S336, the close-range image of the target programming learner may be recognized, the action and behavior features corresponding to the target programming learner may be extracted, and the learning behavior model may then identify the extracted behavior features to obtain the learning state feature data of the learning behavior corresponding to the target programming learner.
Generally, learning behaviors corresponding to a target programming learner include, but are not limited to: body posture, head and shoulder movements, hand movements, etc. The body gestures include, but are not limited to: whether sitting or standing, sitting (e.g., leaning forward, leaning backward, leaning left or right) and sitting orientations, standing and standing orientations, and the like. The head and shoulder actions include, but are not limited to: head position and orientation, angle between the head and the shoulder (e.g., head bending, forward leaning, backward leaning), shoulder position (e.g., shoulders flush, shoulder shrugging, shoulder sinking, high shoulders), etc. The hand motion includes, but is not limited to: hanging down with hands, putting on a table, grabbing ears and cheeks, scratching heads, grabbing hair, crossing hands, closing hands, holding a fist, pointing fingers to a screen, writing with a pen, swinging a pen or other small objects, holding a cup, drinking, and the like.
It is to be noted that, in the manner described above in connection with step S33, image information including the target programming learner is acquired, and the image information is processed and analyzed to obtain learning state feature data representing the facial expression and the learning behavior of the target programming learner.
In the foregoing embodiment, if the close-range image of the target programming learner captured by the first camera device includes the learner's face, the close-range image is processed and analyzed to obtain learning state feature data representing the learner's facial expression; if it includes the learner's actions, the close-range image is processed and analyzed to obtain learning state feature data representing the learner's learning behavior. Therefore, in some embodiments, facial expression recognition and learning behavior recognition may each be performed on the same frame of close-range image to obtain learning state feature data representing the facial expression and learning state feature data representing the learning behavior of the target programming learner. That is, the corresponding facial features, facial expression features, and action behavior features are extracted from the same frame; face recognition is then performed by the face recognition model on the facial features, facial expression recognition by the expression recognition model on the facial expression features, and learning behavior recognition by the learning behavior model on the action behavior features.
In addition, in some embodiments, the aforementioned face recognition model and the expression recognition model may be combined into a facial expression recognition model, that is, for the same frame of image, the facial expression recognition model extracts facial features and facial expression features corresponding to the target programming learner from the frame of image and performs face recognition and facial expression recognition according to the extracted facial features and facial expression features to obtain learning state feature data representing facial expressions of the target programming learner, and the learning behavior model extracts action behavior features corresponding to the target programming learner from the frame of image and performs learning behavior recognition according to the extracted action behavior features to obtain learning state feature data representing learning behaviors of the target programming learner.
In some embodiments, the aforementioned face recognition model, the expression recognition model, and the learning behavior model may be combined into an expression and behavior combined recognition model, that is, for a same frame of image, the expression and behavior combined recognition model extracts the face features and the face expression features and the action behavior features corresponding to the target programming learner in the frame of image, the face features and the face expression features and the action behavior features belonging to the same frame of image are fused to form a fusion feature of the frame of image, and the expression and behavior combined recognition model performs recognition according to the fusion feature in the frame of image to obtain learning state feature data representing the facial expression and the learning behavior of the target programming learner.
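By way of non-limiting illustration, the fusion performed by the combined recognition model might be sketched as simple feature concatenation followed by one classifier; the MLP head is an illustrative choice, not the method fixed by the application.

```python
# Sketch of the joint expression-and-behavior recognition idea: features
# extracted from the same frame are concatenated into one fused vector
# and classified together.
import numpy as np
from sklearn.neural_network import MLPClassifier

def fuse(face_feat, expr_feat, behavior_feat):
    # All three feature vectors come from the same frame of image
    return np.concatenate([face_feat, expr_feat, behavior_feat])

joint_model = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)
# Training (illustrative): fused vectors against joint state labels
# joint_model.fit(np.stack([fuse(f, e, b) for f, e, b in samples]), labels)
```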
However, some learning behaviors of the programmed learner may not always be completely represented in the close-up image, for example, the learning behaviors occurring in the close-up image may be local, or some learning behaviors of the programmed learner may be interactive behaviors related to other programmed learners.
In some embodiments, the acquiring image information including the target programmed learner, processing and analyzing the image information to obtain learning state characteristic data representing the learning behavior of the target programmed learner, further comprises:
Step S332': capturing, by a second camera device disposed at a position away from the programming learning terminal, a medium- and long-range image of the target programming learner during learning.
As mentioned above, the second camera device is disposed relatively far from the programming learning terminal, for example, on a wall of the intelligent classroom, and is used for capturing a medium- and long-range image of the target programming learner; the medium- and long-range image may include the actions of the target programming learner as well as those of other programming learners.
Step S334': performing face recognition processing on the captured medium- and long-range image through the face recognition model, or locating the programming learning terminal within the image, to determine the target programming learner.
As previously mentioned, the medium- and long-range image is likely to include not only the target programming learner but also one or more other programming learners; it may therefore be necessary to identify the target programming learner within the image.
During processing, the medium- and long-range image is recognized by the face recognition model, the facial features of each programming learner appearing in the image are extracted, and face recognition processing is then performed on these features to determine the target programming learner.
The face recognition processing of the captured medium- and long-range image may likewise be performed by the face recognition model, which may be implemented by a deep learning network model (or, for example, a convolutional neural network model) trained on various types of face images. The face recognition algorithms adopted may include, but are not limited to: the LBP method (Local Binary Patterns), the Eigenface method based on Principal Component Analysis (PCA), the Fisherface method based on Linear Discriminant Analysis (LDA), and the like.
Alternatively, in some embodiments, since a programming learner generally sits or stands at a corresponding programming learning terminal to learn, each learner corresponds to a terminal. The target programming learner can therefore be determined by locating the programming learning terminals in the captured medium- and long-range image: the target programming learning terminal is identified and located in the image, and the target programming learner is determined from the information of the located terminal.
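By way of non-limiting illustration, the terminal-positioning alternative might be sketched as matching a detected terminal to a known seat map; the seat map and the nearest-position rule are assumptions about the classroom layout data, which the application does not specify.

```python
# Illustrative sketch: determine the target learner from the located
# programming learning terminal. SEAT_MAP and the coordinates are
# hypothetical classroom-layout data.

SEAT_MAP = {"terminal_07": "learner_23"}  # terminal ID -> learner ID

def nearest_terminal(box_center, terminal_positions):
    """terminal_positions: terminal ID -> (x, y) image coordinates of
    that terminal; box_center: (x, y) center of the detected terminal box."""
    return min(terminal_positions,
               key=lambda t: (terminal_positions[t][0] - box_center[0]) ** 2
                           + (terminal_positions[t][1] - box_center[1]) ** 2)

def learner_at(box_center, terminal_positions):
    return SEAT_MAP[nearest_terminal(box_center, terminal_positions)]
```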
In step S336', the captured intermediate and distant view images are subjected to learning behavior recognition processing by the learning behavior model, so as to obtain learning state feature data of the learning behavior corresponding to the target programmed learner. The learning behavior model may adopt a deep learning network model (or, for example, a convolutional neural network model, etc.), and the deep learning network model (or, for example, a convolutional neural network model, etc.) is obtained by training various types of learning behaviors.
In step S336', the medium- and long-range image of the target programming learner is recognized, the action and behavior features corresponding to the target programming learner are extracted, and the learning behavior model then identifies the extracted features to obtain the learning state feature data of the learning behavior corresponding to the target programming learner.
Generally, learning behaviors corresponding to a target programming learner include, but are not limited to: body posture, head and shoulder movements, hand movements, etc. The body gestures include, but are not limited to: whether sitting or standing, sitting (e.g., leaning forward, leaning backward, leaning left or right) and sitting orientations, standing and standing orientations, and the like. The head and shoulder actions include, but are not limited to: head position and orientation, angle between the head and the shoulder (e.g., head bending, forward leaning, backward leaning), shoulder position (e.g., shoulders flush, shoulder shrugging, shoulder sinking, high shoulders), etc. The hand motion includes, but is not limited to: hanging two hands down, putting on a table, grabbing ears and scratching cheeks, scratching heads, grabbing hair, crossing two hands, closing two hands, holding a fist, pointing fingers to a screen, holding a pen for writing, swinging the pen or other small objects, holding a cup for drinking, and the like.
In some embodiments, the target programming learner also discusses and communicates with programming learners at adjacent desks or with other programming learners in the same group. Therefore, after the target programming learner is determined through face recognition, the recognition of the medium- and long-range image in step S336' extracts not only the action and behavior features of the target programming learner but also those of other programming learners related to the target programming learner, where the related learners may include one or more learners within a certain range of the target programming learner. The learning behavior model then identifies the extracted behavior features of the target programming learner together with those of the related learners to obtain the learning state feature data of the learning behavior corresponding to the target programming learner, where this learning behavior includes both the target programming learner's own behavior and the target programming learner's interactive behavior with the related learners.
The interaction of the target programming learner with other relevant programming learners includes but is not limited to: verbal communication discussions (e.g., self-expression to others, discussion between two persons, communication between a panelist and multiple persons, etc.), collaborative practices (e.g., presentation or demonstration of programming efforts to others, collaborative programming, etc.). The interactive behaviors can be obtained by comprehensively analyzing processed information such as body gestures, limb actions, facial expressions and the like of the target programming learner and other related programming learners. Furthermore, in some embodiments, the interactive behavior of the verbal communication discussion includes a debate (e.g., a debate over a certain point of programming knowledge or a certain programming result, etc.).
Because multiple programming learners are involved, training a learning behavior model for the target programming learner's interaction with other programming learners is more complicated than training a single programming learner's learning behavior. For example, a manner of training the interactive behavior of a target programming learner with other programming learners using a learning behavior model may include: acquiring an interactive behavior between a sample image and a programmed learner corresponding to the sample image as a sample set, and performing iterative training based on the sample set.
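By way of non-limiting illustration, iterative training on such a sample set might be sketched as mini-batch updates to an incremental classifier; the label set, feature interface, and linear model are assumptions, and in practice a deep network would typically take its place.

```python
# Sketch of iterative training on an interaction sample set. Each feature
# vector is assumed to jointly encode the target learner and the related
# learners in the same frame; the labels below are illustrative.
import numpy as np
from sklearn.linear_model import SGDClassifier

INTERACTION_LABELS = ["discussion", "debate", "cooperative_coding", "none"]
model = SGDClassifier(loss="log_loss")

def train_iteratively(batches):
    """batches: iterable of (X, y) mini-batches drawn from the sample set."""
    classes = list(range(len(INTERACTION_LABELS)))
    for X, y in batches:
        model.partial_fit(np.asarray(X), np.asarray(y), classes=classes)
```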
Step S35: analyzing the learning state feature data according to a preset multi-modal learning state model to obtain the learning state of the target programming learner.
The multi-modal learning state model can adopt a deep learning network model (or, for example, a convolutional neural network model) which is obtained by training various types of learning state feature data.
In step S35, the multi-modal learning state model analyzes the learning state feature data representing the facial expression of the target programmed learner and the learning state feature data representing the learning behavior of the target programmed learner, to obtain the learning state of the target programmed learner.
In some embodiments, the learning state feature data of facial expressions and the learning state feature data of learning behaviors belonging to the same target programming learner may be fused to form a fused learning state feature of that learner, which the multi-modal learning state model then recognizes to derive the learning state of the target programming learner.
For example, if the target programming learner's eyes stay open (with eyebrows raised and relaxed) at a normal blinking frequency and the mouth is open or its corners are raised (indicating a happy or excited expression), while the learner is raising or waving an arm, or typing on the keyboard or operating the mouse quickly, the learner is analyzed to be attentive and judged to be in a focused state. If the learner's eyes are open (with eyebrows furrowed) but the blinking frequency is reduced and the corners of the mouth are pulled down or the learner is pouting (indicating a depressed expression), while the learner slaps the head, grabs the hair, bites the pen, fiddles with the pen, types or operates the mouse slowly or pauses for long periods, or repeatedly stops to ask nearby programming learners for help or discussion, the learner's attention is analyzed to be scattered and the learner is judged to be in a confused or passive state. If the learner's blinking frequency is too high or too low, the learner yawns widely, looks up, holds the head or folds the arms across the chest, or rests the head on the desk, the learner's attention is analyzed to be scattered and the learner is judged to be in a tired state.
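By way of non-limiting illustration, the fusion rules above might be encoded as follows; the cue names are assumptions standing in for the outputs of the expression and behavior models.

```python
# Illustrative fusion of expression and behavior cues into a learning
# state, mirroring the example rules in the text above.

def learning_state(expression: str, behavior_cues: set) -> str:
    if expression == "happy_or_excited" and \
            behavior_cues & {"arm_raised", "fast_typing", "fast_mouse"}:
        return "focused"
    if expression == "depressed" and \
            behavior_cues & {"slaps_head", "grabs_hair", "bites_pen",
                             "slow_typing", "long_pause", "asks_neighbors"}:
        return "confused_or_passive"
    if expression == "tired" and \
            behavior_cues & {"yawning", "head_on_desk", "arms_crossed",
                             "looks_up"}:
        return "tired"
    return "normal"
```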
Step S37: evaluating the learning evaluation and the learning state of the target programming learner according to a preset teaching feature evaluation model to obtain a teaching evaluation result for the target programming learner, and determining a corresponding programming learning strategy.
The teaching feature evaluation model can adopt a deep learning network model (or, for example, a convolutional neural network model) which is obtained by training various learning evaluation and learning states.
In step S37, the teaching feature evaluation model analyzes the learning evaluation and the learning state of the target programming learner to obtain a teaching evaluation result for the target programming learner.
In some embodiments, the learning evaluation and the learning state belonging to the same target programming learner may be fused to form a fused teaching feature of that learner, and the teaching feature evaluation model identifies the learner from this fused teaching feature to obtain a teaching evaluation result representing the target programming learner.
Subsequently, according to the teaching evaluation result, a corresponding programming learning strategy can be determined.
In certain embodiments, the programming learning strategies include, but are not limited to: customizing a new programming teaching course for the programming learner according to the teaching evaluation result; according to the teaching evaluation result, pushing corresponding knowledge skill learning materials such as programming technology, specification requirements, code debugging skills, code paradigms and the like to the programming learner according to weak knowledge points mastered by the programming learner; and adjusting the programmed learning content or the programmed learning time length aiming at the programmed learner according to the teaching evaluation result. These programming learning strategies may be implemented individually or in combination.
In certain embodiments, the programming learning strategies include, but are not limited to: teaching strategies for programming instructors and learning strategies for programming learners. By utilizing the teaching strategy, a programming instructor is helped to make targeted teaching. By utilizing the learning strategy, the programmed learner can make corresponding improvement according to the advantages and disadvantages and the learning characteristics.
From the above, the intelligent programming learning method based on multi-modal deep learning disclosed in the present application, when applied to programming teaching, breaks through the prior-art approach of evaluating teaching from a single angle and instead adopts a multi-modal evaluation approach. During the learning process of a programming learner, multi-modal features including the learner's facial expression features and learning behavior features are obtained; learning state feature data representing the facial expression and learning state feature data representing the learning behavior of the target programming learner are derived from these features; the real learning state of the target programming learner is then accurately obtained from this feature data; and a teaching evaluation result is obtained from the real learning state, from which the corresponding programming learning strategy is determined. Compared with the prior art, the method obtains the real learning state of a programming learner from the learner's facial expressions and learning behaviors, establishes an accurate and complete programming teaching evaluation system, formulates a targeted and personalized programming learning strategy, realizes personalized programming guidance, promotes precise teaching by teachers, and practically improves the programming ability of the programming learner.
Please refer to fig. 4, which is a simplified structural entity diagram of the intelligent programming learning system based on multimodal deep learning according to an embodiment of the present application. As shown in fig. 4, the system for intelligently programming learning based on multimodal deep learning includes: a learning information acquisition module 41, a learning evaluation module 42, an image acquisition module 43, a facial expression recognition module 44, a learning behavior recognition module 45, a learning state evaluation module 46, a teaching evaluation module 47, and a learning strategy module 48.
In a specific application scenario, the learning information acquisition module 41, the learning evaluation module 42, the image acquisition module 43, the facial expression recognition module 44, the learning behavior recognition module 45, the learning state evaluation module 46, and the teaching evaluation module 47 may be software modules or devices that may be deployed on a server, or a virtual machine on a server, or a container on a server. In addition, the software modules may be deployed on the same server or different servers according to actual needs, but are not limited to the above examples.
The learning information acquisition module 41 is used for acquiring learning information of a target programming learner in programming learning. In some embodiments, the learning information acquiring module 41 may be, for example, an image recording module, which may acquire the learning information of the target programmed learner on the learning interface by means of screen recording or screen capturing.
The learning information may include operational content including writing code and debugging code of a target programming learner in a development environment.
Regarding the screen recording mode, taking a programming exercise or a programming examination question as an example, in some embodiments the screen recording method may include: taking the page of the programming exercise or examination question as the recording object; and, when the target programming learner is detected to start inputting content on the exercise or question to answer, recording the displayed content of the page of the recording object together with related information (such as an audio or video source) to form a screen recording file. In some embodiments, the screen recording method may include: taking the page of the programming exercise or examination question as the recording object; providing a screen recording button when the page is detected to be opened or when the target programming learner is detected to input content to answer; and then, when the screen recording button is detected to be triggered, recording the displayed content of the page of the recording object together with related information (such as an audio or video source) to form a screen recording file.
Regarding the screen capture mode, taking a programming exercise or a programming examination question as an example, in some embodiments the screen capture method may include: taking the page of the programming exercise or examination question as the screen capture object; and, whenever the target programming learner is detected to input new content on the exercise or question, capturing the displayed content of the page of the screen capture object to form a screen capture image, so that each input the target programming learner makes is captured as an image until the learner finishes the exercise or question, with the resulting set of images serving as the screen capture file for the object. In some embodiments, the screen capture method may include: taking the page of the programming exercise or examination question as the screen capture object; whenever the target programming learner is detected to start inputting content to answer, capturing the displayed page content to form a screen capture image, and thereafter capturing the page content at preset time intervals to form further images until the learner finishes the exercise or question, with the resulting set of images serving as the screen capture file for the object.
In some embodiments, the screen capture method may include: taking the page of the programming exercise or examination question as the screen capture object; providing a screen capture button when the page is detected to be opened or when the target programming learner is detected to input content to answer; and then, each time the screen capture button is detected to be triggered, capturing the displayed content of the page to form a screen capture image, until the target programming learner finishes the exercise or question, with the resulting set of images serving as the screen capture file for the object.
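By way of non-limiting illustration, the interval screen-capture variant might be sketched with an off-the-shelf capture backend; Pillow's ImageGrab, the capture interval, and the file naming are assumptions, not choices made by the application.

```python
# Minimal sketch of interval screen capture using Pillow's ImageGrab
# (available on Windows/macOS; an assumed backend, not mandated here).
import time
from PIL import ImageGrab

def capture_until(done, interval_s: float = 5.0, prefix: str = "answer"):
    """Grab the screen every interval_s seconds until done() returns True;
    the accumulated image paths form the screen capture file set."""
    paths = []
    i = 0
    while not done():
        img = ImageGrab.grab()              # full-screen screenshot
        path = f"{prefix}_{i:04d}.png"
        img.save(path)
        paths.append(path)
        i += 1
        time.sleep(interval_s)
    return paths
```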
The learning information of the target programming learner can thus be acquired through screen recording or screen capture. In some embodiments, the learning information includes, but is not limited to, the target programming learner's operation content on the page of the programming exercise or examination question, together with one or more of the operation time, the programming answer region corresponding to the operation, and the operation trajectory related to the operation content. The operation content includes not only the content remaining on the page in the final saved version, but also content that the target programming learner input during answering and later deleted or replaced. For drag-and-drop programming exercises or examination questions (for example, Python, Scratch, Delphi, etc.), all operation trajectories involved in the answering process are recorded, including effective trajectories and trajectories that were deleted or modified midway.
Further, the learning information may include programmed learning content.
Regarding the screen recording mode, in some embodiments the screen recording method may include: taking the page of the programming learning content as the recording object; and, when the target programming learner is detected to open programming learning materials stored locally or in the cloud, or to open a web page of programming learning content through network retrieval, recording the displayed content of the page of the recording object together with related information (such as an audio or video source) to form a screen recording file. In some embodiments, the screen recording method may include: taking the page of the programming learning content as the recording object; providing a screen recording button when the page is detected to be opened or when the target programming learner is detected to input content on the page to take notes; and then, when the screen recording button is detected to be triggered, recording the displayed content of the page of the recording object together with related information (such as an audio or video source) to form a screen recording file.
Regarding the screen capture mode, in some embodiments the screen capture method may include: taking the page of the programming learning content as the screen capture object; and, each time the target programming learner is detected to refresh or scroll the page or to input new content on it, capturing the displayed content of the page to form a screen capture image. In some embodiments, the screen capture method may include: taking the page of the programming learning content as the screen capture object; each time the target programming learner is detected to refresh or scroll the page or to input new content on it, capturing the displayed content of the page to form a screen capture image, and thereafter capturing the displayed content of the page at preset time intervals to form further screen capture images.
In some embodiments, the screen capture method may include: taking the page of the programming learning content as the screen capture object; providing a screen capture button when the target programming learner is detected to refresh or scroll the page or to input new content on it; and then, each time the screen capture button is detected to be triggered, capturing the displayed content of the page to form a screen capture image.
Learning information of the target programming learner can be acquired through screen recording or screen capturing, and in some embodiments, the learning information includes, but is not limited to, programmed learning content used by the target programming learner for learning, and a programmed learning content source and a programmed learning time related to the programmed learning content.
The learning evaluation module 42 is used for processing and analyzing the learning information to obtain the learning evaluation of the target programming learner. In some embodiments, the learning evaluation module 42 for processing and analyzing the learning information to obtain the learning evaluation of the target programmed learner comprises: and processing and analyzing the learning information to obtain the mastering condition of the target programming learner on the programming specific knowledge point, the programming learning achievement and a reference index related to the programming learning achievement.
Taking the programmed learning achievement as an example, processing and analyzing the learning information to obtain the programmed learning achievement of the target programmed learner may include the following steps:
firstly, obtaining learning information of a target programming learner on a learning interface in a screen recording or screen capturing mode, namely obtaining an image of operation contents of the target programming learner on a page of a programming problem or a programming examination problem.
Next, a programming answer region for each topic is determined in the image of the programming problem or programming question.
And then, carrying out image recognition on the image of the programming answering area of each programming problem or each programming examination question to obtain the image characteristics of each question, and processing the image characteristics to obtain the answering content of each programming problem or each programming examination question.
And finally, comparing and analyzing the answering content of the programming exercises or the programming test questions with the standard answers of the corresponding programming exercises or the corresponding programming test questions, and obtaining corresponding programming learning scores according to the comparison result.
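By way of non-limiting illustration, the image recognition of the answer region might be sketched with an off-the-shelf OCR engine; the use of Tesseract via pytesseract and the fixed region coordinates are assumptions, since the application leaves the recognition model unspecified.

```python
# Sketch of reading answer content out of a screenshot of the programming
# answer region with Tesseract OCR (pytesseract is an assumed dependency).
from PIL import Image
import pytesseract

def extract_answer(screenshot_path: str, region: tuple) -> str:
    """region: (left, top, right, bottom) pixel box of the answer area."""
    page = Image.open(screenshot_path)
    answer_img = page.crop(region)
    return pytesseract.image_to_string(answer_img)
```

The extracted text can then be compared against the standard answer, for example with the score_answer() helper sketched earlier.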
The reference indices include, but are not limited to, answering time, the number of deletions and modifications, answering order, and the like.
In addition to the programming learning score of the target programming learner, the target programming learner's mastery of specific programming knowledge points may be obtained from the programming learning content. For example, by identifying the specific chapter of the programming learning material that the target programming learner is currently viewing, or the specific programming knowledge point that the target programming learner retrieves over the network, the system can infer the learner's mastery of that knowledge point, as well as learning habits, subjective initiative, preferences for certain programming knowledge points, or the learner's overall course through the programming material.
The learning information is also processed and analyzed to obtain reference indices related to the programming learning content. In certain embodiments, the reference indices include, but are not limited to, the source of the programming learning content, the programming learning time, and the like.
The image acquisition module 43 acquires image information containing a target programmed learner.
In some embodiments, the image acquisition module 43 acquires a close-range image containing the target programming learner via a first camera device disposed in the intelligent classroom. In some embodiments, it acquires a close-range image containing the target programming learner via the first camera device and a medium- and long-range image containing the target programming learner via a second camera device disposed in the intelligent classroom. In some embodiments, it acquires a medium- and long-range image containing the target programming learner via the second camera device alone.
The facial expression recognition module 44 is configured to process and analyze the image information acquired by the image acquisition module 43 to obtain learning state feature data representing the facial expressions of the target programming learner.
In some embodiments, the facial expression recognition module 44 is configured to process and analyze the image information acquired by the image acquisition module 43 to obtain learning status feature data representing facial expressions of the target programmed learner, and may include: selecting images containing the face of a target programmed learner from acquired close-up images of the programmed learner captured by a first camera device; then, carrying out face recognition processing on the shot close-range image to determine a face image of the target programming learner; and then, performing expression recognition processing on the facial image of the target programmed learner through an expression recognition model to obtain the learning state feature data of the facial expression corresponding to the target programmed learner.
In some embodiments, during processing, the facial image of the target programming learner may be recognized, and corresponding facial features and facial expression features may be extracted, where the extracted facial features may be used for face recognition, and the extracted facial expression features may be used for expression recognition. Here, the facial features may be the overall features of the face, and the facial expressive features may be the features of prominent regions of the face (also referred to as facial key-point features), wherein the facial key-point features include, but are not limited to, mouth, eyes, nose, eyebrows, etc.
And performing expression recognition according to the extracted facial expression features, wherein the facial expression features can be facial key point features, and the facial key point features include but are not limited to mouth, eyes, nose, eyebrows and the like. Namely, the expression recognition processing is carried out on the facial image of the target programming learner through an expression recognition model, and the expression recognition processing comprises the following steps: extracting facial key point features of the target programming learner from the facial image of the target programming learner; mapping the facial key point features into facial expression features; and identifying and matching facial expression characteristics through the expression identification model to obtain the facial expression of the target programming learner.
There may be a plurality of classifications for expression categories, and in some embodiments, the expression categories may include six basic emotions: anger, happiness, sadness, surprise, disgust, and fear. In some embodiments, the expression categories can be expanded and extended on the basis of basic emotions, such as happiness, excitement, anger, coolness, tiredness, anxiety, depression, frustration, shame, chagrin, and the like.
The learning behavior recognition module 45 is configured to process and analyze the image information to obtain learning state feature data representing the learning behavior of the target programmed learner.
In some embodiments, the learning behavior recognition module 45 recognizes the close-range image of the target programming learner, extracts the action behavior feature corresponding to the target programming learner, and then recognizes the extracted behavior feature by the learning behavior model to obtain the learning state feature data of the learning behavior corresponding to the target programming learner.
In some embodiments, the learning behavior recognition module 45 recognizes the medium- and long-range image of the target programming learner, extracts the action and behavior features corresponding to the target programming learner, and the learning behavior model then identifies the extracted features to obtain the learning state feature data of the learning behavior corresponding to the target programming learner.
In general, learning behaviors corresponding to a targeted programming learner include, but are not limited to: body posture, head and shoulder movements, hand movements, etc. The body gestures include, but are not limited to: whether sitting or standing, sitting and sitting orientations, standing and standing orientations, etc. The head and shoulder actions include, but are not limited to: head position and orientation, angle of head to shoulder (e.g., whether head is tilted), state of shoulder (e.g., whether shoulder is shrunken), etc. The hand motion includes, but is not limited to: hanging two hands down, putting on a table, grabbing ears and scratching cheeks, scratching heads, grabbing hair, crossing two hands, closing two hands, holding a fist, pointing fingers to a screen, holding a pen for writing, swinging the pen or other small objects, holding a cup for drinking, and the like.
The learning state evaluation module 46 is configured to analyze the learning state feature data representing the facial expressions and the learning behaviors of the target learner, so as to obtain the learning state of the target learner.
The teaching evaluation module 47 is used for evaluating the learning evaluation and learning state of the target programming learner to obtain a teaching evaluation result related to the target programming learner.
The learning strategy module 48 is used for determining a corresponding programmed learning strategy according to the obtained teaching evaluation result related to the target programmed learner.
And determining a corresponding programming learning strategy according to the teaching evaluation result.
In certain embodiments, the programming learning strategies include, but are not limited to: customizing a new programming teaching course for the programming learner according to the teaching evaluation result; pushing corresponding programming teaching resources to the programming learner according to the teaching evaluation result and the weak knowledge points mastered by the programming learner; and adjusting the programmed learning content or the programmed learning time length aiming at the programmed learner according to the teaching evaluation result. These programming learning strategies may be implemented individually or in combination.
In some embodiments, the programming learning strategies include, but are not limited to: teaching strategies for programming instructors and learning strategies for programming learners. The teaching strategy is used for helping the programming instructor to make targeted teaching. By utilizing the learning strategy, the programmed learner can make corresponding improvement according to the advantages and disadvantages and the learning characteristics.
Please refer to fig. 5, which is a simplified schematic structural diagram of the computer device according to an embodiment of the present application. As shown, the computer device includes a memory 51 and a processor 52.
In certain embodiments, the storage device is, for example, network attached storage accessed via RF circuitry or external ports and a communications network, where the communications network may be the Internet, one or more intranets, local area networks (LANs), wide area networks (WANs), storage area networks (SANs), and the like, or a suitable combination thereof. A memory controller may control access to the memory by other components of the device, such as the CPU and peripheral interfaces. The memory optionally includes high-speed random access memory, and optionally also includes non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices.
In some embodiments, the memory 51 may also include volatile memory, such as random access memory (RAM); the memory may further include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
The processor 52 can coordinate the memory 51 for executing the intelligent programming learning method based on multi-modal deep learning as described in any one of the embodiments shown in fig. 3.
In some embodiments, the processing device comprises an integrated circuit chip having signal processing capabilities; or a general purpose processor which may be a microprocessor, or any conventional processor such as a Central Processing Unit (CPU). For example, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a discrete gate or transistor logic device, a discrete hardware component, may implement or execute the methods, steps and logic blocks disclosed in the embodiments of the present application, for example, the method for intelligent programming learning based on multi-modal deep learning as described in any of the embodiments shown in fig. 3.
The application discloses a computer readable storage medium, which stores at least one program, when being called, the at least one program realizes the intelligent programming learning method based on multi-modal deep learning as described in any one of the embodiments shown in fig. 3.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application.
In the embodiments provided herein, the computer-readable and writable storage medium may include read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a USB flash drive, a removable hard disk, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable-and-writable storage media and data storage media do not include connections, carrier waves, signals or other transitory media, but are intended to be non-transitory, tangible storage media. Disk and disc, as used in this application, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
In one or more exemplary aspects, the functions described in the computer program of the data processing method described herein may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. The steps of a method or algorithm disclosed herein may be embodied in a processor-executable software module, which may reside on a tangible, non-transitory computer-readable and writable storage medium. Such media may be any available media that can be accessed by a computer.
The flowcharts and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code comprising one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (10)

1. An intelligent programming learning method based on multi-modal deep learning, characterized by comprising the following steps:
acquiring learning information of a target programming learner in programming learning, and processing and analyzing the learning information to obtain a learning evaluation of the target programming learner;
acquiring image information containing the target programming learner, and processing and analyzing the image information to obtain learning state feature data representing facial expressions and learning behaviors of the target programming learner;
analyzing the learning state feature data according to a preset multi-modal learning state model to obtain a learning state of the target programming learner; and
evaluating the learning evaluation and the learning state of the target programming learner according to a preset teaching characteristic evaluation model to obtain a teaching evaluation result for the target programming learner, and determining a corresponding programming learning strategy.
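For illustration only, and not as part of the claims, the following Python sketch shows one possible shape of the final fusion step: a numeric learning evaluation and a recognized learning state are combined into a teaching evaluation score, which then selects a programming learning strategy. The state scores, fusion weights, and strategy thresholds are invented for this example; the patent does not disclose the internals of the teaching characteristic evaluation model.

# Illustrative fusion of a learning evaluation and a learning state; the
# scores, weights, and strategy table below are assumptions made for this
# sketch, not values taken from the patent.
STATE_SCORES = {"engaged": 1.0, "neutral": 0.6, "distracted": 0.2}

def teaching_evaluation(learning_eval: float, state: str,
                        w_eval: float = 0.7, w_state: float = 0.3) -> float:
    """Weighted fusion standing in for a teaching characteristic evaluation model."""
    return w_eval * learning_eval + w_state * STATE_SCORES[state]

def choose_strategy(score: float) -> str:
    """Map the teaching evaluation result to a programming learning strategy."""
    if score >= 0.8:
        return "advance to harder programming exercises"
    if score >= 0.5:
        return "targeted practice on weak knowledge points"
    return "one-on-one teacher guidance"

score = teaching_evaluation(0.72, "neutral")
print(round(score, 3), "->", choose_strategy(score))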
2. The intelligent programming learning method based on multi-modal deep learning as claimed in claim 1, wherein acquiring the learning information of the target programming learner in programming learning and processing and analyzing the learning information to obtain the learning evaluation of the target programming learner comprises the following steps:
acquiring the learning information of the target programming learner by means of screen recording or screen capturing, wherein the learning information comprises operation content and one or more of: an operation time associated with the operation content, a programming response area corresponding to the operation, an operation track, programming learning content, a source of the programming learning content, and a programming learning time associated with the programming learning content; the operation content comprises code writing and code debugging performed by the target programming learner in a development environment; and
processing and analyzing the learning information to obtain the target programming learner's mastery of specific programming knowledge points, programming learning achievement, and reference indexes related to the programming learning achievement.
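As a hedged illustration of how the logged operations of claim 2 might be aggregated, the sketch below estimates per-knowledge-point mastery from pass/fail outcomes and debugging effort. The record fields ("knowledge_point", "passed", "debug_rounds") and the discount formula are assumptions for this example only; the patent does not specify the aggregation.

from collections import defaultdict

# Hypothetical operation records captured by screen recording; the field
# names are illustrative assumptions, not taken from the patent.
operations = [
    {"knowledge_point": "loops", "passed": True, "debug_rounds": 1},
    {"knowledge_point": "loops", "passed": False, "debug_rounds": 4},
    {"knowledge_point": "recursion", "passed": True, "debug_rounds": 2},
]

def mastery_by_knowledge_point(ops):
    """Estimate mastery as the pass rate per knowledge point, discounted
    by the average number of debugging rounds spent on that point."""
    stats = defaultdict(lambda: {"passed": 0, "total": 0, "debug": 0})
    for op in ops:
        s = stats[op["knowledge_point"]]
        s["total"] += 1
        s["passed"] += int(op["passed"])
        s["debug"] += op["debug_rounds"]
    return {
        kp: (s["passed"] / s["total"]) / (1 + s["debug"] / s["total"])
        for kp, s in stats.items()
    }

print(mastery_by_knowledge_point(operations))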
3. The intelligent programming learning method based on multi-modal deep learning according to claim 1, wherein acquiring the image information containing the target programming learner and processing and analyzing the image information to obtain the learning state feature data representing the facial expressions and learning behaviors of the target programming learner comprises the following steps:
shooting, by a first camera device arranged near a programming learning terminal, a close-range image of the target programming learner corresponding to the programming learning terminal during learning; performing face recognition processing on the captured close-range image through a face recognition model to determine a facial image of the target programming learner; and performing expression recognition processing on the facial image of the target programming learner through an expression recognition model to obtain learning state feature data of the facial expression corresponding to the target programming learner; and
performing learning behavior recognition processing on the close-range image captured by the first camera device through a learning behavior model to obtain learning state feature data of the learning behavior corresponding to the target programming learner.
4. The intelligent programming learning method based on multi-modal deep learning according to claim 1, wherein acquiring the image information containing the target programming learner and processing and analyzing the image information to obtain the learning state feature data representing the facial expressions and learning behaviors of the target programming learner comprises the following steps:
shooting, by a first camera device arranged near a programming learning terminal, a close-range image of the target programming learner corresponding to the programming learning terminal during learning; performing face recognition processing on the captured close-range image through a face recognition model to determine a facial image of the target programming learner; and performing expression recognition processing on the facial image of the target programming learner through an expression recognition model to obtain learning state feature data of the facial expression corresponding to the target programming learner; and
shooting, by a second camera device arranged at a position away from the programming learning terminal, a medium- and long-range image of the target programming learner during learning; performing face recognition processing on the captured medium- and long-range image through a face recognition model, or locating the programming learning terminal in the captured image, to determine the target programming learner; and performing learning behavior recognition processing on the captured medium- and long-range image through a learning behavior model to obtain learning state feature data of the learning behavior corresponding to the target programming learner; wherein the learning behaviors corresponding to the target programming learner include individual behaviors of the target programming learner and interactive behaviors between the target programming learner and other related programming learners.
5. The intelligent programming learning method based on multi-modal deep learning according to claim 3 or 4, wherein performing the expression recognition processing on the facial image of the target programming learner through the expression recognition model comprises the following steps:
extracting facial key point features of the target programming learner from the facial image of the target programming learner;
mapping the facial key point features into facial expression features; and
recognizing and matching the facial expression features through the expression recognition model to obtain the facial expression of the target programming learner.
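The steps of claim 5 correspond to a standard landmark-based pipeline. The sketch below stubs out the keypoint detector and uses a small PyTorch classifier in place of the patent's expression recognition model; the landmark count, the invariant feature mapping, and the expression labels are all assumptions made for this illustration.

import torch
import torch.nn as nn

NUM_KEYPOINTS = 68                                         # assumed landmark count
EXPRESSIONS = ["focused", "confused", "bored", "neutral"]  # assumed labels

def extract_keypoints(face_image: torch.Tensor) -> torch.Tensor:
    """Stub for a facial key point detector; a real system would run a
    landmark model on the cropped face image."""
    return torch.rand(NUM_KEYPOINTS, 2)                    # (x, y) per landmark

def to_expression_features(keypoints: torch.Tensor) -> torch.Tensor:
    """Map raw key points to translation- and scale-invariant features by
    centering on the mean landmark and normalizing by face size."""
    centered = keypoints - keypoints.mean(dim=0)
    scale = centered.norm(dim=1).max().clamp(min=1e-6)
    return (centered / scale).flatten()

class ExpressionClassifier(nn.Module):
    """Small MLP standing in for the expression recognition model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_KEYPOINTS * 2, 64), nn.ReLU(),
            nn.Linear(64, len(EXPRESSIONS)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

face = torch.rand(3, 112, 112)                             # dummy cropped face image
features = to_expression_features(extract_keypoints(face))
model = ExpressionClassifier()
print("predicted expression:", EXPRESSIONS[model(features).argmax().item()])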
6. An intelligent programming learning system based on multi-modal deep learning, comprising:
a learning information acquisition module, configured to acquire learning information of a target programming learner in programming learning;
a learning evaluation module, configured to process and analyze the learning information to obtain a learning evaluation of the target programming learner;
an image acquisition module, configured to acquire image information containing the target programming learner;
a facial expression recognition module, configured to process and analyze the image information to obtain learning state feature data representing the facial expression of the target programming learner;
a learning behavior recognition module, configured to process and analyze the image information to obtain learning state feature data representing the learning behavior of the target programming learner;
a learning state evaluation module, configured to analyze the learning state feature data representing the facial expression and the learning behavior of the target programming learner to obtain a learning state of the target programming learner;
a teaching evaluation module, configured to evaluate the learning evaluation and the learning state of the target programming learner to obtain a teaching evaluation result for the target programming learner; and
a learning strategy module, configured to determine a corresponding programming learning strategy according to the obtained teaching evaluation result for the target programming learner.
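For orientation, the skeleton below mirrors the module decomposition of claim 6 as Python classes and shows how data would flow between the modules. Every method name, signature, and placeholder return value is an assumption; the patent specifies the modules, not an API.

class IntelligentProgrammingLearningSystem:
    """Hypothetical skeleton of the system of claim 6; the module
    boundaries follow the claim, but all method names are assumed."""

    def acquire_learning_info(self, learner_id: str) -> dict:
        # learning information acquisition module (screen recording/capture)
        return {"learner": learner_id, "operations": []}

    def evaluate_learning(self, info: dict) -> dict:
        # learning evaluation module
        return {"mastery": {}, "achievement": 0.0}

    def acquire_images(self, learner_id: str) -> list:
        # image acquisition module (first/second camera devices)
        return []

    def recognize_expressions(self, images: list) -> list:
        # facial expression recognition module
        return ["neutral" for _ in images]

    def recognize_behaviors(self, images: list) -> list:
        # learning behavior recognition module
        return ["typing" for _ in images]

    def assess_state(self, expressions: list, behaviors: list) -> str:
        # learning state evaluation module (multi-modal learning state model)
        return "engaged"

    def evaluate_teaching(self, evaluation: dict, state: str) -> dict:
        # teaching evaluation module
        return {"evaluation": evaluation, "state": state}

    def choose_strategy(self, result: dict) -> str:
        # learning strategy module
        return "targeted-exercises"

system = IntelligentProgrammingLearningSystem()
info = system.acquire_learning_info("learner-01")
images = system.acquire_images("learner-01")
state = system.assess_state(system.recognize_expressions(images),
                            system.recognize_behaviors(images))
result = system.evaluate_teaching(system.evaluate_learning(info), state)
print(system.choose_strategy(result))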
7. A computer device, comprising:
a memory for storing at least one program;
a processor, connected to the memory, configured to execute the at least one program to perform the intelligent programming learning method based on multi-modal deep learning as claimed in any one of claims 1 to 5.
8. A non-transitory storage medium of a computer device, storing at least one program which, when executed by a processor, performs the intelligent programming learning method based on multi-modal deep learning according to any one of claims 1 to 5.
9. A smart classroom for programming learning, comprising:
a classroom master device, connected with an intelligent programming teaching platform through a network;
a plurality of programming learning terminals, connected with the classroom master device through the network, each programming learning terminal being configured with programming learning software;
a plurality of first camera devices, respectively arranged on or beside the corresponding programming learning terminals, each first camera device being used for shooting a close-range image of a target programming learner corresponding to the programming learning terminal during learning; and
one or more second camera devices, arranged at positions away from the programming learning terminals in the smart classroom, for shooting medium- and long-range images, during learning, of the target programming learner or of a plurality of programming learners including the target programming learner.
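A hypothetical configuration sketch of the classroom topology in claim 9 follows; the device counts, identifiers, and field names are invented for illustration and are not prescribed by the patent.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Camera:
    camera_id: str
    placement: str                      # "near-terminal" (first) or "room" (second)
    terminal_id: Optional[str] = None   # set only for first camera devices

@dataclass
class SmartClassroom:
    master_device: str                  # networked to the teaching platform
    terminals: List[str] = field(default_factory=list)
    cameras: List[Camera] = field(default_factory=list)

classroom = SmartClassroom(
    master_device="master-01",
    terminals=["terminal-01", "terminal-02"],
    cameras=[
        Camera("cam-close-01", "near-terminal", "terminal-01"),
        Camera("cam-close-02", "near-terminal", "terminal-02"),
        Camera("cam-wide-01", "room"),  # medium/long-range view of the room
    ],
)
print(len(classroom.cameras), "cameras configured")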
10. An intelligent programming learning system, comprising:
an intelligent programming teaching platform; and
at least one smart classroom according to claim 9, wherein the at least one smart classroom is connected with the intelligent programming teaching platform through a network.
CN202210759154.3A 2022-06-30 2022-06-30 Intelligent programming learning system and method based on multi-mode deep learning Pending CN115132027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210759154.3A CN115132027A (en) 2022-06-30 2022-06-30 Intelligent programming learning system and method based on multi-mode deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210759154.3A CN115132027A (en) 2022-06-30 2022-06-30 Intelligent programming learning system and method based on multi-mode deep learning

Publications (1)

Publication Number Publication Date
CN115132027A true CN115132027A (en) 2022-09-30

Family

ID=83381965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210759154.3A Pending CN115132027A (en) 2022-06-30 2022-06-30 Intelligent programming learning system and method based on multi-mode deep learning

Country Status (1)

Country Link
CN (1) CN115132027A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117151949A (en) * 2023-10-31 2023-12-01 中交第四航务工程勘察设计院有限公司 BIM technology-based channel engineering virtual training method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination