CN116844166B - Video positioning device and method based on learning behavior - Google Patents

Video positioning device and method based on learning behavior

Info

Publication number
CN116844166B
CN116844166B (application CN202311068107.5A)
Authority
CN
China
Prior art keywords
target
video
data
time
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311068107.5A
Other languages
Chinese (zh)
Other versions
CN116844166A (English)
Inventor
周印伟
殷述军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Robotpen Digital Technology Co ltd
Original Assignee
Qingdao Robotpen Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Robotpen Digital Technology Co ltd filed Critical Qingdao Robotpen Digital Technology Co ltd
Priority to CN202311068107.5A priority Critical patent/CN116844166B/en
Publication of CN116844166A publication Critical patent/CN116844166A/en
Application granted granted Critical
Publication of CN116844166B publication Critical patent/CN116844166B/en


Classifications

    All classifications fall under G (PHYSICS), G06 (COMPUTING; CALCULATING OR COUNTING), G06V (IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING):
    • G06V 30/1423: Image acquisition using hand-held instruments; the instrument generating sequences of position coordinates corresponding to handwriting
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/49: Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06V 30/1444: Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V 30/1478: Inclination or skew detection or correction of characters or of character lines
    • G06V 30/15: Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G06V 30/19013: Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V 30/19093: Proximity measures, i.e. similarity or distance measures

Abstract

Some embodiments of the application provide a video positioning device and method based on learning behavior, relating to the technical field of artificial intelligence. The device acquires handwriting data of a target area, the handwriting data comprising a driving track input by a feature pen and the occurrence time of the driving track. Feature recognition is then performed on the driving track; when the driving track is a template track, the occurrence time is associated with a target video, and a sub-data stream of the target video is detected according to the occurrence time. Target text is extracted from the sub-data stream, and the target video is sliced according to the occurrence time and the target text to generate a target slice video, which is pushed to the user terminal. By inputting a specific driving track in the target area, the target video can be automatically positioned and sliced, improving the positioning efficiency of the target video.

Description

Video positioning device and method based on learning behavior
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a video positioning device and method based on learning behaviors.
Background
During classroom learning, the teacher explains knowledge points in sequence according to the course arrangement, so that students can grasp all the knowledge points through the explanation. However, students may fail to grasp some knowledge points in time due to problems such as a fast explanation pace or distraction.
Therefore, to consolidate the classroom content, the teacher usually records the class, capturing the explanation as a classroom video, with one class corresponding to one classroom video. In this way, students can review the explanation of the knowledge points by playing back the classroom video.
However, one classroom video covers multiple knowledge points, and students must manually search the video for the knowledge points they have not mastered. Repeated manual searching consumes considerable time, and specific knowledge points are difficult to locate, which reduces the knowledge-point positioning efficiency of classroom videos.
Disclosure of Invention
The application provides a video positioning device and method based on learning behaviors, which are used for solving the problem of low knowledge point positioning efficiency in classroom videos.
In a first aspect, some embodiments of the present application provide a video positioning device based on learning behavior, including a feature pen, a handwriting book, and a processor. The feature pen is configured to input handwriting data; the handwriting book is configured to receive the handwriting data and includes a target area; the processor is configured to:
acquire handwriting data of the target area, wherein the handwriting data comprises a driving track input by the feature pen and the occurrence time of the driving track;
performing feature recognition on the driving track;
when the driving track is a template track, associating the occurrence time with a target video, and detecting a sub-data stream of the target video according to the occurrence time;
extracting target text of the sub-data stream;
executing slicing processing on the target video according to the occurrence time and the target text to generate a target slice video;
pushing the target slice video to a user terminal.
Optionally, the handwriting book includes a non-target area; when acquiring the handwriting data of the target area, the processor is configured to: analyze a dot code file corresponding to the handwriting data, wherein the dot code file comprises a page identifier, a region identifier and a dot matrix pattern; if the region identifier is a target area identifier, mark the handwriting data as handwriting data of the target area and extract the handwriting data of the target area; and if the region identifier is a non-target area identifier, mark the handwriting data as handwriting data of the non-target area.
Optionally, when performing feature recognition on the driving track, the processor is configured to: extract a feature vector of the driving track; calculate the similarity between the driving track and the template track through a recognition model; if the similarity is greater than or equal to a similarity threshold, mark the driving track as a template track; and if the similarity is smaller than the similarity threshold, mark the driving track as a non-template track and delete the handwriting data corresponding to the non-template track.
Optionally, when extracting the target text of the sub-data stream, the processor is configured to: extract audio data of the sub-data stream; convert the audio data into language text; extract keywords of the language text based on a natural language processing algorithm; and match the keywords with preset template keywords to generate the target text, the target text comprising the keywords whose content matches the template keywords.
Optionally, the processor performs slicing processing on the target video according to the occurrence time and the target text, and is configured to: detecting a slicing time threshold; calculating a slice time point, the slice time point being a time point earlier than the occurrence time by the slice time threshold; dividing the target video according to the slicing time point and the occurrence time to generate the target slicing video; the starting time of the target slice video is the slice time point, and the ending time of the target slice video is the occurrence time; and establishing an association relation between the target text and the target slice video.
Optionally, when performing slicing processing on the target video according to the occurrence time and the target text, the processor is configured to: acquire a target data stream of the target video; extract audio data and video data in the target data stream; convert the audio data into language text; extract picture frames of the video data and identify the language text in the picture frames; divide the audio data whose language text includes the target text and/or the video data whose language text includes the target text into the target slice video; and establish an association relation between the target text and the target slice video.
Optionally, the processor is further configured to: query an associated video according to the target text, wherein the associated video carries a label matching the target text; and push the associated video to the user terminal when pushing the target slice video.
Optionally, the processor is further configured to: detecting the play frequency of the target video and the associated video; setting the arrangement priority of the target slice video and the associated video according to the play frequency, wherein the arrangement priority and the play frequency are in a proportional relation; and pushing the target slice video and the associated video to the user terminal according to the order of the arranged priorities.
Optionally, the apparatus further comprises a camera configured to acquire a target image of a target person; the processor is configured to: controlling the camera to capture the face characteristics of the target image; identifying identity information of the target person based on the face features; inquiring a category label of the target video according to the identity information; and establishing an association relation between the category label and the target video.
In a second aspect, some embodiments of the present application further provide a video positioning method based on learning behavior, including:
obtaining handwriting data of a target area, wherein the handwriting data comprises a driving track input by a feature pen and the occurrence time of the driving track;
performing feature recognition on the driving track;
when the driving track is a template track, associating the occurrence time with a target video, and detecting a sub-data stream of the target video according to the occurrence time;
extracting target text of the sub-data stream;
executing slicing processing on the target video according to the occurrence time and the target text to generate a target slice video;
pushing the target slice video to a user terminal.
According to the technical scheme, the video positioning device and method based on learning behavior provided by the embodiments of the application can acquire handwriting data of a target area, the handwriting data comprising a driving track input by the feature pen and the occurrence time of the driving track. Feature recognition is then performed on the driving track; when the driving track is a template track, the occurrence time is associated with the target video, and a sub-data stream of the target video is detected according to the occurrence time. Target text is extracted from the sub-data stream, and the target video is sliced according to the occurrence time and the target text to generate a target slice video, which is pushed to the user terminal. By inputting a specific driving track in the target area, the target video can be automatically positioned and sliced, improving the positioning efficiency of the target video.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a cross-sectional view of the internal structure of a feature pen provided in some embodiments of the application;
FIG. 2 is a block diagram of a learning behavior-based video positioning apparatus according to some embodiments of the present application;
FIG. 3 is a flowchart of a learning behavior-based video positioning method according to some embodiments of the present application;
FIG. 4 is a schematic diagram showing the effect of a handwriting book according to some embodiments of the present application;
FIG. 5 is a flowchart illustrating feature recognition performed on a driving track according to some embodiments of the present application;
fig. 6 is a schematic structural diagram of a video positioning device with a camera according to some embodiments of the present application;
FIG. 7 is a flow chart illustrating a slicing process performed based on a slicing time threshold according to some embodiments of the present application;
fig. 8 is a flow chart of performing slicing processing based on language text according to some embodiments of the present application.
Detailed Description
For the purposes of making the objects and embodiments of the present application more apparent, an exemplary embodiment of the present application will be described in detail below with reference to the accompanying drawings in which exemplary embodiments of the present application are illustrated, it being apparent that the exemplary embodiments described are only some, but not all, of the embodiments of the present application.
Also, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed or inherent to such product or apparatus, but may include other elements not expressly listed or inherent to such product or apparatus.
In the course of classroom learning, teaching teachers can sequentially explain knowledge points in real time according to course arrangement, so that students can grasp all knowledge points through explanation contents. Moreover, students can record notes of the explanation content of the knowledge points, so that the knowledge points can be reviewed and summarized conveniently.
Thus, in some embodiments, notes are taken on knowledge point content by the feature pen 100. The feature pen 100 may be an electromagnetic pen or a dot matrix pen. The electromagnetic pen and the dot matrix pen can record the occurrence time point of any handwriting in the note, and can also carry out electronic restoration on any handwriting in the note in real time, so that the convenience of note recording is improved.
As shown in fig. 1, in some embodiments, the feature pen 100 includes a high-speed camera 110, a cartridge 120, a main controller 130, a pressure sensor 140, a battery 150, a bluetooth communication board 160, and a charger 170. The high-speed camera 110 is disposed at the front end of the feature pen 100 and captures the motion track of the pen tip. The pressure sensor 140 detects pressure data generated when the feature pen 100 is in use, so as to determine its pen-down state. During use, the pressure sensor 140 transmits the generated pressure data back to the main controller 130, and the main controller 130 generates the handwriting data of the feature pen 100 from the driving track and the pressure data. The handwriting data records the driving track produced when the feature pen 100 is put to paper and the occurrence time corresponding to that track. After generating the handwriting data, the feature pen 100 establishes a wireless communication connection with other terminal devices through the bluetooth communication board 160 to transmit and exchange the handwriting data.
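To make this data flow concrete, the following minimal Python sketch shows one way the main controller's output could be represented in software; the field names, types and pressure threshold are illustrative assumptions, not a format defined by the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class HandwritingData:
    """Illustrative record of one stroke (all field names are assumptions)."""
    track: List[Tuple[float, float]] = field(default_factory=list)  # (x, y) points from the high-speed camera
    pressure: List[float] = field(default_factory=list)             # samples from the pressure sensor
    occurrence_time: float = 0.0                                    # epoch seconds when the stroke began

    def is_pen_down(self, threshold: float = 0.05) -> bool:
        # Treat the stroke as a real pen-down event if any pressure sample exceeds the threshold.
        return any(p > threshold for p in self.pressure)

stroke = HandwritingData(track=[(1.0, 2.0), (1.2, 2.4)], pressure=[0.0, 0.3], occurrence_time=1692864000.0)
print(stroke.is_pen_down())  # True
```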
In some embodiments, the cartridge 120 of the feature pen 100 is an ink cartridge, such as a D1 standard cartridge. The writing of the feature pen 100 is retained on paper through the ink cartridge, which avoids the problem of the user having to watch an electronic screen in real time, as in paperless writing, and improves the user experience.
In some embodiments, the charger 170 of the feature pen 100 is a magnetic charger. The magnetic charger connects for charging through the mutual attraction of opposite magnetic poles, which improves transmission efficiency and convenience during charging.
In some embodiments, the feature pen 100 further includes a transmission interface, where the transmission interface may establish a wired communication connection with other terminal devices through a data line, so as to implement transmission interaction of handwriting data.
In some embodiments, the feature pen 100 also includes a memory communicatively coupled to the main controller 130, in which the handwriting data is stored. After the feature pen 100 generates handwriting data, the memory may upload it to the cloud for storage, realizing offline backup and storage of the handwriting data in the feature pen 100.
Because of factors such as a fast explanation pace or distraction, students may fail to master the knowledge-point content in time. Therefore, in some embodiments, classroom teaching also takes the form of an online live class or an offline recorded class to generate a corresponding classroom video. The user can then play back the classroom video after class and review the explanation of the knowledge points.
However, one classroom video covers multiple knowledge points, and the user must manually search the video for the explanation segment of a specific knowledge point. Manual searching makes it difficult to find the corresponding segment accurately and consumes considerable time, reducing the knowledge-point positioning efficiency of classroom video.
In this application, taking classroom learning as an example, the target video is a recorded video generated through offline recording or online live streaming, and the user terminal is the terminal device the user uses to watch the target video, such as a mobile phone, tablet computer, notebook computer, or other intelligent display device.
Based on the above application scenario, in order to solve the problem of low knowledge-point positioning efficiency in classroom video, some embodiments of the present application provide a video positioning device based on learning behavior, as shown in fig. 2, including a feature pen 100, a handwriting book 200 and a processor 300. The feature pen 100 is configured to input handwriting data; the handwriting book 200 is configured to receive the handwriting data and includes a target area; as shown in fig. 3, the processor 300 is configured to perform the following program steps:
S1: acquiring handwriting data of the target area.
The handwriting data comprise a driving track input by the feature pen and the occurrence time of the driving track. A target area is defined in the handwriting book 200 and is used to monitor specific user handwriting, so that the knowledge point currently being explained can be marked with specific template handwriting. For example, the template handwriting may employ a specific character such as "?", "!" or ",", and may consist of one or more specific characters.
For example, the user is watching a live mathematics class, and when the teacher explains the knowledge point "trigonometric function", the user does not understand or grasp it. The user may then draw the specific character "?" in the target area to mark the position of the current knowledge point.
It will be appreciated that the template track may take the form of Chinese characters, English characters, or other shapes not specifically limited here. The application is not limited in this regard.
In some embodiments, when the feature pen 100 is an electromagnetic pen, the handwriting book 200 is a handwriting tablet for the electromagnetic pen. The handwriting tablet is an input device based on electromagnetic technology that can be communicatively connected to other terminal devices. The electromagnetic pen emits an electromagnetic signal of a specific frequency when in use; inside the tablet are a microcontroller and a two-dimensional antenna array. The microcontroller scans the X-axis and Y-axis of the antenna board in turn, calculates the absolute coordinates of the electromagnetic pen from the signal strength, and sends the coordinate information to the connected terminal device for processing.
To enhance the user experience, in some embodiments writing paper is laid on the tablet, and when the user takes notes with the feature pen 100, the notes can be followed in real time simply by observing the handwriting presented on the paper.
In some embodiments, when the feature pen 100 is a dot matrix pen, the handwriting book 200 is a dot matrix book: paper printed with a layer of invisible dot matrix pattern. The dot matrix pattern is composed of pixel-level dot codes, several of which form a bit code unit, and one bit code unit represents a set of (x, y) coordinate values. In other words, the dot matrix pattern comprises at least one bit code unit, each of which can be represented by a coordinate range value.
That is, the dot matrix pattern divides a sheet of paper into individual bit code units, each containing the same number of pixels. To distinguish the bit code units, specific pixel points within each unit are marked; different arrangements and combinations of the marked pixel points make each unit's signature different, so a unique bit code unit can be decoded from its marked pixel points, and bit code units correspond one-to-one with coordinates.
In some embodiments, the handwriting book 200 includes a target area and a non-target area. The target area is used for inputting the specific template track, and the non-target area is used for recording other note content. To improve the utilization of the handwriting book, the non-target area is larger than the target area, and the target area is disposed at the edge of the handwriting book 200.
For example, referring to fig. 4, which shows the layout of a handwriting book: the non-target area is located on the right side as a note area; the target area is located on the left side as a query area; and the area of the non-target area is much larger than that of the target area. When a user has a question, a template track may be entered in the query area with the feature pen 100.
To facilitate distinguishing different pages and regions in the handwriting book 200, in some embodiments the dot code file corresponding to the handwriting data includes a page identifier, a region identifier, and a dot matrix pattern. The page identifier distinguishes the pages of the handwriting book 200; the region identifier distinguishes the region where the handwriting data is located. By recognizing these identifiers and the dot matrix pattern, the page and region of the handwriting book 200 to which the current handwriting data belongs can be determined.
Thus, in some embodiments, the handwriting book 200 includes a non-target area and a target area. When acquiring the handwriting data of the target area, the processor 300 analyzes the dot code file corresponding to the handwriting data, the dot code file comprising the page identifier, region identifier and dot matrix pattern. If the region identifier is the target area identifier, the handwriting data is marked as handwriting data of the target area and extracted. If the region identifier is a non-target area identifier, the handwriting data is marked as handwriting data of the non-target area. In this way, the target and non-target areas can be distinguished by the region identifier of the dot code file, determining the input area of the handwriting data in the handwriting book 200.
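A minimal sketch of this routing step follows; the record layout and the identifier values are assumptions, since the patent does not fix a dot-code file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

TARGET_AREA_ID = 1      # assumed value of the target-area identifier
NON_TARGET_AREA_ID = 0  # assumed value of the non-target-area identifier

@dataclass
class DotCodeRecord:
    page_id: int        # distinguishes pages of the handwriting book
    region_id: int      # distinguishes the region the handwriting falls in
    dot_pattern: bytes  # raw lattice pattern captured by the pen

def route_handwriting(records: List[DotCodeRecord]) -> Tuple[List[DotCodeRecord], List[DotCodeRecord]]:
    """Split records by region identifier: target-area strokes go on to
    feature recognition, non-target strokes stay ordinary note content."""
    target, non_target = [], []
    for rec in records:
        (target if rec.region_id == TARGET_AREA_ID else non_target).append(rec)
    return target, non_target

records = [DotCodeRecord(3, TARGET_AREA_ID, b"\x01"), DotCodeRecord(3, NON_TARGET_AREA_ID, b"\x02")]
target, _ = route_handwriting(records)
print(len(target))  # 1
```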
S2: feature recognition is performed on the drive trajectory.
After the handwriting data of the target area are obtained, characteristic recognition is carried out on the handwriting data of the target area. Because the user may have a situation of pen error in the target area, after the handwriting data is generated in the target area is monitored, the handwriting data needs to be subjected to feature recognition, so that whether the current learning behavior of the user is in doubt is determined. Namely, whether the driving track of the handwriting data is a preset template track or not needs to be detected, and if the handwriting data is the template track, the current handwriting data can be judged to be used for representing the study behavior in question; if the handwriting data is not the template track, the current handwriting data can be judged to be the handwriting data input by the user by mistake.
In some embodiments, the processor 300 also performs sample training based on the template track. Template tracks in various writing forms are obtained as input samples, and the recognition model is generated by training on them. For example, when the template track is the character "?", samples of "?" written in different sizes, with different stroke curvatures and in other writing forms are obtained as input samples, and the recognition model is trained on these samples so that irregular template tracks can also be recognized. During use of the video positioning device, handwriting data of the target area is detected by the trained recognition model to judge whether the current handwriting data is a template track.
Obviously, different interaction logic applies in the target and non-target areas: handwriting data of the non-target area needs no feature recognition, and the video positioning device only recognizes handwriting data of the target area. This saves feature-recognition processing time and improves the recognition accuracy of the template track.
Thus, as shown in fig. 5, in some embodiments, when feature recognition is performed on a driving track, the feature vector of the driving track is extracted, and the similarity between the driving track and the template track is calculated through the recognition model. If the similarity is greater than or equal to the similarity threshold, the driving track is marked as a template track; if the similarity is smaller than the similarity threshold, the driving track is marked as a non-template track and the corresponding handwriting data is deleted.
Because the video positioning device only performs feature recognition on handwriting data of the target area, handwriting data that is not a template track indicates a user mistake in the target area. Therefore, to reduce the resource occupation of handwriting data, when the driving track of the target area is a non-template track, the corresponding handwriting data is deleted.
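A minimal sketch of the threshold test, assuming the feature vectors have already been extracted and using cosine similarity as a stand-in for the trained recognition model's similarity score; the threshold value is an assumption.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.85  # assumed value; the patent leaves the threshold configurable

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_template_track(track_vec: Sequence[float], template_vec: Sequence[float]) -> bool:
    """True: mark as template track. False: mark as non-template track,
    after which the corresponding handwriting data is deleted."""
    return cosine_similarity(track_vec, template_vec) >= SIMILARITY_THRESHOLD

print(is_template_track([0.9, 0.1, 0.4], [0.8, 0.2, 0.5]))  # True
```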
S3: when the driving track is a template track, associating the occurrence time with the target video, and detecting a sub-data stream of the target video according to the occurrence time.
After feature recognition of the target area's handwriting data, it can be judged whether the driving track of the current handwriting data is a template track. If it is, the occurrence time of the handwriting data is associated with the target video, and the sub-data stream of the target video close to the occurrence time is detected, so that the questioned knowledge point can be located. The target video is the recorded course of the teacher, who starts the video recording or broadcast when the class begins.
To facilitate classifying the target video, as shown in fig. 6, in some embodiments the video positioning device further includes a camera 400, which collects target images of a target person; the recording of the target video may be implemented through the camera 400. The processor 300 is further configured to control the camera 400 to capture the facial features of the target image and to identify the identity information of the target person based on those features. The identity information is data pre-stored in the cloud or locally, and face recognition on the captured target person determines the identity of the current target person. The category label of the target video is then queried according to the identity information, and an association between the category label and the target video is established.
For example, target person A is a mathematics teacher, and the pre-stored identity information includes the target person's ID, name and teaching subject. After A starts recording, the camera captures A's face and matches the corresponding identity information. From the identified identity information, A's teaching subject is determined to be mathematics, and an association between the "mathematics" category label and the target video is established.
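A sketch of this labeling step under the assumption that a face embedding has already been computed by the camera pipeline; the identity table, the embedding comparison and the distance threshold are all illustrative, not the recognition method used by the device.

```python
import math
from typing import Dict, List

# Pre-stored identity records: ID, name, teaching subject (illustrative data).
IDENTITIES: Dict[str, dict] = {
    "T001": {"name": "A", "subject": "mathematics", "embedding": [0.11, 0.52, 0.83]},
}

def _distance(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def label_video(face_embedding: List[float], video: dict, threshold: float = 0.6) -> dict:
    """Match the captured face against pre-stored identities and attach the
    matched teacher's subject as the target video's category label."""
    best_id = min(IDENTITIES, key=lambda i: _distance(face_embedding, IDENTITIES[i]["embedding"]))
    if _distance(face_embedding, IDENTITIES[best_id]["embedding"]) <= threshold:
        video["category_label"] = IDENTITIES[best_id]["subject"]
    return video

print(label_video([0.10, 0.50, 0.85], {"id": "lesson-42"}))
# {'id': 'lesson-42', 'category_label': 'mathematics'}
```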
In some embodiments, the target video further carries sub-labels describing the video content. For example, labels such as the textbook version, the course chapter, and the knowledge points covered by the chapter can be associated and matched through manual selection, classroom information matching, or the educational administration system.
To facilitate detection of the sub-data stream, in some embodiments an analysis time threshold is also obtained. When the driving track of the handwriting data is detected to be the template track, the occurrence time of the handwriting data is obtained and the analysis time point is calculated, the analysis time point being a time point earlier than the occurrence time by the analysis time threshold. Taking the analysis time point as the start time and the occurrence time as the end time, the sub-data stream corresponding to that interval is acquired. By analyzing the sub-data stream, the knowledge-point content the user has questions about can be determined.
For example, the template track is "?", the analysis time threshold is 30 s, and the period from 13 minutes 15 seconds to 33 minutes 18 seconds of the target video is the explanation of the knowledge point "trigonometric function". While watching the live class, the user inputs the "?" driving track at 30 minutes 17 seconds. The video positioning device then performs feature recognition on the "?" handwriting data, and after successful recognition acquires the sub-data stream from 29 minutes 47 seconds to 30 minutes 17 seconds and analyzes it intelligently.
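The window arithmetic is simple enough to state directly; the sketch below reproduces the 30-second worked example above, with times held as seconds from the start of the target video.

```python
def sub_stream_window(occurrence_time_s: float, analysis_threshold_s: float = 30.0):
    """Return (start, end) of the sub-data stream to analyse.

    The start is the analysis time point (occurrence time minus the analysis
    time threshold, clamped at the start of the video); the end is the
    occurrence time itself.
    """
    start = max(0.0, occurrence_time_s - analysis_threshold_s)
    return start, occurrence_time_s

# The worked example from the text: "?" entered at 30 min 17 s with a 30 s threshold.
start, end = sub_stream_window(30 * 60 + 17)
assert (start, end) == (29 * 60 + 47, 30 * 60 + 17)  # 29:47 to 30:17
```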
S4: extracting the target text of the sub-data stream.
After the sub-data stream is detected, its target text is extracted, and the knowledge-point content corresponding to the current sub-data stream is determined through the target text. The target text is a keyword appearing in the sub-data stream segment that matches a knowledge-point term.
Thus, in some embodiments, the audio data of the sub-data stream is extracted and converted into language text. Keywords of the language text are extracted based on a natural language processing (NLP) algorithm and matched with preset template keywords to generate the target text, the target text comprising the keywords whose content matches the template keywords.
For example, the template keywords are knowledge points preset from the textbook version and course chapter, including "trigonometric function", "quadrant angle", "axis angle", "arbitrary angle", and the like. The corresponding sub-data stream in the target video is detected according to the occurrence time of the template track, its audio is extracted and converted into language text, and keywords are extracted from the converted text by the NLP algorithm, for instance "trigonometric function", "inner angle", "elementary function" and "independent variable". Comparing the extracted keywords with the template keywords determines that the target text corresponding to the sub-data stream is "trigonometric function".
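A sketch of the matching step under stated assumptions: speech recognition and NLP keyword extraction are represented by an already-produced keyword list, since the patent names no particular ASR engine or NLP toolkit.

```python
from typing import Iterable, List

# Preset template keywords derived from the textbook version and course chapter.
TEMPLATE_KEYWORDS = {"trigonometric function", "quadrant angle", "axis angle", "arbitrary angle"}

def match_target_text(extracted_keywords: Iterable[str]) -> List[str]:
    """Intersect keywords extracted from the sub-stream's language text with
    the preset template keywords; the overlap is the target text."""
    return [kw for kw in extracted_keywords if kw in TEMPLATE_KEYWORDS]

# Keywords as in the worked example above.
keywords = ["trigonometric function", "inner angle", "elementary function", "independent variable"]
print(match_target_text(keywords))  # ['trigonometric function']
```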
S5: performing slicing processing on the target video according to the occurrence time and the target text to generate the target slice video.
To save retrieval time when the user plays back the target video, after the target text is extracted, the target video is sliced according to the occurrence time of the template track and the content of the target text, accurately dividing the questioned knowledge point into a sliced video segment. When the user plays it back, there is no need to search the target video repeatedly for the explanation segment of the knowledge point, which improves the positioning efficiency of knowledge points.
To facilitate the slicing of the target video, in some embodiments a slicing time threshold is detected and the slicing time point is calculated, as shown in fig. 7. The slicing time point is a time point earlier than the occurrence time by the slicing time threshold; the slicing time threshold is a user-defined threshold that can be adapted to the application scenario. The target video is divided according to the slicing time point and the occurrence time to generate the target slice video, whose start time is the slicing time point and whose end time is the occurrence time. After the target slice video is generated, an association between the target text and the target slice video is established to facilitate classified management of target slice videos.
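As a concrete illustration, the sketch below cuts the interval from the slicing time point to the occurrence time out of a local copy of the target video. It assumes ffmpeg is installed and the file names are placeholders; stream copy keeps the cut fast at the cost of landing on the nearest keyframes.

```python
import subprocess

def slice_by_threshold(video_path: str, occurrence_time_s: float,
                       slice_threshold_s: float, out_path: str) -> str:
    """Cut [occurrence time - slicing time threshold, occurrence time] out
    of the target video into the target slice video."""
    slice_point = max(0.0, occurrence_time_s - slice_threshold_s)
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-ss", str(slice_point), "-to", str(occurrence_time_s),
         "-c", "copy", out_path],  # stream copy: no re-encoding
        check=True,
    )
    return out_path

# e.g. a 10-minute slice ending at the "?" mark at 30:17:
# slice_by_threshold("lesson.mp4", 30 * 60 + 17, 600, "trigonometric_function.mp4")
```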
Alternatively, in some embodiments, as shown in fig. 8, the target data stream of the target video is acquired, and the audio data and video data in it are extracted. The audio data is converted into language text; picture frames of the video data are extracted and the language text in them is identified. The audio data whose language text includes the target text and/or the video data whose language text includes the target text is divided into the target slice video, and an association between the target text and the target slice video is established.
That is, embodiments of the application can slice the target video by the slicing time threshold; or knowledge points can be extracted from the teacher's speech in the video stream and the keywords in the on-screen courseware, and intelligent slicing performed according to those knowledge points.
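The selection logic of this content-driven variant is sketched below, assuming ASR on the audio and OCR on the picture frames have already produced time-stamped text segments; the TimedText structure and the sample data are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TimedText:
    start: float  # seconds into the target video
    end: float    # seconds into the target video
    text: str     # language text from ASR (audio) or frame OCR (video)

def spans_containing(target_text: str, segments: List[TimedText]) -> List[Tuple[float, float]]:
    """Select the time spans whose language text mentions the target text;
    these spans are then cut into the target slice video."""
    return [(s.start, s.end) for s in segments if target_text in s.text.lower()]

asr = [TimedText(795, 1998, "... now we turn to the trigonometric function ...")]
ocr = [TimedText(800, 1990, "Chapter 5: Trigonometric Functions")]
print(spans_containing("trigonometric function", asr + ocr))
# [(795, 1998), (800, 1990)]
```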
S6: pushing the target slice video to the user terminal.
After the slicing of the target video, the generated target slice video can be pushed to the user terminal for the user to play back and view. Thus, when the user does not understand a knowledge point during classroom learning, a driving track similar to the template track can be input in the target area through the feature pen 100. The video positioning device then slices the knowledge-point content corresponding to the occurrence time of the current driving track, generates a precise knowledge-point explanation clip, and pushes it to the user in time. The user no longer needs to search the whole target video, which improves the positioning efficiency of knowledge points.
In some embodiments, the user terminal is a terminal device logged in with a user account, which may be logged in to a particular website or application. Because the user account uniquely represents the user's identity across different devices, playing back videos through the account improves the convenience of viewing the target slice video.
In some embodiments, the user accounts include student accounts, teacher accounts and parent accounts. The teacher and parent accounts are administrator accounts of the student accounts, and an administrator account can view the target slice videos and handwriting data in a student account, making it easy to follow the user's learning.
Since the same knowledge point can be explained in different ways, to help the user consolidate knowledge points, in some embodiments an associated video is also queried according to the target text, the associated video being a video carrying a label matching the target text. The associated video is pushed to the user terminal together with the target slice video. By querying more associated videos related to the identified knowledge point, the user's understanding of the knowledge point can be reinforced.
In some embodiments, the play frequency of the target video and the associated videos is detected, and the arrangement priority of the target slice video and the associated videos is set according to the play frequency, the priority being proportional to the play frequency: the more often a video is played, the higher its priority. The target slice video and associated videos are then pushed to the user terminal in priority order, so that higher-quality knowledge-point videos are recommended to the user first.
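The ordering rule can be stated compactly; below is a minimal sketch assuming each video carries a stored play count, with the (id, count) pairs as an assumed schema.

```python
from typing import List, Tuple

def push_order(videos: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sort slice and associated videos by play frequency, highest first,
    so higher-priority videos are pushed to the user terminal first."""
    return sorted(videos, key=lambda v: v[1], reverse=True)

print(push_order([("slice_trig", 12), ("assoc_a", 57), ("assoc_b", 3)]))
# [('assoc_a', 57), ('slice_trig', 12), ('assoc_b', 3)]
```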
Based on the video positioning device based on learning behavior, some embodiments of the present application further provide a video positioning method based on learning behavior, as shown in fig. 3, including the following program steps:
S1: obtaining handwriting data of a target area, wherein the handwriting data comprises a driving track input by a feature pen and the occurrence time of the driving track;
S2: performing feature recognition on the driving track;
S3: when the driving track is a template track, associating the occurrence time with a target video, and detecting a sub-data stream of the target video according to the occurrence time;
S4: extracting target text of the sub-data stream;
S5: executing slicing processing on the target video according to the occurrence time and the target text to generate a target slice video;
S6: pushing the target slice video to a user terminal.
It can be understood that the video positioning device and method based on learning behavior provided by the embodiment of the application take video positioning application scenes applied to classroom learning as examples. Obviously, the video positioning device and method based on learning behavior provided by the application can also be applied to application scenes of other video positioning. The application is not limited in this regard.
According to the technical scheme, the video positioning device and method based on learning behavior provided by the embodiments of the application can acquire handwriting data of a target area, the handwriting data comprising a driving track input by the feature pen and the occurrence time of the driving track. Feature recognition is then performed on the driving track; when the driving track is a template track, the occurrence time is associated with the target video, and a sub-data stream of the target video is detected according to the occurrence time. Target text is extracted from the sub-data stream, and the target video is sliced according to the occurrence time and the target text to generate a target slice video, which is pushed to the user terminal. By inputting a specific driving track in the target area, the target video can be automatically positioned and sliced, improving the positioning efficiency of the target video.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A video positioning device based on learning behavior, comprising:
a feature pen configured to input handwriting data;
a handwriting book configured to receive the handwriting data, the handwriting book comprising a target area;
a processor configured to:
acquire handwriting data of the target area, wherein the handwriting data comprises a driving track input by the feature pen and the occurrence time of the driving track;
perform feature recognition on the driving track;
when the driving track is a template track, associate the occurrence time with a target video, acquire an analysis time threshold, and calculate an analysis time point, wherein the analysis time point is a time point earlier than the occurrence time by the analysis time threshold;
taking the analysis time point as a start time and the occurrence time as an end time, acquire the corresponding sub-data stream of the target video;
extract target text of the sub-data stream;
perform slicing processing on the target video according to the occurrence time and the target text to generate a target slice video; and
push the target slice video to a user terminal.
2. The video positioning device based on learning behavior according to claim 1, wherein the handwriting book comprises a non-target area, and wherein, when acquiring the handwriting data of the target area, the processor is configured to:
analyze a dot code file corresponding to the handwriting data, wherein the dot code file comprises a page identifier, a region identifier and a dot matrix pattern;
if the region identifier is a target area identifier, mark the handwriting data as handwriting data of the target area and extract the handwriting data of the target area; and
if the region identifier is a non-target area identifier, mark the handwriting data as handwriting data of the non-target area.
3. The video positioning device based on learning behavior according to claim 1, wherein, when performing feature recognition on the driving track, the processor is configured to:
extract a feature vector of the driving track;
calculate the similarity between the driving track and the template track through a recognition model;
if the similarity is greater than or equal to a similarity threshold, mark the driving track as a template track; and
if the similarity is smaller than the similarity threshold, mark the driving track as a non-template track and delete the handwriting data corresponding to the non-template track.
4. The video positioning device based on learning behavior according to claim 1, wherein, when extracting the target text of the sub-data stream, the processor is configured to:
extract audio data of the sub-data stream;
convert the audio data into language text;
extract keywords of the language text based on a natural language processing algorithm; and
match the keywords with preset template keywords to generate the target text, the target text comprising the keywords whose content matches the template keywords.
5. The video positioning device based on learning behavior according to claim 1, wherein, when performing slicing processing on the target video according to the occurrence time and the target text, the processor is configured to:
detect a slicing time threshold;
calculate a slicing time point, the slicing time point being a time point earlier than the occurrence time by the slicing time threshold;
divide the target video according to the slicing time point and the occurrence time to generate the target slice video, wherein the start time of the target slice video is the slicing time point and the end time of the target slice video is the occurrence time; and
establish an association relation between the target text and the target slice video.
6. The video positioning device based on learning behavior according to claim 1, wherein, when performing slicing processing on the target video according to the occurrence time and the target text, the processor is configured to:
acquire a target data stream of the target video;
extract audio data and video data in the target data stream;
convert the audio data into language text;
extract picture frames of the video data and identify language text in the picture frames;
divide the audio data whose language text includes the target text and/or the video data whose language text includes the target text into the target slice video; and
establish an association relation between the target text and the target slice video.
7. The video positioning device based on learning behavior according to claim 1, wherein the processor is further configured to:
query an associated video according to the target text, wherein the associated video carries a label matching the target text; and
push the associated video to the user terminal when pushing the target slice video.
8. The video positioning device based on learning behavior according to claim 7, wherein the processor is further configured to:
detect the play frequency of the target video and the associated video;
set the arrangement priority of the target slice video and the associated video according to the play frequency, wherein the arrangement priority is proportional to the play frequency; and
push the target slice video and the associated video to the user terminal in order of the arrangement priority.
9. The video positioning device based on learning behavior according to claim 1, further comprising a camera configured to acquire a target image of a target person, wherein the processor is configured to:
control the camera to capture the facial features of the target image;
identify identity information of the target person based on the facial features;
query a category label of the target video according to the identity information; and
establish an association relation between the category label and the target video.
10. A video positioning method based on learning behavior, comprising:
obtaining handwriting data of a target area, wherein the handwriting data comprises a driving track input by a feature pen and the occurrence time of the driving track;
performing feature recognition on the driving track;
when the driving track is a template track, associating the occurrence time with a target video, acquiring an analysis time threshold, and calculating an analysis time point, wherein the analysis time point is a time point earlier than the occurrence time by the analysis time threshold;
taking the analysis time point as a start time and the occurrence time as an end time, acquiring the corresponding sub-data stream of the target video;
extracting target text of the sub-data stream;
performing slicing processing on the target video according to the occurrence time and the target text to generate a target slice video; and
pushing the target slice video to a user terminal.
CN202311068107.5A 2023-08-24 2023-08-24 Video positioning device and method based on learning behavior Active CN116844166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311068107.5A CN116844166B (en) 2023-08-24 2023-08-24 Video positioning device and method based on learning behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311068107.5A CN116844166B (en) 2023-08-24 2023-08-24 Video positioning device and method based on learning behavior

Publications (2)

Publication Number Publication Date
CN116844166A CN116844166A (en) 2023-10-03
CN116844166B true CN116844166B (en) 2023-11-24

Family

ID=88172787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311068107.5A Active CN116844166B (en) 2023-08-24 2023-08-24 Video positioning device and method based on learning behavior

Country Status (1)

Country Link
CN (1) CN116844166B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103281473A (en) * 2013-06-09 2013-09-04 中国科学院自动化研究所 General video steganalysis method based on video pixel space-time relevance
CN107493446A (en) * 2016-06-12 2017-12-19 绍兴瑞楠信息技术有限公司 A kind of teaching scene recording method and system based on dot matrix digital pen
KR20180094637A (en) * 2017-02-16 2018-08-24 부산대학교 산학협력단 System and method for review through handwriting note and voice note
CN107909022A (en) * 2017-11-10 2018-04-13 广州视睿电子科技有限公司 A kind of method for processing video frequency, device, terminal device and storage medium
CN110381382A (en) * 2019-07-23 2019-10-25 腾讯科技(深圳)有限公司 Video takes down notes generation method, device, storage medium and computer equipment
CN110444058A (en) * 2019-08-22 2019-11-12 北京北师智慧科技有限公司 Micro- class manufacturing system and micro- teaching system
US10978077B1 (en) * 2019-10-31 2021-04-13 Wisdom Garden Hong Kong Limited Knowledge point mark generation system and method thereof
KR102165646B1 (en) * 2020-03-26 2020-10-14 김혜원 Method and mobile terminal for playing moving picture lecture
CN112839258A (en) * 2021-04-22 2021-05-25 北京世纪好未来教育科技有限公司 Video note generation method, video note playing method, video note generation device, video note playing device and related equipment
CN113840109A (en) * 2021-09-23 2021-12-24 杭州海宴科技有限公司 Classroom audio and video intelligent note taking method
CN115690815A (en) * 2022-11-07 2023-02-03 东莞市步步高教育软件有限公司 Paper job processing method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Li-Chun Wang; Ru Wang; De-Hui Kong; Bao-Cai Yin. "Similarity Assessment Model for Chinese Sign Language Videos." IEEE Transactions on Multimedia, 2014, full text. *
Min Qiusha; Li Wenhao; Chen Yating. "Difficulty Perception Diagnosis Method Based on Video Viewing Trajectories." Modern Educational Technology, 2020, No. 5, full text. *
Chen Junlin. "Research on Video-Based Signature Recognition Algorithms." Master's thesis (electronic journal), 2012, Chapter 4. *

Also Published As

Publication number Publication date
CN116844166A (en) 2023-10-03

Similar Documents

Publication Publication Date Title
CN109359215B (en) Video intelligent pushing method and system
CN109271945B (en) Method and system for realizing job correction on line
CN108304793B (en) Online learning analysis system and method
CN109672940B (en) Video playback method and video playback system based on note content
CN107273002B (en) Handwriting input answering method, terminal and computer readable storage medium
CN109147444B (en) Learning condition feedback method and intelligent desk lamp
CN110956138B (en) Auxiliary learning method based on home education equipment and home education equipment
CN111191067A (en) Picture book identification method, terminal device and computer readable storage medium
CN111027537B (en) Question searching method and electronic equipment
CN112506360A (en) Intelligent identification and correction method for conventional paper writing operation or test paper
CN109376612B (en) Method and system for assisting positioning learning based on gestures
CN110085068A (en) A kind of study coach method and device based on image recognition
CN111259863B (en) Playing hand type detection/display method, medium, piano, terminal and server
CN113537801B (en) Blackboard writing processing method, blackboard writing processing device, terminal and storage medium
CN111160277A (en) Behavior recognition analysis method and system, and computer-readable storage medium
CN112528799B (en) Teaching live broadcast method and device, computer equipment and storage medium
CN116403218B (en) Online and offline hybrid teaching management system based on remote audio/video interaction
CN116844166B (en) Video positioning device and method based on learning behavior
CN112396897A (en) Teaching system
CN111417026A (en) Online learning method and device based on writing content
CN117095414A (en) Handwriting recognition system and recognition method based on dot matrix paper pen
CN110910291A (en) Dot matrix paper pen technology application method combined with kannel or Dongda writing method
CN110019862B (en) Courseware recommendation method, device, equipment and storage medium
CN109358799B (en) Method for adding handwritten annotation information input by user on handwriting equipment
CN111050111A (en) Online interactive learning communication platform and learning device thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant