CN117789227A - Data processing method, related device, storage medium and computer product - Google Patents


Info

Publication number
CN117789227A
Authority
CN
China
Prior art keywords
character
stroke
strokes
writing
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211161992.7A
Other languages
Chinese (zh)
Inventor
崔颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202211161992.7A
Publication of CN117789227A
Legal status: Pending


Abstract

The embodiments of the application relate to the field of computer technology and disclose a data processing method, a related device, a storage medium, and a computer product. The method comprises: performing character recognition processing on a target character based on the sequence data of the stroke track points of each stroke of the target character to obtain a reference character; and performing feature extraction processing on the target character according to the sequence data of the stroke track points of each stroke and the writing order of each stroke, to obtain morphological features between every two strokes in the at least one stroke forming the target character. Morphological features between every two strokes of the reference character, whose body structure is the reference body structure, are then acquired and compared with the morphological features between the corresponding two strokes of the target character, yielding a body recognition result that indicates the degree of matching between the body structure of the target character and the reference body structure. With the embodiments of the application, writing quality can be monitored.

Description

Data processing method, related device, storage medium and computer product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, a related device, a storage medium, and a computer product.
Background
In writing scenarios such as handwriting practice and transcription, attention must be paid not only to whether the written content is correct, but also to the writing quality during the writing process, such as whether the writing is standard and whether the stroke order is correct. However, conventional optical character recognition (OCR) technology can only determine whether the final written content is right or wrong, and cannot monitor writing quality. How to monitor writing quality is therefore a problem that needs to be solved.
Disclosure of Invention
Embodiments of the present application provide a data processing method, related apparatus, storage medium, and computer product, capable of monitoring writing quality.
In one aspect, an embodiment of the present application provides a data processing method, including:
determining a target character obtained by writing, and acquiring sequence data of stroke track points of all strokes in at least one stroke forming the target character and writing sequence of all strokes, wherein any stroke consists of a plurality of stroke track points, and the sequence data of the stroke track points of any stroke comprises position information of all stroke track points in any stroke;
performing character recognition processing on the target character based on the sequence data of the stroke track points of each stroke to obtain a reference character;
performing feature extraction processing on the target character according to the sequence data of the stroke track points of each stroke and the writing sequence of each stroke, to obtain morphological features between every two strokes in at least one stroke forming the target character;
acquiring morphological features between every two strokes in at least one stroke forming a reference character whose body structure is a reference body structure, and comparing the acquired morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character to obtain a body recognition result of the target character, wherein the body recognition result is used for indicating the matching degree of the body structure of the target character and the reference body structure.
In one aspect, an embodiment of the present application provides a data processing apparatus, including an acquisition unit and a processing unit, where:
the acquisition unit is used for determining a target character obtained by writing and acquiring sequence data of stroke track points of all strokes in at least one stroke forming the target character and writing sequence of all strokes, wherein any stroke consists of a plurality of stroke track points, and the sequence data of the stroke track points of any stroke comprises position information of all stroke track points in any stroke;
The processing unit is used for carrying out character recognition processing on the target character based on the sequence data of the stroke track points of each stroke to obtain a reference character;
the processing unit is further used for carrying out feature extraction processing on the target character according to the sequence data of the stroke track points of each stroke and the writing sequence of each stroke to obtain morphological features between every two strokes in at least one stroke forming the target character;
the processing unit is further configured to obtain morphological features between every two strokes in at least one stroke of a reference character with a body structure being a reference body structure, and compare the obtained morphological features between every two strokes of the reference character with morphological features between two corresponding strokes of the target character to obtain a body recognition result of the target character, where the body recognition result is used to indicate a matching degree of the body structure of the target character and the reference body structure.
In another aspect, embodiments of the present application provide a computer device including an input interface and an output interface, the computer device further including:
a processor adapted to implement one or more computer programs; and
a computer storage medium storing one or more computer programs adapted to be loaded by the processor and to perform the data processing method described above.
In another aspect, embodiments of the present application provide a computer storage medium storing one or more computer programs adapted to be loaded by a processor and to perform the above-described data processing method.
In another aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the above-described data processing method.
In the embodiment of the application, because the sequence data of the stroke track points of any stroke comprises the position information of each stroke track point in any stroke, morphological characteristics which can represent the relationship of the position, the length, the direction and the like between two strokes can be extracted through the obtained sequence data of the stroke track points of each stroke in at least one stroke forming the target character and the writing sequence of each stroke. Since morphological features can characterize the relationship in terms of position, length, direction, etc. between two strokes, and a character is made up of multiple strokes, if the morphology between every two strokes in the multiple strokes making up the character is in accordance with the writing standard, the morphological structure of the character should also be in accordance with the writing standard. Therefore, the reference character which is the same as the target character and has the body structure of the reference body structure can be identified by performing character identification processing on the target character, and then the body identification result for indicating the matching degree of the body structure of the target character and the reference body structure can be obtained by further comparing the morphological characteristics between every two strokes of the reference character with the morphological characteristics between the corresponding two strokes of the target character. Since the reference body structure refers to a body structure conforming to the writing standard, and the writing quality can be judged from the matching degree of the body structure of the character obtained by writing and the body structure conforming to the writing standard, the writing quality of the target character can be monitored through the obtained body recognition result of the target character.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of eight-direction feature extraction according to an embodiment of the present disclosure;
FIG. 4 is an interface schematic diagram of a writing interface provided in an embodiment of the present application;
FIG. 5 is a flowchart of another data processing method according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an implied interface provided by embodiments of the application;
FIG. 7 is a schematic diagram of a training process of a writing recognition model according to an embodiment of the present application;
FIG. 8a is a schematic diagram of a writing recognition model according to an embodiment of the present application;
FIG. 8b is a process schematic of a convolution operation provided by an embodiment of the present disclosure;
FIG. 9 is an interface schematic diagram of another writing interface provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In writing scenarios such as handwriting practice, transcription, writing from memory, written homework, and composition, attention must be paid not only to whether the written content is correct, but also to the writing quality during the writing process, such as whether the writing is standard and whether the stroke order is correct. For people learning to write a new language in particular, writing quality during the writing process matters more than the written content.
At present, writing quality during the writing process is mainly monitored manually: a person who knows the written content, such as a teacher or parent, accompanies the student or child, observes the whole writing process to check for problems such as incorrect stroke order or non-standard writing, and checks the writing result after writing is finished. This approach is labor-intensive: monitoring must be one-to-one, and it is difficult for a teacher to pay attention to every student in a class at the same time. It is time-consuming: the teacher or parent must concentrate on the writing process, and a moment of distraction means part of the process is missed and cannot be recovered. It also places high demands on the monitor, who must be familiar with, and have correctly mastered, the strokes and standards of the written content in order to judge and correct writing quality immediately; a monitor unfamiliar with the written content may overlook writing errors or misguide the writer.
Existing techniques, such as optical character recognition (OCR), can only determine whether the written content is right or wrong after writing is finished. Specifically, they can detect only whether a written character is correct, and cannot detect aspects that reflect writing quality during the writing process, such as whether the student's stroke order is correct and whether the writing is standard. Yet writing quality is precisely the core focus of the teaching process, so existing techniques lack detection of and feedback on the learning process and cannot accurately identify a student's problems. In addition, the recognition accuracy of current techniques is limited. For example, because OCR generalizes across writing variations, it cannot recognize cases such as a character that has multiple forms or an error in one part of a character, which leads to misjudgment and prevents it from giving students high-accuracy guidance.
Based on the above, the embodiment of the application provides a data processing scheme, which can acquire a writing track in the writing process of a target object so as to obtain sequence data of stroke track points of each stroke in at least one stroke forming the target character and the writing sequence of each stroke. Then, the morphological characteristics between every two strokes in at least one stroke forming the target character can be extracted through the sequence data of the stroke track points of each stroke and the writing sequence of each stroke; meanwhile, character recognition processing can be carried out on the target character to obtain a reference character with a body structure which accords with the writing standard, so that morphological characteristics between every two strokes of the reference character and morphological characteristics between two corresponding strokes of the target character can be conveniently compared, and a body recognition result of the target character is obtained, wherein the body recognition result of the target character comprises a matching degree for indicating the body structure of the target character and the reference body structure.
It should be noted that any stroke is composed of a plurality of stroke track points, and the sequence data of the stroke track points of any stroke includes the position information of each stroke track point in that stroke. Specifically, the position information of each stroke track point may include the abscissa and the ordinate of the stroke track point in a preset coordinate system. The preset coordinate system is a coordinate system constructed based on the writing scene. For example, when writing on a touch screen such as a touch drawing board or an e-ink screen, the preset coordinate system may be a coordinate system constructed based on the size of the drawing paper displayed on the touch screen.
It can be seen that, because the sequence data of the stroke track points of any one stroke in the scheme includes the position information of each stroke track point in any one stroke, the morphological characteristics capable of representing the relationship of position, length, direction and the like between two strokes can be extracted through the obtained sequence data of the stroke track points of each stroke in at least one stroke forming the target character and the writing sequence of each stroke. Since morphological features can characterize the relationship in terms of position, length, direction, etc. between two strokes, and a character is made up of multiple strokes, if the morphology between every two strokes in the multiple strokes making up the character is in accordance with the writing standard, the morphological structure of the character should also be in accordance with the writing standard. Therefore, the reference character which is the same as the target character and has the body structure which accords with the writing standard can be identified by carrying out character identification processing on the target character, and then the body identification result for indicating the matching degree of the body structure of the target character and the reference body structure can be obtained by further comparing the morphological characteristics between every two strokes of the reference character and the morphological characteristics between the corresponding two strokes of the target character. Therefore, in the scheme, the matching degree of the body structure of the target character and the body structure conforming to the writing standard can be determined by comparing the morphological characteristics between every two strokes of the reference character with the morphological characteristics between the two corresponding strokes of the target character, so that the purpose of monitoring the writing quality is achieved.
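The comparison step described above can be illustrated with a minimal sketch. The text does not specify how the matching degree is computed, so the scoring rule here (the fraction of reference features that the target reproduces within a tolerance), the function name `matching_degree`, the `tolerance` parameter, and the dictionary layout keyed by stroke-index pairs are all assumptions made for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical layout: {(stroke_i, stroke_j): {feature_name: value}}
PairFeatures = Dict[Tuple[int, int], Dict[str, float]]


def matching_degree(target: PairFeatures,
                    reference: PairFeatures,
                    tolerance: float = 0.2) -> float:
    """Return the fraction of reference stroke-pair features that the
    target character reproduces within the given tolerance.

    The tolerance is relative for large reference values and absolute
    for reference values smaller than 1 (an illustrative choice).
    """
    total = 0
    matched = 0
    for pair, ref_feats in reference.items():
        tgt_feats = target.get(pair, {})
        for name, ref_value in ref_feats.items():
            total += 1
            if name not in tgt_feats:
                continue  # a missing feature counts as a mismatch
            allowed = tolerance * max(abs(ref_value), 1.0)
            if abs(tgt_feats[name] - ref_value) <= allowed:
                matched += 1
    return matched / total if total else 0.0


reference = {(0, 1): {"relative_length": 2 / 3, "relative_angle": 0.0}}
written = {(0, 1): {"relative_length": 1.0, "relative_angle": 0.05}}
score = matching_degree(written, reference)
```

In this usage example the written strokes have nearly the right relative angle but the wrong relative length, so only one of the two reference features is matched.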
Wherein the writing track may be obtained in response to a writing operation of the target object. The target object refers to a person with writing ability. In particular, the target object may be a person writing on a writing interface. For example, the target object may be a student, a child, a handwriting practicing adult, etc., without limitation. Further, the target character refers to a single character or a field having a specific meaning composed of a plurality of characters. In particular, the target character refers to a single character if a single character is specified in a certain class of language to represent a particular meaning. For example, each Chinese character in Chinese has its corresponding meaning, so in a Chinese scenario, the target character refers to a single Chinese character. If a field composed of one or more characters is specified in a language of a certain class to represent a specific meaning, the target character refers to a field composed of a plurality of characters having a specific meaning. Illustratively, a word in english is typically composed of a plurality of alphabetic characters, so in an english scenario, the target character refers to a word composed of a plurality of letters, such as the english word "gear" is a target character.
Meanwhile, the reference body structure refers to a body structure conforming to the writing standard. Specifically, the writing standard is a manually specified standard, and the writing quality of a character can be reflected by judging whether or not the character meets the writing standard. A character whose body structure is the reference body structure is therefore a character with high writing quality, so the writing quality of the target character can be reflected by the matching degree of the body structure of the target character and the reference body structure. It should be noted that a character whose body structure is the reference body structure may also be referred to simply as a character whose body structure is accurate, and a character whose body structure is not the reference body structure may also be referred to simply as a character whose body structure is inaccurate; this will not be repeated later.
In addition, since every line is composed of points, each stroke is composed of a plurality of stroke track points, and the writing track of the target object can be acquired by dotting. Specifically, acquiring the writing track by dotting means acquiring the writing track each time an acquisition time is reached, where the interval between any acquisition time and the next acquisition time is the same. For example, suppose the acquisition interval is set to 0.02 seconds. If a student starts writing on an electronic screen at 11:20:35.00, the time at which writing starts can be taken as the first acquisition time, and the writing track generated on the electronic screen at 11:20:35.00 is acquired. Since the acquisition interval is 0.02 seconds, the second acquisition time is 11:20:35.02, and when 11:20:35.02 is reached, the writing track generated on the electronic screen at that moment is acquired. Because an acquisition time is a single point in time, the writing track acquired at each acquisition time is usually a single point of the writing track, that is, the stroke track point mentioned above. It should be noted that when an acquisition time arrives, the target object may not be producing any trace (that is, may not be writing), in which case no stroke track point can be acquired at that acquisition time. Optionally, when no stroke track point is acquired at a target number of consecutive acquisition times, acquisition of the writing track is stopped. The target number may be set manually or by the system, which is not limited here. Illustratively, the target number may be 10, 30, 100, and so on.
Meanwhile, when each stroke track point is acquired, the acquisition time or acquisition order of the stroke track point and its position information can be recorded, so that the sequence data of the stroke track points can be obtained from the acquisition time or acquisition order of each stroke track point and its position information. Specifically, the sequence data of the stroke track points may be a time sequence in which the abscissa is the acquisition time of a stroke track point and the ordinate is its position information, or a sequence in which the abscissa is the acquisition order of a stroke track point and the ordinate is its position information, which is not limited here. The acquisition order of each stroke track point can be obtained by converting its acquisition time.
In addition, a morphological feature is a feature of stroke morphology. Specifically, a morphological feature may describe every two strokes in a character in terms of relative position, spacing, relative angle, relative length, and the like. For example, in the Chinese character "二" (two), the ratio of the length of the upper horizontal stroke to that of the lower horizontal stroke should be 2:3. The form of the target character includes both its overall form and its local form. Specifically, whether a Chinese character is written to standard can be judged from its overall morphological structure (for example, whether the fourth horizontal stroke from top to bottom in the "getting" character is longer than the other horizontal strokes) and from its local morphological structure (for example, in the "getting" character, whether the first horizontal stroke from top to bottom is longer than the second, whether the third is longer than the second, and whether the third is longer than the first).
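The fixed-interval acquisition and the stop condition can be sketched as follows. This is a minimal simulation under stated assumptions: `read_pen` is a hypothetical callback standing in for the touch device (it returns an `(x, y)` position, or `None` when nothing is being written), and `sample_track`, `interval`, and `max_empty` are illustrative names for the acquisition routine, the acquisition interval, and the target number.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def sample_track(read_pen: Callable[[float], Optional[Point]],
                 start_time: float,
                 interval: float = 0.02,
                 max_empty: int = 10) -> List[Tuple[float, Point]]:
    """Poll the pen at equally spaced acquisition times.

    Acquisition stops once `max_empty` consecutive acquisition times
    yield no stroke track point. `read_pen` must eventually return
    None repeatedly, otherwise the loop does not terminate.
    """
    samples: List[Tuple[float, Point]] = []
    empty_run = 0
    tick = 0
    while empty_run < max_empty:
        # Derive each time from the tick count to avoid accumulating
        # floating-point drift.
        t = round(start_time + tick * interval, 6)
        point = read_pen(t)
        if point is None:
            empty_run += 1
        else:
            empty_run = 0
            samples.append((t, point))
        tick += 1
    return samples


# Simulated pen: writes a short horizontal line, then lifts.
def fake_pen(t: float) -> Optional[Point]:
    return (t * 100.0, 0.0) if t < 0.09 else None


track = sample_track(fake_pen, start_time=0.0, max_empty=3)
```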
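One possible in-memory representation of this sequence data is sketched below. The `TrackPoint` record and the `to_order_sequence` helper are hypothetical names; the sketch only illustrates that an order-indexed sequence can be derived from the recorded acquisition times by sorting.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrackPoint:
    t: float  # acquisition time, in seconds
    x: float  # abscissa in the preset coordinate system
    y: float  # ordinate in the preset coordinate system


def to_order_sequence(points: List[TrackPoint]) -> List[Tuple[int, float, float]]:
    """Convert time-stamped points into (acquisition order, x, y) tuples."""
    ordered = sorted(points, key=lambda p: p.t)
    return [(index, p.x, p.y) for index, p in enumerate(ordered)]


points = [TrackPoint(0.04, 3.0, 1.0),
          TrackPoint(0.00, 1.0, 1.0),
          TrackPoint(0.02, 2.0, 1.0)]
sequence = to_order_sequence(points)
```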
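As an illustration of pairwise morphological features, the sketch below computes relative length, relative angle, relative position, and spacing for every pair of strokes. The specific definitions (path length, start-to-end direction, centroid offset) and all function names are assumptions chosen for the example, not definitions taken from the text; each stroke is assumed to contain at least two distinct points.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Stroke = List[Tuple[float, float]]


def path_length(stroke: Stroke) -> float:
    """Total length of the polyline through the stroke track points."""
    return sum(math.dist(a, b) for a, b in zip(stroke, stroke[1:]))


def direction(stroke: Stroke) -> float:
    """Angle (radians) of the vector from the first point to the last."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    return math.atan2(y1 - y0, x1 - x0)


def centroid(stroke: Stroke) -> Tuple[float, float]:
    n = len(stroke)
    return (sum(x for x, _ in stroke) / n, sum(y for _, y in stroke) / n)


def pair_features(a: Stroke, b: Stroke) -> Dict[str, float]:
    ca, cb = centroid(a), centroid(b)
    delta = direction(b) - direction(a)
    return {
        "relative_length": path_length(a) / path_length(b),
        # Wrap the angle difference into [-pi, pi).
        "relative_angle": (delta + math.pi) % (2 * math.pi) - math.pi,
        "offset_x": cb[0] - ca[0],
        "offset_y": cb[1] - ca[1],
        "spacing": math.dist(ca, cb),
    }


def character_features(strokes: List[Stroke]) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Features for every pair of strokes, keyed by writing-order indices."""
    return {(i, j): pair_features(strokes[i], strokes[j])
            for i, j in combinations(range(len(strokes)), 2)}


# The character "二": a shorter upper stroke above a longer lower stroke.
er = [[(1.0, 0.0), (3.0, 0.0)], [(0.5, 2.0), (3.5, 2.0)]]
features = character_features(er)
```

For the two strokes of "二" in this example, the relative length evaluates to the 2:3 ratio mentioned above.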
Based on the above data processing scheme, the embodiment of the present application provides a data processing system. Referring to FIG. 1, the data processing system shown in FIG. 1 may include a plurality of terminal devices 101 and a plurality of servers 102, where a communication connection is established between any terminal device and any server. The terminal device 101 may include any one or more of a smart phone, a tablet, a notebook computer, a desktop computer, a smart car, and a smart wearable device. A wide variety of applications (APPs) may run on the terminal device 101, such as a multimedia playback client, a social client, a browser client, an information-stream client, an education client, and so forth. The server 102 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), big data, and artificial intelligence platforms. The terminal device 101 and the server 102 may be connected directly or indirectly through wired or wireless communication, which is not limited here.
In one embodiment, the above data processing method may be executed by the terminal device 101 alone in the data processing system shown in FIG. 1. The specific execution process is as follows: the terminal device 101 first displays a writing interface, and then acquires the writing track of the target object in response to a writing operation of the target object on the writing interface, so as to obtain the sequence data of the stroke track points of each stroke in the at least one stroke constituting the target character, and the writing order of each stroke. Thereafter, the terminal device 101 extracts the morphological features between every two strokes in the at least one stroke constituting the target character from the sequence data of the stroke track points of each stroke and the writing order of each stroke. Meanwhile, the terminal device 101 may perform character recognition processing on the target character to obtain a reference character whose body structure is the reference body structure. Finally, the terminal device 101 compares the morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character, so as to obtain the body recognition result of the target character. Optionally, the terminal device 101 may output the body recognition result of the target character, so that the target object knows how well the body structure of the target character it has written matches the reference body structure.
In another embodiment, the above data processing method may be run in a data processing system, which may include a terminal device and a server. Specifically, the data processing method may be performed by the terminal device 101 and the server 102 included in the data processing system shown in fig. 1, and the specific implementation process is: the terminal device 101 displays a writing interface first, and then acquires a writing track of the target object in response to a writing operation of the target object on the writing interface, so as to obtain sequence data of stroke track points of each stroke in at least one stroke constituting the target character, and a writing order of each stroke. The terminal device 101 then transmits the sequence data of the stroke track points of the respective strokes of the at least one stroke constituting the target character and the writing order of the respective strokes to the server 102, and the server 102 extracts morphological features between every two strokes of the at least one stroke constituting the target character by the sequence data of the stroke track points of the respective strokes and the writing order of the respective strokes. Meanwhile, the server 102 may perform character recognition processing on the target character to obtain a reference character with a body structure being a reference body structure, and compare morphological features between every two strokes of the reference character with morphological features between two strokes corresponding to the target character to obtain a body recognition result of the target character. Finally, the server 102 transmits the shape recognition result of the target character to the terminal device 101. 
Optionally, after receiving the body recognition result of the target character, the terminal device 101 may output it, so that the target object knows how well the body structure of the target character it has written matches the reference body structure.
Based on the above data processing scheme and the data processing system, the embodiment of the application provides a data processing method. Referring to fig. 2, a flow chart of a data processing method according to an embodiment of the present application is provided. The data processing method shown in fig. 2 may be performed by a terminal device. The data processing method shown in fig. 2 may include steps S201 to S204:
s201, determining a target character obtained by writing a target object, and acquiring sequence data of stroke track points of all strokes in at least one stroke forming the target character and writing sequence of all strokes.
In the embodiment of the present application, the target object may be written on a touch screen such as a touch panel or water Mo Bing, may be written in the air by means of gestures or the like, or may be written by means of a tool such as a touch pen or a touch handle, which is not limited herein. Meanwhile, in the writing process of the target object, the writing track of the target object can be obtained by dotting, the obtaining time or the obtaining sequence of each writing track point and the position information of each writing track point are recorded, and therefore the sequence data of the character track points of the target character are obtained.
The specific way to obtain the sequence data of the stroke track points of each stroke in the at least one stroke composing the target character may be: deconstructing the sequence data of the character track points of the target character to obtain the sequence data of the stroke track points of all strokes in at least one stroke forming the target character. Alternatively, deconstructing the sequence data of the character track points of the target character obtained in real time to obtain the sequence data of the stroke track points of each stroke in at least one stroke forming the target character in real time; or the sequence data of the character track points of the target character is obtained from the target database, and then deconstructed to obtain the sequence data of the stroke track points of each stroke in at least one stroke composing the target character. The target database may be a local database or a remote database of the terminal device or the server, which is not limited herein. That is, the writing track data (e.g., the sequence data of the character track points of the target character) acquired in real time may be processed to immediately obtain the sequence data of the corresponding stroke track points, or the acquired writing track data may be stored in the target database, and when the writing quality of the target character needs to be monitored, the writing track data may be acquired from the target database and processed, which is not limited herein.
Specifically, the deconstructing process breaks down the sequence data of the character track points of one character into the sequence data of the stroke track points of the individual strokes constituting the character. In an actual writing process, each stroke has a start point and an end point, where the start point is the first stroke track point of the stroke and the end point is the last stroke track point of the stroke. In general, the time interval between the end point of any stroke and the start point of the next stroke is relatively large. Therefore, the specific way of deconstructing the sequence data of the character track points of the target character to obtain the sequence data of the stroke track points of each stroke in at least one stroke constituting the target character may be: determining a target track point in the sequence data of the character track points, where the time interval between the target track point and its next track point is greater than the time interval between the target track point and its previous track point, and is also greater than the time interval between that next track point and the track point following it; and then cutting the sequence data of the character track points of the target character at the target track points to obtain the sequence data of the stroke track points of each stroke in at least one stroke constituting the target character.
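The time-interval criterion above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `(x, y, t)` point layout and the function name `split_strokes_by_time` are assumptions made for the example. It applies the criterion literally, so on real data with sampling jitter an additional minimum-gap threshold (also an assumption) would likely be needed.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t): position plus acquisition time (assumed layout)

def split_strokes_by_time(points: List[Point]) -> List[List[Point]]:
    """Cut character track points into strokes at local maxima of the time gap."""
    # gaps[i] is the time interval between point i and point i + 1.
    gaps = [points[i + 1][2] - points[i][2] for i in range(len(points) - 1)]
    strokes, start = [], 0
    for i in range(1, len(gaps) - 1):
        # Point i is a target track point when the gap to its next point
        # exceeds both the gap before it and the gap after it.
        if gaps[i] > gaps[i - 1] and gaps[i] > gaps[i + 1]:
            strokes.append(points[start:i + 1])
            start = i + 1
    if points[start:]:
        strokes.append(points[start:])
    return strokes

# Two strokes: four points sampled every 10 ms, a long pause, then three more points.
pts = [(0, 0, 0), (1, 0, 10), (2, 0, 20), (3, 0, 30),
       (0, 5, 200), (1, 5, 210), (2, 5, 220)]
strokes = split_strokes_by_time(pts)
```

The same loop works for the distance-based variant if `gaps` is computed from the positions of adjacent points instead of their acquisition times.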
Optionally, in the actual writing process, the distance between the end point of any stroke and the start point of the next stroke is also greater than the distance between adjacent stroke track points belonging to the same stroke. Therefore, the specific way of deconstructing the sequence data of the character track points of the target character to obtain the sequence data of the stroke track points of each stroke in at least one stroke constituting the target character may also be: determining a distance track point in the sequence data of the character track points, where a distance track point is a track point whose distance to its next track point is greater than its distance to its previous track point and greater than the distance between that next track point and the track point following it; and then cutting the sequence data of the character track points of the target character at the distance track points to obtain the sequence data of the stroke track points of each stroke in at least one stroke constituting the target character. It should be noted that the distance between two track points may be calculated from the position information of the two track points in the sequence data of the track points.
In one possible implementation, since people usually write a whole piece of text at a time, the writing track point data actually obtained often covers a plurality of characters. In this case, the specific way to obtain the sequence data of the stroke track points of each stroke in the at least one stroke constituting the target character may be: acquiring sequence data of the text track points of a target text written by the target object, where the target text includes a plurality of target characters; then performing data slicing processing on the sequence data of the text track points to obtain the sequence data of the character track points of each target character in the text written by the target object; and deconstructing the sequence data of the character track points of each target character to obtain the sequence data of the stroke track points of each stroke in at least one stroke constituting each target character.
Specifically, the specific way of performing the data slicing processing on the sequence data of the text track points may be: determining a target text written by a target object, wherein the target text comprises a plurality of target characters; performing character segmentation processing on the target text to obtain each target character; and finally, determining the sequence data of the character track points matched with each target character in the sequence data of the text track points.
Specifically, the character segmentation process recognizes each character in a text composed of a plurality of characters, and segments each character from other characters in the text. The specific manner of the character segmentation process may be one or more of a plurality of character segmentation algorithms such as an average segmentation algorithm, a color filling segmentation algorithm (CFS), a dripping segmentation algorithm, and the like, which is not limited herein. Alternatively, the specific manner of determining the sequence data of the character track points matched with each target character in the sequence data of the text track points may be: determining a first acquisition time or a first acquisition sequence of a first character track point of each target character and a second acquisition time or a second acquisition sequence of a last character track point of each target character, and dividing sequence data of the text track points based on the first acquisition time or the first acquisition sequence of each target character and the second acquisition time or the second acquisition sequence of each target character to obtain sequence data of the character track points of each target character.
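As an illustration of the last step, the division of the text track points by the first and last acquisition time of each target character can be sketched as follows. The `(x, y, t)` point layout and the name `slice_by_time_spans` are assumptions for the example; how the time spans are obtained from the character segmentation result is outside this sketch.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t), assumed layout

def slice_by_time_spans(text_points: List[Point],
                        spans: List[Tuple[float, float]]) -> List[List[Point]]:
    """Return one list of character track points per (first_time, last_time) span."""
    return [[p for p in text_points if first <= p[2] <= last]
            for first, last in spans]

text_points = [(0, 0, 0), (1, 0, 10), (9, 0, 300), (10, 0, 310), (11, 0, 320)]
chars = slice_by_time_spans(text_points, [(0, 10), (300, 320)])
```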
S202, performing character recognition processing on the target character based on the sequence data of the stroke track points of each stroke to obtain a reference character.
In the embodiment of the present application, the character recognition processing may specifically be performed as follows: first, direction feature extraction processing is performed on the sequence data of the stroke track points of each stroke to obtain direction track features of the target character in a plurality of preset directions; then, the plurality of direction track features are spliced to obtain a target track feature; and finally, a character whose track feature matches the target track feature is searched for in a preset database to obtain the reference character. The preset database may be a local database or a remote database of the terminal device or the server, which is not limited herein.
In addition, a plurality of template characters with the body structures conforming to the writing standard and track characteristics of the template characters can be stored in the preset database. Specifically, the sequence data of the stroke track points of each stroke in at least one stroke forming each template character can be subjected to direction feature extraction processing to obtain template direction track features of each template character in a plurality of preset directions; then, splicing the track features of the template directions to obtain track features of each template character; and finally, storing each template character and the obtained track characteristics into a preset database correspondingly.
Alternatively, when no character with the track characteristic matching the target track characteristic is found in the preset database, a character recognition result for indicating that the target character is wrongly written can be generated; then outputting a character recognition result to prompt the target object to write again; finally, the character obtained by the target object being rewritten is taken as the target character, and steps S201 to S204 are performed.
In addition, the direction ranges of the plurality of preset directions together cover all directions over 360 degrees. Specifically, the number of preset directions may be four, eight, or another number, which is not limited herein. For example, the direction feature extraction processing may be performed by a four-direction feature extraction algorithm, an eight-direction feature extraction algorithm, or the like, which is not limited herein.
Then, the specific way of performing direction feature extraction processing on the sequence data of the stroke track points of each stroke to obtain the direction track features of the target character in a plurality of preset directions may be: determining a direction vector of each stroke track point in each stroke based on the position information of each stroke track point in the sequence data of the stroke track points of each stroke, where the direction vector of any stroke track point is obtained from the position information of that stroke track point, the position information of its previous stroke track point and the position information of its next stroke track point; and then, based on the position information and the direction vector of each stroke track point, mapping each stroke track point onto the feature plane of the preset direction that matches the direction indicated by its direction vector, so as to obtain the direction track feature of each preset direction. A preset direction matches the direction indicated by the direction vector of a stroke track point when that direction falls within the direction range of the preset direction.
For example, if the direction indicated by the direction vector of the stroke track point A is 35 degrees and the direction range of the preset direction B is 0 to 45 degrees, the direction indicated by the direction vector of the stroke track point A falls within the direction range of the preset direction B, and it can therefore be determined that this direction matches the preset direction B.
In a specific implementation, please refer to fig. 3, which is a schematic diagram of the directions used in eight-direction feature extraction. The number of preset directions in eight-direction feature extraction is 8, and the eight preset directions are the directions D1 to D8 shown in fig. 3. It should be noted that the direction in which D1 points may be set to 0 degrees and the direction in which D5 points may be set to 180 degrees. The direction range of D1 is then -22.5 to 22.5 degrees; D2 points in a direction of 45 degrees, with a range of 22.5 to 67.5 degrees; D3 points in a direction of 90 degrees, with a range of 67.5 to 112.5 degrees; D4 points in a direction of 135 degrees, with a range of 112.5 to 157.5 degrees; D5 has a direction range of 157.5 to 202.5 degrees; D6 points in a direction of 225 degrees, with a range of 202.5 to 247.5 degrees; D7 points in a direction of 270 degrees, with a range of 247.5 to 292.5 degrees; and D8 points in a direction of 315 degrees, with a range of 292.5 to 337.5 degrees.
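The mapping from a direction vector to one of the eight preset directions can be sketched as follows. The function name `preset_direction` is hypothetical, and two conventions are assumptions of the example rather than statements of the embodiment: angles are measured counter-clockwise from the positive x axis, and each range is treated as half-open (lower bound included, upper bound excluded).

```python
import math

def preset_direction(dx: float, dy: float) -> int:
    """Return k in 1..8 such that the vector (dx, dy) falls in the range of Dk."""
    # Angle in [0, 360); a zero vector maps to 0 degrees, i.e. D1.
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # Shift by half a sector so that D1 covers -22.5 to 22.5 degrees.
    return int(((angle + 22.5) % 360.0) // 45.0) + 1

# A vector pointing at 35 degrees falls in the 22.5 to 67.5 degree range of D2.
k = preset_direction(math.cos(math.radians(35)), math.sin(math.radians(35)))
```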
After the preset directions are determined, for any stroke track point P_m in the sequence data of the stroke track points of each stroke, the position information (x_m, y_m) of P_m, the position information (x_m-1, y_m-1) of its previous stroke track point P_m-1, and the position information (x_m+1, y_m+1) of its next stroke track point P_m+1 can be obtained. Then, based on the position information (x_m, y_m) of P_m, the position information (x_m-1, y_m-1) of P_m-1 and the position information (x_m+1, y_m+1) of P_m+1, the direction vector V_m of P_m can be calculated.
The calculation of V_m distinguishes three cases: P_m is a starting stroke track point, P_m is a non-terminating stroke track point, and P_m is a terminating stroke track point.
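The three cases for the direction vector can be sketched as follows. The formula itself is not reproduced in the text above, so the sketch assumes a convention that is common for direction features: a forward difference at the starting point, a difference between the next and previous points at intermediate points, and a backward difference at the terminating point. The function name `direction_vectors` is hypothetical.

```python
from typing import List, Tuple

XY = Tuple[float, float]

def direction_vectors(stroke: List[XY]) -> List[XY]:
    """Direction vector V_m for every track point P_m of one stroke (assumed formula)."""
    n = len(stroke)
    vectors = []
    for m in range(n):
        # At the starting point there is no previous point, and at the
        # terminating point there is no next point, so P_m itself is used.
        prev_pt = stroke[m - 1] if m > 0 else stroke[m]
        next_pt = stroke[m + 1] if m < n - 1 else stroke[m]
        vectors.append((next_pt[0] - prev_pt[0], next_pt[1] - prev_pt[1]))
    return vectors

vecs = direction_vectors([(0, 0), (1, 0), (2, 1)])
```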
After the direction vector of each stroke track point is calculated, each stroke track point is mapped, based on its position information and its direction vector, onto the feature plane of the preset direction that matches the direction indicated by its direction vector. Each feature plane is divided into 8 x 8 grids, each of size 8 x 8, so that the feature plane corresponding to each preset direction has a size of 64 x 64, and the feature intensity of the stroke track points on the feature plane can be quantified by convolution with a Gaussian filter.
In this convolution, F_d refers to the direction track feature of the preset direction corresponding to the feature plane; (x_i, y_i) refers to the position information of a stroke track point on the feature plane, where x_i is the abscissa and y_i is the ordinate; and G(x) is the Gaussian filter.
In the Gaussian filter, λ refers to the sampling wavelength, whose value may be 8, and n refers to a control parameter, typically twice λ.
After the convolution operation on the feature planes of the eight preset directions is completed, 8 feature vectors (i.e., direction track features) of 8 x 8 dimensions are obtained, one for each preset direction. Stitching the feature vectors of the eight preset directions then yields a feature of 8 x 8 x 8 dimensions, and finally a vectorization operation (such as a vector product) is performed on this feature to obtain a 512-dimensional feature vector (i.e., the target track feature).
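The overall shape of this computation, eight direction planes reduced to 8 x 8 values each and concatenated into 512 dimensions, can be sketched as follows. This is a deliberately simplified illustration: it counts track points per grid cell instead of applying the Gaussian filter, it assumes the direction-vector convention of a next-minus-previous difference, and it assumes coordinates already normalized to the range 0 to 64. The name `trajectory_feature` is hypothetical.

```python
import math
from typing import List, Tuple

def trajectory_feature(strokes: List[List[Tuple[float, float]]],
                       plane_size: int = 64, grid: int = 8) -> List[float]:
    """Concatenate eight 8 x 8 direction planes into one 512-dimensional vector."""
    cell = plane_size // grid
    planes = [[[0.0] * grid for _ in range(grid)] for _ in range(8)]
    for stroke in strokes:
        n = len(stroke)
        for m, (x, y) in enumerate(stroke):
            px, py = stroke[max(m - 1, 0)]
            nx, ny = stroke[min(m + 1, n - 1)]
            dx, dy = nx - px, ny - py
            if dx == 0 and dy == 0:
                continue  # no direction can be assigned to an isolated point
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            d = int(((angle + 22.5) % 360.0) // 45.0)  # 0..7 for D1..D8
            row = min(int(y) // cell, grid - 1)
            col = min(int(x) // cell, grid - 1)
            planes[d][row][col] += 1.0  # simplification: count, not Gaussian response
    return [v for plane in planes for row in plane for v in row]

feature = trajectory_feature([[(0, 0), (10, 0), (20, 0)]])
```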
S203, performing feature extraction processing on the target character according to the sequence data of the stroke track points of each stroke and the writing sequence of each stroke to obtain morphological features between every two strokes in at least one stroke forming the target character.
In the embodiment of the present application, the specific manner of the feature extraction process may be: obtaining stroke description information based on the position information of each stroke track point in the sequence data of the stroke track points of each stroke; wherein the stroke description information includes, but is not limited to, a position of each stroke, a stroke length of each stroke, a stroke direction of each stroke, and the like for describing the information of the stroke; then, analyzing and processing the stroke description information of every two strokes to determine the stroke relation information between every two strokes; wherein the stroke relation information includes information indicating an azimuth relation between two strokes such as a relative stroke position, a stroke interval, and a stroke angle, which is not limited herein; and finally, carrying out feature coding processing on the stroke relation information between every two strokes to obtain morphological features between every two strokes of the target character.
Wherein, the stroke description information of every two strokes is analyzed and processed, and the specific mode for determining the stroke relation information between every two strokes can comprise one or more of the following: based on the positions of the strokes in each two strokes, calculating the relative positions of each two strokes; based on the stroke length of each stroke in every two strokes, calculating the relative length of every two strokes; based on the stroke direction of each stroke, the intersection position, the intersection angle and the like of every two strokes are calculated.
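A minimal sketch of stroke description information and of the relation between two strokes is given below. The concrete choices are assumptions for illustration only: the position of a stroke is taken as the mean of its track points, the stroke direction as the angle from its first to its last track point, and the relation is "coded" simply as a list of numbers. The names `describe_stroke` and `stroke_relation` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

XY = Tuple[float, float]

def describe_stroke(points: List[XY]) -> Dict[str, object]:
    """Position (mean point), length (polyline length) and direction (degrees) of a stroke."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    length = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    dx, dy = points[-1][0] - points[0][0], points[-1][1] - points[0][1]
    return {"center": (cx, cy), "length": length,
            "direction": math.degrees(math.atan2(dy, dx)) % 360.0}

def stroke_relation(a: List[XY], b: List[XY]) -> List[float]:
    """Relative position, relative length and angle between two strokes."""
    da, db = describe_stroke(a), describe_stroke(b)
    rel_x = db["center"][0] - da["center"][0]
    rel_y = db["center"][1] - da["center"][1]
    rel_len = db["length"] / da["length"] if da["length"] else 0.0
    angle = abs(da["direction"] - db["direction"]) % 360.0
    return [rel_x, rel_y, rel_len, min(angle, 360.0 - angle)]

# A horizontal stroke crossed at its midpoint by a vertical stroke of equal length.
rel = stroke_relation([(0, 0), (4, 0)], [(2, -2), (2, 2)])
```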
Therefore, the morphological features between every two strokes of the target character are obtained by feature-coding stroke relation information, such as relative stroke positions, stroke distances and stroke angles, that indicates the morphological relation between the two strokes, so the morphological features can represent stroke morphology. Because the body structure of a character is made up of the morphology of its individual strokes, the stroke morphology can represent the details of the character's body structure, which in turn helps to diagnose the degree to which the body structure of the character matches the reference body structure (that is, whether the written character is standard).
S204, acquiring the morphological features between every two strokes in at least one stroke constituting the reference character, whose body structure is the reference body structure, and comparing the acquired morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character to obtain the body recognition result of the target character.
In the embodiment of the application, the body recognition result of the target character is used for indicating the matching degree between the body structure of the target character and the reference body structure. In one example, the matching degree is the degree of similarity between the body structure of the target character and the reference body structure. For example, if the matching degree is 80%, the difference between the body structure of the target character and the reference body structure is small, and if the matching degree is 30%, the difference is large. In another example, the matching degree may indicate whether the body structure of the target character is identical to the reference body structure. Specifically, when the matching degree is greater than or equal to a preset matching degree, the body structure of the target character is considered identical to the reference body structure; when the matching degree is smaller than the preset matching degree, the body structure of the target character is considered different from the reference body structure. The preset matching degree may be set manually or by the system, which is not limited herein. For example, the preset matching degree may be a value such as 100% or 95%.
Because the morphological features can represent morphological information such as the relative position, angle and length of strokes, the body structure of the target character can be represented by the morphological features between all pairs of strokes of the target character. By comparing the morphological features of the reference character, whose body structure is accurate, with the morphological features of the target character, the difference between the body structure of the target character and that of the reference character can be determined. The size of this difference reflects how accurate the body structure of the target character is, which achieves the aim of judging the matching degree between the body structure of the target character and the reference body structure.
Optionally, the specific manner of comparing the morphological feature between every two strokes of the acquired reference character with the morphological feature between the corresponding two strokes of the target character may be: and calculating the similarity of the morphological characteristics between every two strokes of the acquired reference character and the morphological characteristics between the corresponding two strokes of the target character.
Similarity measures how alike two things are and is usually obtained by calculating the distance between their features: if the distance is small, the similarity is large; if the distance is large, the similarity is small. Therefore, the method for calculating the similarity between two morphological features may be, but is not limited to, calculating the Euclidean distance, the Manhattan distance, the Pearson correlation coefficient, or the cosine distance of the two morphological features.
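Two of the measures named above can be sketched as follows. The mapping from Euclidean distance to a similarity in the range 0 to 1, namely 1 / (1 + d), is an assumption chosen for the example; the text does not prescribe a particular mapping. The function names are hypothetical.

```python
import math
from typing import Sequence

def euclidean_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity that shrinks as the Euclidean distance grows (assumed mapping)."""
    return 1.0 / (1.0 + math.dist(a, b))

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

same = euclidean_similarity([1.0, 2.0], [1.0, 2.0])
far = euclidean_similarity([0.0, 0.0], [3.0, 4.0])
```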
Then, since a character has at least one stroke, the target character has one or more morphological features, and thus each morphological feature is compared with the corresponding morphological feature of the reference character, thereby obtaining one or more similarities. Then, the specific way to compare the morphological feature between every two strokes of the obtained reference character with the morphological feature between the corresponding two strokes of the target character to obtain the body recognition result of the target character may be: calculating the similarity of morphological characteristics between every two strokes of the acquired reference character and morphological characteristics between two strokes corresponding to the target character to obtain one or more similarities; determining the number of the similarities which are larger than a preset similarity in the one or more similarities; if the number is greater than or equal to the preset number, generating a body recognition result for indicating that the body structure of the target character is accurate, and if the number is less than the preset number, generating a body recognition result for indicating that the body structure of the target character is inaccurate.
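The count-based decision described above can be sketched as follows; the function name `body_structure_is_accurate` and the sample values are hypothetical.

```python
from typing import Sequence

def body_structure_is_accurate(similarities: Sequence[float],
                               preset_similarity: float,
                               preset_number: int) -> bool:
    """True when enough similarities exceed the preset similarity."""
    count = sum(1 for s in similarities if s > preset_similarity)
    return count >= preset_number

# Two of the three similarities exceed 0.7.
ok = body_structure_is_accurate([0.9, 0.8, 0.5], preset_similarity=0.7, preset_number=2)
```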
The preset similarity may be set manually or by the system, which is not limited herein. Illustratively, the preset similarity may be 70%, 0.89, 0.952, 100%, etc. The preset number may likewise be set manually or by the system, which is not limited herein. Illustratively, the preset number may be 3, 6, 2, etc. Optionally, since different characters have different numbers of strokes, and therefore different numbers of morphological features, a different preset number may be set for different characters, or for characters with different numbers of strokes. For example, the preset number for the character "三" (three), which has 3 strokes, may be set to 2, and the preset number for a character with 7 strokes may be set to 5.
It should be noted that, after obtaining the one or more similarities, the average similarity of the one or more similarities may be calculated to obtain a similarity between the feature structure of the target character and the reference feature structure, so as to generate a feature recognition result of the target character.
Optionally, weights corresponding to different strokes can be set for different strokes by people or a system, and then the similarity corresponding to each stroke is weighted and added based on the weights corresponding to each stroke to obtain the matching degree of the body structure of the target character, so that the body recognition result of the target character is generated. Alternatively, the matching degree of the body structure of the target character may be obtained in other manners, which is not limited herein.
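The average and the weighted combination of similarities can be sketched as follows. Dividing the weighted sum by the total weight, so that the result stays on the same scale as the individual similarities, is an assumption of the example; the function names are hypothetical.

```python
from typing import Sequence

def average_matching_degree(similarities: Sequence[float]) -> float:
    """Plain mean of the similarities."""
    return sum(similarities) / len(similarities)

def weighted_matching_degree(similarities: Sequence[float],
                             weights: Sequence[float]) -> float:
    """Weighted sum of the similarities, normalized by the total weight (assumed)."""
    total = sum(weights)
    return sum(s * w for s, w in zip(similarities, weights)) / total

avg = average_matching_degree([0.8, 0.6])
weighted = weighted_matching_degree([0.8, 0.6], [3.0, 1.0])
```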
Optionally, the ratio of the similarity greater than the preset similarity to the one or more similarities may be further determined, if the ratio is greater than or equal to the preset ratio, a feature recognition result for indicating that the feature structure of the target character is accurate may be generated, and if the ratio is less than the preset ratio, a feature recognition result for indicating that the feature structure of the target character is inaccurate may be generated. That is, the body structure of the target character can be determined to be accurate in the case that the similarity between the morphological feature of the majority of the target character and the corresponding morphological feature of the reference character is high.
Optionally, after the body recognition result of the target character is obtained, it may be output so that the target object can check its writing quality against the result. Optionally, when the body recognition result indicates that the body structure of the target character is inaccurate, identification information may be output, where the identification information includes body prompt information for prompting that the body structure of the target character is inaccurate. Optionally, the identification information may further include the reference character; by outputting the reference character, whose body structure is accurate, the target object can conveniently compare against it and practice writing the target character, thereby improving writing quality. Optionally, the identification information may further include writing standard information of the reference character, where the writing standard information includes the writing specifications and writing requirements of the character, for example that the character "making" be written with a compact upper portion and a loose lower portion.
Further, since the comparison described above calculates the similarity between the morphological feature of every two strokes of the reference character and the morphological feature of the corresponding two strokes of the target character, it is also possible to know precisely which strokes of the target character are written in a nonstandard way.
Then, after the similarity between the morphological feature of every two strokes of the acquired reference character and the morphological feature of the corresponding two strokes of the target character is calculated to obtain a plurality of similarities, the two strokes corresponding to a similarity greater than or equal to the preset similarity may be determined as accurate strokes, and the two strokes corresponding to a similarity smaller than the preset similarity may be determined as inaccurate strokes. Therefore, the identification information may also include the inaccurate strokes, so that the target object knows which strokes were written inaccurately and can practice in a targeted manner, thereby improving writing quality. Optionally, the identification information may further include the accurate strokes of the reference character that correspond to the inaccurate strokes; by outputting these accurate strokes, the target object can conveniently compare against the reference character and practice writing the target character, thereby improving writing quality.
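The division into accurate and inaccurate strokes can be sketched as follows. Representing the similarities as a dictionary keyed by a pair of stroke indices is an assumption of the example, and the function name `classify_stroke_pairs` is hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # indices of the two strokes in writing order (assumed representation)

def classify_stroke_pairs(pair_similarities: Dict[Pair, float],
                          preset_similarity: float) -> Tuple[List[Pair], List[Pair]]:
    """Split stroke pairs into (accurate, inaccurate) by the preset similarity."""
    accurate = sorted(p for p, s in pair_similarities.items() if s >= preset_similarity)
    inaccurate = sorted(p for p, s in pair_similarities.items() if s < preset_similarity)
    return accurate, inaccurate

accurate, inaccurate = classify_stroke_pairs(
    {(0, 1): 0.92, (0, 2): 0.40, (1, 2): 0.70}, preset_similarity=0.70)
```

The inaccurate pairs are the ones that would be reported in the identification information, for example by highlighting the corresponding strokes.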
In practical application, referring to fig. 4, an interface schematic diagram of a writing interface is shown. A writing quality monitoring program is built into the terminal device; the program can display a writing interface on the display screen of the terminal device and, in response to a student's writing operation on the writing interface, acquire the student's writing track point data. The pinyin of a plurality of words can be displayed in the writing interface, and the student is then required to write the correct Chinese characters according to the pinyin.
As shown in the writing interface 401, the student Xiao Ming writes words such as 'primes', 'foundation', 'selection', 'novel', 'recite' and 'day and night' according to the pinyin. While Xiao Ming is writing, the writing quality monitoring program can acquire his writing track point data and store it in the local database. Ten minutes later, Xiao Ming finishes writing and clicks the submit control in the writing interface 401. After submission, the writing quality monitoring program can acquire the writing track point data from the local database, and then obtain the sequence data of the stroke track points of each stroke in at least one stroke constituting each character in the writing interface 401 by performing operations such as data slicing processing and deconstructing processing on the writing track point data. The writing quality monitoring program can then perform character recognition processing on each character to obtain the reference character of each character, and perform feature extraction processing on each character to obtain the morphological features between every two strokes in at least one stroke constituting each character. Finally, the program acquires the morphological features between every two strokes in at least one stroke constituting the reference character of each character, whose body structure is accurate, and compares them with the morphological features between the corresponding two strokes of each character, so as to obtain the body recognition result of each character.
After the body recognition result of each character in the writing interface 401 is obtained, the body recognition result of each character may be output. For example, in the writing interface 402, since the body recognition results of the character "medal" and the character "language" indicate that their body structures are inaccurate, the character "medal" and the character "language" are circled so that Xiao Ming can practice writing these two characters. Optionally, a character "medal" with an accurate body structure and a character "language" with an accurate body structure can be output in the writing interface 402, so that Xiao Ming can compare against them when practicing.
In the embodiment of the application, because the sequence data of the stroke track points of any stroke includes the position information of each stroke track point in that stroke, morphological features that represent relations such as position, length and direction between two strokes can be extracted from the acquired sequence data of the stroke track points of each stroke in at least one stroke constituting the target character and the writing order of each stroke. Because the morphological features represent these relations between two strokes, and a character is composed of a plurality of strokes, the body structure of the character is accurate when the morphology between every two strokes composing the character is accurate. Therefore, the reference character, which is the same character as the target character and has an accurate body structure, can be identified by performing character recognition processing on the target character; then, by comparing the morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character, a body recognition result indicating the matching degree between the body structure of the target character and the reference body structure can be obtained. Since writing quality can be judged from the accuracy of the body structure of the written character, the writing quality of the target character can be monitored through the obtained body recognition result.
In addition, according to the embodiment of the application, character recognition processing is performed on the target character using the sequence data of the stroke track points of each stroke. Compared with technologies such as optical character recognition, this approach pays more attention to the position and direction of each track point, so the recognition accuracy is higher. In addition, in the embodiment of the application, after the body recognition result of the target character is obtained, the target object is prompted to check the written target character by outputting the body recognition result, thereby improving writing quality.
Based on the above data processing scheme and the data processing system, another data processing method is provided in the embodiments of the present application. Referring to fig. 5, a flow chart of another data processing method according to an embodiment of the present application is shown. The data processing method shown in fig. 5 may be performed by the terminal device shown in fig. 1. The data processing method shown in fig. 5 may include the steps of:
S501, outputting prompt information.
In this embodiment of the present application, the prompt information is used to prompt the target object to write the preset character on the writing interface. In writing scenes such as transcription and dictation, a target object needs to write a specified character (i.e., a preset character). The preset character may be set manually or may be set systematically, which is not limited herein. For example, a teacher may set a sentence that needs to be transcribed by the student at the management end, and then the characters in the sentence set by the teacher are preset characters.
In addition, the specific way of outputting the prompt information may be: displaying the preset characters on the writing interface, and playing the preset characters by voice. Alternatively, the prompt information may be output in other manners, which is not limited herein. Illustratively, as shown in fig. 4, the prompt information is output by displaying the pinyin of each word in the writing interface 401, so as to prompt the target object to write the corresponding word on the writing interface.
S502, responding to writing operation on a writing interface, and acquiring target characters obtained by writing of a target object.
In the embodiment of the application, the writing operation is initiated by the target object; responding to the writing operation on the writing interface therefore means responding to the writing operation performed by the target object on the writing interface. In addition, the specific way of obtaining the target character written by the target object may be: in response to a writing start operation on the writing interface and a writing end operation on the writing interface, acquiring the target character written by the target object.
Specifically, the writing start operation refers to the writing operation initiated when the target object starts writing, and the writing end operation refers to the writing operation initiated when the target object finishes writing. During writing there is a pause between successive strokes, but the stroke pause time between strokes is shorter than the character pause time between characters. Therefore, the writing operation at the start of writing can be determined as the writing start operation, and, among the writing operations in a preset time period, the writing operation whose pause time is longer than a preset pause time can be determined as the writing end operation. The preset time period may be set manually or by the system, for example 1 minute or 30 seconds, which is not limited herein. The preset pause time may likewise be set manually or by the system, for example 5 seconds or half a minute, which is not limited herein.
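As a minimal sketch of the pause-based rule described above, the following Python example groups time-ordered strokes into characters. The `Stroke` type, its `start` and `end` timestamp fields, and the function name are assumptions introduced for illustration only and are not defined in the application; the preset time period is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stroke:
    start: float  # pen-down time in seconds (hypothetical field)
    end: float    # pen-up time in seconds (hypothetical field)


def split_into_characters(strokes: List[Stroke],
                          preset_pause: float = 5.0) -> List[List[Stroke]]:
    """Group time-ordered strokes into characters.

    A pause longer than `preset_pause` between two consecutive strokes is
    treated as the writing end operation of the current character.
    """
    characters: List[List[Stroke]] = []
    current: List[Stroke] = []
    for stroke in strokes:
        # Pause between the previous pen-up and this pen-down.
        if current and stroke.start - current[-1].end > preset_pause:
            characters.append(current)
            current = []
        current.append(stroke)
    if current:
        characters.append(current)
    return characters
```

With a preset pause time of 5 seconds, two strokes separated by 0.5 seconds stay in the same character, while a stroke that begins 7 seconds after the previous one starts a new character.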
Optionally, in a writing scene of known preset characters such as transcription, dictation, and the like, the target character may also be an unwritten character. For example, the preset character is "crazy", but the target object only writes the radical "" of "crazy", then the target character is "" at this time.
S503, acquiring sequence data of stroke track points of all strokes in at least one stroke forming the target character and writing sequence of all strokes.
In the embodiment of the present application, the specific implementation of step S503 may refer to the specific implementation of step S201, which is not described herein.
Optionally, when the target character is an unwritten character, the obtained sequence data of the stroke track points of each of the at least one stroke constituting the target character refers to the sequence data of the stroke track points of each of the at least one stroke constituting the unwritten character; similarly, the writing order of the individual strokes, i.e., the writing order of the individual strokes in at least one of the strokes comprising the unwritten character.
S504, performing character recognition processing on the target character based on the sequence data of the stroke track points of each stroke to obtain a reference character.
The specific embodiment of step S504 can be referred to the specific embodiment of step S202, which is not described herein.
S505, determining whether the similarity between the preset character and the reference character is greater than a preset threshold.
In the embodiment of the present application, if the similarity between the preset character and the reference character is greater than the preset threshold, the execution of steps S506 to S507 is triggered. And if the similarity between the preset character and the reference character is smaller than or equal to a preset threshold value, generating a character recognition result for indicating that the target character is wrongly written. The preset threshold may be set manually or may be set by a system, which is not limited herein. For example, the preset threshold may be set to 60%, 0.96%, 83%, or the like.
Optionally, when the similarity between the preset character and the reference character is greater than a preset threshold, a character recognition result for indicating that the writing of the target character is correct may be generated; further, when the similarity between the preset character and the reference character is smaller than or equal to a preset threshold value, a character recognition result for indicating that the target character is wrongly written can be generated, and the character recognition result is output so as to prompt the target object to write again; finally, the character obtained by re-writing the target object is taken as the target character, and steps S501 to S507 are executed.
The reference character is the recognized character that should have the same meaning and the same body structure as the target character, and the preset character is the character that the target object is expected to write. Therefore, if the similarity between the preset character and the reference character is high, the character written by the target object is correct. In addition, monitoring of writing quality should generally be based on writing correctness, that is, correctness is guaranteed first and quality second. When the similarity between the preset character and the reference character is greater than the preset threshold, the target character is written correctly, and the writing quality of the target character can be further monitored, that is, steps S506 to S507 are executed. When the similarity between the preset character and the reference character is smaller than or equal to the preset threshold, the target character is written incorrectly, and the target object can be required to write it again.
Specifically, the obtaining manner of the similarity between the preset character and the reference character may be: performing character feature extraction processing on the preset characters to obtain character features of the preset characters; and carrying out character feature extraction processing on the reference character to obtain character features of the reference character; and finally, determining the similarity between the preset character and the reference character according to the character characteristics of the preset character and the character characteristics of the reference character. The specific manner of determining the similarity between the preset character and the reference character according to the character features of the preset character and the character features of the reference character may be, but is not limited to, calculating the euclidean distance, the manhattan distance (Manhattan Distance), the pearson correlation coefficient (Pearson Correlation Coefficient), or the cosine distance between the character features of the preset character and the character features of the reference character.
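The measures listed above can be sketched in Python as follows. This is an illustration under stated assumptions: the character features are assumed to be plain numeric vectors of equal length, the function names are hypothetical, cosine similarity is used in place of cosine distance, and the threshold value 0.96 is only an example.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    # Pearson correlation equals the cosine similarity of the centered vectors.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def is_written_correctly(preset_features: Sequence[float],
                         reference_features: Sequence[float],
                         threshold: float = 0.96) -> bool:
    """Step S505 as a sketch: similarity greater than the preset threshold."""
    return cosine_similarity(preset_features, reference_features) > threshold
```

For the two distance measures a smaller value means more similar characters, so a practical implementation would need to convert the distance into a similarity before comparing it with the preset threshold.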
S506, performing feature extraction processing on the target character according to the sequence data of the stroke track points of each stroke and the writing sequence of each stroke to obtain morphological features between every two strokes in at least one stroke forming the target character.
The specific embodiment of step S506 can be referred to the specific embodiment of step S203, which is not described herein.
S507, obtaining morphological features between every two strokes in at least one stroke forming the reference character whose body structure is the reference body structure, and comparing the obtained morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character to obtain a body recognition result of the target character.
In the embodiment of the application, the reference character or the character identifier of the reference character, together with the morphological features between every two strokes in at least one stroke forming the reference character, can be stored in a preset database in advance. The specific way of obtaining the morphological features between every two strokes in the at least one stroke forming the reference character with an accurate body structure may then be: searching the preset database for the morphological features, between every two strokes in at least one stroke forming the reference character, that correspond to the reference character or to the character identifier of the reference character.
Alternatively, the sequence data of the stroke track points of each stroke in at least one stroke forming the reference character with an accurate body structure, and the writing order of each stroke of the reference character, can be stored in the preset database. The specific way of obtaining the morphological features between every two strokes in the at least one stroke forming the reference character with an accurate body structure may then be:
1) Acquiring the sequence data of the stroke track points of each stroke in at least one stroke forming the reference character whose body structure is the reference body structure, and the writing order of each stroke of the reference character;
The reference character is the template character that, in step S202, is matched with the target character among the template characters with accurate body structures stored in the preset database.
2) Carrying out change processing on the reference character to obtain a deformed character, wherein the deformed character is different from the reference character in size or shape;
the specific way of changing the reference character to obtain the deformed character may be: adjusting the spatial characteristics of the reference character to obtain an adjusted reference character, wherein the spatial characteristics of the adjusted reference character are preset spatial characteristics; and adjusting the size or the shape of the adjusted reference character to obtain the deformed character. The preset spatial features refer to one or more of the features of the reference character, such as position features, stroke structure features, direction features and the like. The preset spatial feature may be set manually or may be set by a system, and is not limited herein.
In a specific implementation, since the reference characters are stored in a preset database in advance, and the template characters stored in advance are often stored in the form of images, the spatial features of the reference characters are adjusted, that is, the reference images containing the reference characters are normalized, wherein the normalization refers to correcting and comparing the spatial structures of the reference characters included in the input reference images according to the character proportion structure of the writing standard, so as to better extract feature details in the characters. Specifically, in the normalization process, the width of the character strokes, the size of the characters and the spatial features of the images containing the characters are converted into spatial features (i.e., preset spatial features) with unified dimensions and unified standards, which are convenient to identify and measure for comparison.
For example, because the sizes of different acquired Chinese characters differ greatly, irregular Chinese characters need to be scaled to an area of fixed size. Let the scaling ratio of the abscissa of the Chinese character be $R_x$ and the scaling ratio of the ordinate be $R_y$; the normalization formulas for the acquired Chinese character are:
$R_x = W_2 / W_1$
$R_y = H_2 / H_1$
where $W_1$ is the width of the acquired Chinese character and $W_2$ is the width of the normalized Chinese character; $H_1$ is the length of the acquired Chinese character and $H_2$ is the length of the normalized Chinese character. It should be noted that, because the sequence data of the stroke track points of each stroke in at least one stroke forming the acquired Chinese character includes the position information of each stroke track point in each stroke, the length and width of the acquired Chinese character can be calculated based on the position information of each stroke track point in each stroke, which is not described herein.
After $W_2$ and $H_2$ are obtained, the position information of each stroke track point in the sequence data of the stroke track points of each stroke of the acquired Chinese character can be further normalized. The normalization formulas for the track points are:
$X_1 = R_x \cdot X_0$
$Y_1 = R_y \cdot Y_0$
where $X_0$ is the abscissa of a stroke track point of the acquired Chinese character and $Y_0$ is the ordinate of the stroke track point; $X_1$ is the abscissa of the stroke track point after normalization and $Y_1$ is the ordinate of the stroke track point after normalization.
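The normalization described above can be sketched in Python as follows. The function name and the representation of a stroke as a list of (x, y) tuples are assumptions made for illustration; the width and length of the acquired character are taken from the bounding box of its track points, and the guard for a zero-sized character is an added assumption not stated in the application.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_strokes(strokes: List[List[Point]],
                      target_w: float,
                      target_h: float) -> List[List[Point]]:
    """Scale every track point by R_x = W2 / W1 and R_y = H2 / H1."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    w1 = max(xs) - min(xs)  # width of the acquired character
    h1 = max(ys) - min(ys)  # length (height) of the acquired character
    # Assumption: leave an axis unscaled if the character has no extent on it.
    r_x = target_w / w1 if w1 > 0 else 1.0
    r_y = target_h / h1 if h1 > 0 else 1.0
    return [[(r_x * x, r_y * y) for x, y in stroke] for stroke in strokes]
```

For a character that is 10 units wide and 20 units long, normalizing to a 64 by 64 area gives scaling ratios of 6.4 and 3.2, so the track point (5, 20) is mapped to (32, 64).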
In a specific implementation, the specific way of adjusting the size or shape of the adjusted reference character to obtain the deformed character may be: performing affine transformation on the image containing the adjusted reference character to obtain an image containing the deformed character. Affine transformation here refers to data enhancement operations such as enlarging, reducing, moving or rotating the image containing the adjusted reference character. Data enhancement refers to adding samples similar to an existing sample, that is, processing data to obtain data that differs from the original data but has similar content. When an image is enlarged, reduced, moved or rotated, the size or position of the character contained in the image changes to some extent, but the character itself does not change, so an image with similar content is obtained. Since affine transformation does not change the linear structure of the original image but changes the track points in the image uniformly, affine transformation can be applied to the image to obtain different sample images of the same character (i.e., deformed characters of the same character), which increases sample diversity and improves the correction capability of the feature extraction model used for extracting morphological features.
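The following Python sketch illustrates an affine transformation that scales, rotates and moves a character. As an assumption made for brevity, it is applied directly to stroke track points rather than to an image as described above; the function name and parameters are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def affine_transform(strokes: List[List[Point]],
                     scale: float = 1.0,
                     angle_deg: float = 0.0,
                     dx: float = 0.0,
                     dy: float = 0.0) -> List[List[Point]]:
    """Rotate by angle_deg, scale uniformly, then translate by (dx, dy)."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [
        [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
         for x, y in stroke]
        for stroke in strokes
    ]
```

Calling the function several times with slightly different `scale`, `angle_deg`, `dx` and `dy` values would yield several deformed versions of the same character, since every track point of the character is changed in the same way.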
3) According to the sequence data of the stroke track points of all the strokes of the reference character and the writing sequence of all the strokes of the reference character, carrying out feature extraction processing on the reference character to obtain original morphological features between every two strokes in at least one stroke forming the reference character;
4) According to the sequence data of the stroke track points of each stroke of the deformed character and the writing sequence of each stroke of the deformed character, carrying out feature extraction processing on the deformed character to obtain the change morphological features between every two strokes in at least one stroke forming the deformed character;
the specific embodiments of steps 3) and 4) may refer to the specific embodiment of step S203, which is not described herein.
5) And carrying out weighted addition on the original morphological characteristics between every two strokes in the reference character and the changed morphological characteristics between every two strokes corresponding to the deformed character to obtain the morphological characteristics between every two strokes in at least one stroke forming the reference character.
The method comprises the steps that the reference weight of original morphological characteristics between every two strokes in a reference character and the change weight of change morphological characteristics between every two strokes in a deformed character can be preset; and adding the original morphological characteristics between every two strokes in the reference character and the changing morphological characteristics between every two strokes corresponding to the deformed character based on the reference weight of the original morphological characteristics between every two strokes in the reference character and the changing weight of the changing morphological characteristics between every two strokes corresponding to the deformed character to obtain the morphological characteristics between every two strokes in at least one stroke composing the reference character.
Alternatively, the reference weight and the change weight may be set uniformly, and then the original morphological feature between every two strokes in the reference character and the change morphological feature between every two strokes corresponding to the deformed character are added based on the reference weight and the change weight, so as to obtain the morphological feature between every two strokes in at least one stroke forming the reference character. For example, a reference weight of 0.7 may be set, and a variation weight of 0.3; setting a reference weight to be 0.5, and setting a change weight to be 0.5; or the reference weight is set to 1, the change weight is also set to 1, and the like, and the present invention is not limited thereto. In practical applications, because some strokes and angles of the deformed character are not accurate enough, and the reference character is the most standard character of the body structure, the reference weight can be set to be greater than the variation weight.
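The weighted addition in step 5) can be sketched as follows, assuming that the morphological features between a pair of strokes are represented as a numeric vector. The function name is hypothetical, and the default weights 0.7 and 0.3 are taken from the example values above.

```python
from typing import List, Sequence


def fuse_morphological_features(original: Sequence[float],
                                changed: Sequence[float],
                                reference_weight: float = 0.7,
                                change_weight: float = 0.3) -> List[float]:
    """Weighted addition of original and changed morphological features."""
    if len(original) != len(changed):
        raise ValueError("feature vectors must have the same length")
    return [reference_weight * o + change_weight * c
            for o, c in zip(original, changed)]
```

Setting the reference weight higher than the change weight keeps the fused feature close to the feature of the reference character while still reflecting the variation introduced by the deformed character.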
In one possible implementation, the writing order of each stroke of the reference character whose writing order is the reference writing order may also be obtained; and then, comparing the writing sequence of each stroke of the reference character with the writing sequence of the corresponding stroke of the target character to obtain a sequence recognition result of the target character, wherein the sequence recognition result is used for indicating the matching degree of the writing sequence of each stroke of the target character and the reference writing sequence. Wherein, the reference writing sequence refers to the writing sequence of each stroke which accords with the writing standard. In addition, the reference character is not only a character whose physical structure is accurate, but also a character whose writing order is the reference writing order.
In one example, the degree of matching between the writing order of the individual strokes of the target character and the reference writing order may be the ratio of the number of strokes of the target character whose writing order matches the reference writing order to the number of all strokes of the target character. For example, if a character has a total of 8 strokes and its first and second strokes are reversed, i.e., the writing order of the first and second strokes does not match the reference writing order, then the degree of matching between the writing order of the strokes of the character and the reference writing order may be determined to be (8-2)/8=75%. The higher the degree of matching between the writing order of the strokes of the target character and the reference writing order, the more accurate the writing order of the target character; the lower the degree of matching, the less accurate the writing order of the target character.
In another example, the degree of matching of the writing order of the individual strokes of the target character to the reference writing order may indicate whether the writing order of the individual strokes of the target character is the same as the reference writing order. Specifically, when the degree of matching of the writing order of each stroke of the target character with the reference writing order is greater than or equal to the target degree of matching, that is, the writing order of each stroke of the target character is the same as the reference writing order; when the matching degree of the writing order of each stroke of the target character and the reference writing order is smaller than the target matching degree, that is, the writing order of each stroke of the target character is different from the reference writing order. The target matching degree can be 100%, 95% or the like.
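The two examples above can be sketched in Python as follows. The representation of a writing order as a list of stroke identifiers, and both function names, are assumptions introduced for illustration.

```python
from typing import Sequence


def order_match_degree(target_order: Sequence[int],
                       reference_order: Sequence[int]) -> float:
    """Ratio of strokes written at the position given by the reference order."""
    if not target_order:
        raise ValueError("target character has no strokes")
    matched = sum(1 for t, r in zip(target_order, reference_order) if t == r)
    return matched / len(target_order)


def is_same_order(target_order: Sequence[int],
                  reference_order: Sequence[int],
                  target_match_degree: float = 1.0) -> bool:
    """Treat the orders as the same when the degree reaches the target degree."""
    return order_match_degree(target_order, reference_order) >= target_match_degree


reference = [1, 2, 3, 4, 5, 6, 7, 8]
written = [2, 1, 3, 4, 5, 6, 7, 8]  # first and second strokes reversed
degree = order_match_degree(written, reference)  # (8 - 2) / 8 = 0.75
```

With a target matching degree of 100% or 95%, the written order in this example would be judged different from the reference writing order, because its matching degree is 75%.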
For example, referring to fig. 6, a schematic diagram of a dictation (write-from-memory) interface is shown. The dictation interface 601 includes 8 ancient poems, and the student Xiao Wang needs to write the first half of each ancient poem from memory. After Xiao Wang finishes writing and clicks "complete" in the dictation interface 601, the writing recognition model starts to process the writing track points of Xiao Wang acquired in the dictation interface 601, and obtains the character recognition results, the body recognition results and the sequence recognition results of all the characters written by Xiao Wang in the dictation interface 601. As shown in the dictation interface 602, Xiao Wang wrote the 1st, 2nd and 4th to 7th ancient poems correctly, and only the sequence recognition result of the character "raw" in "in the tree cluster" in the third ancient poem indicates that the writing order is inaccurate. The widget 603 in the dictation interface 602 therefore displays "raw: writing order error", together with "click Details to view the correct writing order". Because the underlined text "Details" displayed in the widget 603 carries a page link, after Xiao Wang clicks "Details", the current dictation interface jumps to an interface that demonstrates the correct writing order of the character "raw".
Alternatively, the above-mentioned reference character may be a template character matching the target character among a plurality of template characters stored in a preset database and having an accurate shape structure, and when the template character is stored, the shape characteristics between every two strokes of the template character, the writing order of each stroke of the template character, and the track characteristics of the template character may be stored in the preset database in advance.
In a specific implementation, referring to fig. 7, a schematic diagram of the training process of the writing recognition model is shown. The writing recognition model comprises a feature extraction module and a classification module. First, a sample is input, where the sample comprises a template image containing template characters whose body structure is accurate and whose writing order is the reference writing order, the sequence data of the stroke track points of each stroke in at least one stroke forming each template character, and the writing order of each stroke of the template characters.
Then, a template image containing template characters is subjected to change processing, and a deformed image containing deformed template characters is obtained. After the deformed image is obtained, sequence data of stroke track points of all strokes of the deformed template character contained in the deformed image and writing sequence of all strokes of the deformed template character can be obtained.
Then, the feature extraction module can be called to perform feature extraction processing on the template character according to the sequence data of the stroke track points of each stroke of the template character and the writing order of each stroke of the template character, so as to obtain the original morphological features between every two strokes in at least one stroke forming the template character; to perform feature extraction processing on the deformed template character according to the sequence data of the stroke track points of each stroke of the deformed template character and the writing order of each stroke of the deformed template character, so as to obtain the changed morphological features between every two strokes in at least one stroke forming the deformed template character; and to perform weighted addition on the original morphological features between every two strokes in the template character and the changed morphological features between the corresponding two strokes in the deformed template character, so as to obtain the morphological features between every two strokes in at least one stroke forming the template character. The feature extraction module may output the morphological features between every two strokes forming the template character, so that the template character and the morphological features between every two strokes forming the template character are stored correspondingly in the preset database.
Meanwhile, the feature extraction module can be called to perform direction feature extraction processing on the sequence data of the stroke track points of each stroke of the template character, so as to obtain template direction track features of the template character in a plurality of preset directions; the plurality of template direction track features are then spliced to obtain the track features of the template character. The feature extraction module can output the track features of the template character, so that the template character and the track features of the template character are stored correspondingly in the preset database. It can be seen that the output results in FIG. 7 include the morphological features between every two strokes forming the template character, as well as the track features of the template character.
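The application does not specify how the direction track features are computed. The following Python sketch shows one common possibility, given purely as an assumption: each segment between consecutive track points is projected onto 8 preset directions spaced 45 degrees apart, the projections are accumulated per direction, and the per-direction values are spliced into a single vector. The function name and the choice of 8 directions are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def direction_track_features(strokes: List[List[Point]],
                             num_directions: int = 8) -> List[float]:
    """Accumulate segment lengths per preset direction and splice the results."""
    features = [0.0] * num_directions
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            dx, dy = x1 - x0, y1 - y0
            length = math.hypot(dx, dy)
            if length == 0:
                continue  # repeated track point, no direction information
            for k in range(num_directions):
                angle = 2 * math.pi * k / num_directions
                # Cosine between the segment and the k-th preset direction.
                cos_sim = (dx * math.cos(angle) + dy * math.sin(angle)) / length
                features[k] += length * max(0.0, cos_sim)
    total = sum(features)
    # Normalize so that characters of different sizes are comparable.
    return [f / total for f in features] if total > 0 else features
```

A purely horizontal left-to-right stroke contributes most strongly to the first preset direction and nothing to the opposite direction, so the spliced vector reflects the dominant directions of the written track.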
Since the writing recognition model also needs to output a body recognition result by comparing morphological features, a sequence recognition result by comparing writing orders, and a character recognition result by comparing track features, the writing recognition model may further comprise a classification module for comparing morphological features, track features and/or writing orders.
When the writing recognition model is trained, negative samples can be used to train its ability to recognize characters with inaccurate body structures and inaccurate writing orders, thereby improving the classification accuracy of the writing recognition model. A mixed sample comprising positive samples and negative samples can therefore be input. The positive sample comprises a template image containing positive template characters, the sequence data of the stroke track points of each stroke in at least one stroke forming each positive template character, the writing order of each stroke of the positive template characters, and the body recognition results, sequence recognition results and character recognition results of the positive template characters. The negative sample comprises an image containing negative template characters, the sequence data of the stroke track points of each stroke in at least one stroke forming each negative template character, the writing order of each stroke of the negative template characters, and the body recognition results, sequence recognition results and character recognition results of the negative template characters. Here, positive template characters refer to characters whose body structure is accurate and/or whose writing order is the reference writing order, and negative template characters refer to characters whose body structure is inaccurate and/or whose writing order is not the reference writing order.
Then, the same operations as those performed on the sample are performed on the mixed sample to obtain the morphological features between every two strokes forming the positive template character, the track features of the positive template character, the morphological features between every two strokes forming the negative template character, and the track features of the negative template character. Then, an initial classification module can be called to compare the morphological features between every two strokes of the template character with the morphological features between the corresponding two strokes of the negative template character to obtain a first prediction classification result; the initial classification module is called to compare the morphological features between every two strokes of the template character with the morphological features between the corresponding two strokes of the positive template character to obtain a second prediction classification result; finally, the initial classification module is trained based on the first prediction classification result and the body recognition result of the negative template character, and on the second prediction classification result and the body recognition result of the positive template character, to obtain the classification module.
Similarly, the initial classification module can be called to compare the track features of the template character with the track features of the negative template character to obtain a third prediction classification result; the initial classification module is called to compare the track features of the template character with the track features of the positive template character to obtain a fourth prediction classification result; finally, the initial classification module is trained based on the third prediction classification result and the character recognition result of the negative template character, and on the fourth prediction classification result and the character recognition result of the positive template character, to obtain the classification module.
The initial classification module can also be called to compare the writing order of the template character with the writing order of the negative template character to obtain a fifth prediction classification result; the initial classification module is called to compare the writing order of the template character with the writing order of the positive template character to obtain a sixth prediction classification result; finally, the initial classification module is trained based on the fifth prediction classification result and the sequence recognition result of the negative template character, and on the sixth prediction classification result and the sequence recognition result of the positive template character, to obtain the classification module.
In a specific implementation, referring to FIG. 8a, a schematic structural diagram of a writing recognition model is shown. The writing recognition model may be a convolutional neural network, such as the LeNet-5 shown in FIG. 8a (a classical convolutional neural network that was originally used for handwritten character recognition).
As shown in FIG. 8a, LeNet-5 comprises convolution layers C1 and C2, downsampling layers S1 and S2 (downsampling layers are also called pooling layers), a fully connected layer H1, and an output Gaussian connection layer H2, and classification is performed using a classification function such as softmax. The convolution layers C1 and C2 mainly perform convolution operations during feature extraction. After the image 801 containing the "word" is input, the convolution layers C1 and C2 and the downsampling layers S1 and S2 extract features of the image 801; the extracted features may then be output to the fully connected layer, and the fully connected layer H1 and the Gaussian connection layer H2 sequentially perform classification processing on the extracted features and finally output a classification result.
In particular, referring to FIG. 8b, a schematic diagram of the convolution operation is shown. The convolution kernel 803 of the convolution layer is moved over the input data 802, and a corresponding convolution value is calculated at each position; after the convolution kernel 803 has finished moving over the input data 802, the convolution feature of the convolution kernel can be output. As shown in FIG. 8b, after the convolution kernel 803 moves to the upper right of the input data 802, each of the 9 values in the convolution kernel 803 is multiplied by the value of the input data 802 at the corresponding position and the products are summed to obtain 4, so 4 is the value in row 1, column 3 of the convolution feature of the convolution kernel.
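As a non-limiting illustration, the sliding-window computation described above can be sketched as follows. The 5x5 input values and the 3x3 kernel values are assumptions chosen for illustration and are not taken from FIG. 8b; the function name `convolve2d_valid` is likewise hypothetical. As is usual in convolutional neural networks, the kernel is applied without flipping.

```python
def convolve2d_valid(data, kernel):
    """Slide `kernel` over `data` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(data) - kh + 1
    out_w = len(data[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply each kernel value by the input value under it, then sum.
            total = sum(
                kernel[i][j] * data[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative values only (assumed, not read from the figure).
input_data = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d_valid(input_data, kernel)
# Row 1, column 3 of the feature map corresponds to the kernel at the upper right.
print(feature_map[0][2])
```

With these assumed values, the upper-right position yields 4, which is consistent with the value described for FIG. 8b.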
In one possible implementation, the writing recognition model is a personalized writing recognition model that can be generated for the writing habits of different students. Specifically, different people have different writing habits and make different mistakes when writing. For example, classmate A reverses the stroke order when writing the component ""; classmate B neglects the uppermost "mouth" when writing "product", so that the body structure is inaccurate; and classmate C is accustomed to writing in running script, in which "water" is usually written with two strokes rather than the four strokes required in regular script.
Therefore, historical characters with an accurate body structure that were previously written by the target object can be obtained, together with the sequence data of the stroke track points of each stroke in the at least one stroke forming each historical character and the writing order of the strokes of the historical character. The training process of the writing recognition model mentioned in the example shown in FIG. 6 is then performed with the historical character as the template character described above, thereby obtaining a personalized writing recognition model for the target object. Subsequently, the character that the personalized writing recognition model compares morphologically with a target character written by the target object can be a historical character with an accurate body structure written by the same target object, which avoids errors in the character recognition result, the body recognition result, and the sequence recognition result caused by the different writing habits of different people.
Optionally, when the body recognition result indicates that the body structure of the target character is inaccurate, the two strokes of the target character whose similarity is lower than a preset similarity may be acquired, and body correction information of the reference character is generated based on the acquired two strokes, where the body correction information is used to indicate the accurate shapes of the acquired two strokes. In subsequent writing by the target object, when the output prompt information prompts the target object to write the reference character on the writing interface, the body correction information is output to prompt the target object to write the reference character accurately according to the body correction information.
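As a non-limiting illustration, selecting the stroke pairs whose similarity falls below the preset similarity and collecting the corresponding reference shapes can be sketched as follows. The function names, the dictionary layout, and the threshold value of 0.6 are assumptions made for illustration and are not specified in the embodiments.

```python
def low_similarity_pairs(similarities, preset_similarity):
    """Return the stroke-index pairs whose similarity is below the preset value."""
    return sorted(
        pair for pair, value in similarities.items() if value < preset_similarity
    )


def body_correction_info(reference_shapes, pairs):
    """Collect the reference shape of every stroke that appears in a flagged pair."""
    stroke_ids = sorted({index for pair in pairs for index in pair})
    return {index: reference_shapes[index] for index in stroke_ids}


# Hypothetical per-pair similarities between target and reference morphological features.
similarities = {(0, 1): 0.95, (0, 2): 0.40, (1, 2): 0.88}
# Hypothetical reference shapes, stored here as stroke track points.
reference_shapes = {
    0: [(0.1, 0.2), (0.9, 0.2)],
    1: [(0.5, 0.1), (0.5, 0.9)],
    2: [(0.2, 0.8), (0.8, 0.8)],
}

flagged = low_similarity_pairs(similarities, preset_similarity=0.6)
correction = body_correction_info(reference_shapes, flagged)
print(flagged)
print(sorted(correction))
```

Here only the pair of strokes 0 and 2 is flagged, so the correction information carries the reference shapes of those two strokes.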
Optionally, when the sequence recognition result indicates that the writing order of any stroke of the target character is inaccurate, sequence correction information of that stroke is generated, where the sequence correction information is used to indicate the accurate writing order of that stroke. In subsequent writing by the target object, when the output prompt information prompts the target object to write the reference character on the writing interface, the sequence correction information is output to prompt the target object to write that stroke in the accurate order.
Optionally, when the character recognition result indicates that the target character is written incorrectly, character correction information indicating the cause of the error in the target character is generated. In subsequent writing by the target object, when the output prompt information prompts the target object to write the reference character on the writing interface, the character correction information is output to prompt the target object to write the reference character correctly with reference to the cause of the error.
In this way, personalized recognition and prompting can be provided for students according to their writing habits and their mastery of the subject content, for example by predicting the words that a student is likely to write incorrectly and reminding the student proactively. Meanwhile, the historical writing errors of students can be recorded so that accurate correction can be given the next time they write; for example, if a component or radical of a certain character is a place where the student tends to make errors, a warning can be given in advance.
In practical applications, the writing recognition model can be built into an intelligent learning hardware product, with linkage achieved through an integrated software and hardware scheme. For example, referring to FIG. 9, an interface diagram of another writing interface is shown. Xiaohong practices writing on a learning machine with a built-in writing recognition model. Because Xiaohong previously wrote the "mouth" part first and the horizontal and vertical hooks afterwards when writing the character "what", the sequence recognition result of the previously written "what" indicated that its writing order was inaccurate, and sequence correction information was generated to indicate the accurate writing order of all strokes of that part of "what".
Thus, as shown in interface 901, when Xiaohong writes "what" again, a small window 902 is displayed, and the small window 902 contains the sequence correction information for "what": the horizontal and vertical hooks are written first, and the "mouth" is written last. Xiaohong can then follow the sequence correction information and write "what" in the correct writing order.
Optionally, the character recognition result, the body recognition result, and the sequence recognition result of each character written by the target object can be recorded for the target object, and the corresponding character correction information, body correction information, and sequence correction information can be generated. Finally, a teaching strategy for the target object is generated based on one or more of the character recognition result, body recognition result, sequence recognition result, character correction information, body correction information, and sequence correction information of each character, where the teaching strategy includes the writing quality of each character written by the target object and a training scheme for each character. For example, if the target object habitually writes "san" from bottom to top, the writing quality of "san" in the teaching strategy may be low, and the training scheme may be to write Chinese characters containing "san", such as "river" and "thirst", 10 times.
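As a non-limiting illustration, aggregating the recorded recognition results into a per-character writing quality and training scheme can be sketched as follows. The record fields, the definition of writing quality as the fraction of fully correct attempts, the threshold of 0.6, and the function name are all assumptions made for illustration; the embodiments do not prescribe a particular scoring rule.

```python
from collections import defaultdict


def build_teaching_strategy(records, low_quality=0.6, repetitions=10):
    """Summarise recorded attempts into writing quality and a training scheme per character."""
    attempts = defaultdict(list)
    for record in records:
        # An attempt counts as correct only if all three recognition results are correct.
        correct = record["character_ok"] and record["body_ok"] and record["order_ok"]
        attempts[record["char"]].append(correct)

    strategy = {}
    for char, outcomes in attempts.items():
        quality = sum(outcomes) / len(outcomes)
        if quality < low_quality:
            scheme = f"write characters containing '{char}' {repetitions} times"
        else:
            scheme = "no additional practice"
        strategy[char] = {"writing_quality": quality, "training_scheme": scheme}
    return strategy


# Hypothetical recorded attempts for two characters.
records = [
    {"char": "san", "character_ok": True, "body_ok": True, "order_ok": False},
    {"char": "san", "character_ok": True, "body_ok": True, "order_ok": True},
    {"char": "what", "character_ok": True, "body_ok": True, "order_ok": True},
]

strategy = build_teaching_strategy(records)
print(strategy["san"])
print(strategy["what"])
```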
In the embodiments of the present application, in writing scenarios such as transcription and dictation, prompt information can be used to tell the target object which characters need to be written. Since monitoring of writing quality presupposes that the character is written correctly, when the similarity between the preset character and the reference character is greater than the preset threshold, the target character has been written correctly and matches the expected preset character, so the writing quality of the target character, that is, the degree of matching between the body structure of the target character and the reference body structure, can be monitored further. In addition, morphological features can represent the morphological relationship between two strokes, such as position, length, and direction, and a character is composed of a plurality of strokes; if the morphology between every two strokes of the plurality of strokes composing the character is accurate, the body structure of the character is accurate. Therefore, a reference character that is the same as the target character and whose body structure conforms to the writing standard can be identified by performing character recognition processing on the target character, and a body recognition result indicating the degree of matching between the body structure of the target character and the reference body structure can then be obtained by comparing the morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character.
Since the reference body structure refers to a body structure conforming to the writing standard, and the writing quality can be judged from the matching degree of the body structure of the character obtained by writing and the body structure conforming to the writing standard, the writing quality of the target character can be monitored through the obtained body recognition result of the target character.
In addition, the embodiment of the application also judges whether the writing sequence of each stroke of the target character is correct or not by comparing the writing sequence of each stroke of the reference character with the writing sequence of the corresponding stroke of the target character, so that on the basis of judging that the body structure is accurate, whether the writing sequence is accurate or not can be judged, and the writing quality of the target character can be further monitored.
Based on the above description of the data processing method, the application also discloses a data processing apparatus. The data processing apparatus may be a computer program (including program code) running on one of the computer devices mentioned above. The data processing apparatus may perform the data processing methods shown in FIG. 2 and FIG. 5. Referring to FIG. 10, the data processing apparatus may include at least an acquisition unit 1001 and a processing unit 1002.
The obtaining unit 1001 is configured to determine a target character obtained by writing, and obtain sequence data of stroke track points of each of at least one stroke forming the target character, and a writing order of each of the strokes, where any one of the strokes is composed of a plurality of stroke track points, and the sequence data of the stroke track points of any one of the strokes includes position information of each of the stroke track points in the any one of the strokes;
The processing unit 1002 is configured to perform a character recognition process on the target character based on the sequence data of the stroke track points of the respective strokes, to obtain a reference character;
the processing unit 1002 is further configured to perform feature extraction processing on the target character according to the sequence data of the stroke track points of the respective strokes and the writing order of the respective strokes, so as to obtain morphological features between every two strokes in at least one stroke forming the target character;
the processing unit 1002 is further configured to obtain morphological features between every two strokes in the at least one stroke forming a reference character whose body structure is the reference body structure, and compare the obtained morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character to obtain a body recognition result of the target character, where the body recognition result is used to indicate the degree of matching between the body structure of the target character and the reference body structure.
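As a non-limiting illustration, extracting morphological features between every two strokes and comparing them between the reference character and the target character can be sketched as follows. The particular features (centre offset, length ratio, direction difference), the similarity formula, and all function names are assumptions made for illustration; the embodiments only state that morphological features characterise relationships such as position, length, and direction. Strokes are assumed to be listed in writing order, each as a list of (x, y) track points.

```python
import math


def stroke_length(stroke):
    """Total polyline length of a stroke given as a list of (x, y) track points."""
    return sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))


def stroke_center(stroke):
    n = len(stroke)
    return (sum(x for x, _ in stroke) / n, sum(y for _, y in stroke) / n)


def stroke_direction(stroke):
    """Angle of the line from the first track point to the last one."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    return math.atan2(y1 - y0, x1 - x0)


def pair_features(a, b):
    """Position, length, and direction relationship between strokes a and b."""
    ca, cb = stroke_center(a), stroke_center(b)
    la, lb = stroke_length(a), stroke_length(b)
    ratio = la / lb if lb else 0.0
    dtheta = stroke_direction(b) - stroke_direction(a)
    dtheta = math.atan2(math.sin(dtheta), math.cos(dtheta))  # wrap into [-pi, pi]
    return (cb[0] - ca[0], cb[1] - ca[1], ratio, dtheta)


def morphological_features(strokes):
    """Features for every pair (i, j), i < j, of strokes listed in writing order."""
    return {
        (i, j): pair_features(strokes[i], strokes[j])
        for i in range(len(strokes))
        for j in range(i + 1, len(strokes))
    }


def feature_similarity(f, g):
    """Map the distance between two feature tuples to a similarity in (0, 1]."""
    return 1.0 / (1.0 + math.dist(f, g))


# Hypothetical cross-shaped character: a horizontal stroke, then a vertical stroke.
reference = [[(0.0, 1.0), (2.0, 1.0)], [(1.0, 2.0), (1.0, 0.0)]]
target = [[(0.0, 1.0), (2.0, 1.2)], [(1.0, 2.0), (1.1, 0.0)]]

ref_feats = morphological_features(reference)
tgt_feats = morphological_features(target)
similarities = {
    pair: feature_similarity(ref_feats[pair], tgt_feats[pair]) for pair in ref_feats
}
print(similarities)
```

The per-pair similarities could then be aggregated, for example by taking their minimum or mean, to express the degree of matching; which aggregation to use is a design choice not fixed by the sketch.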
In one embodiment, the processing unit 1002 is specifically further configured to perform:
acquiring the writing sequence of each stroke of a reference character with the writing sequence being the reference writing sequence;
Comparing the writing sequence of each stroke of the reference character with the writing sequence of the corresponding stroke of the target character to obtain a sequence recognition result of the target character, wherein the sequence recognition result is used for indicating the matching degree of the writing sequence of each stroke of the target character and the reference writing sequence.
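As a non-limiting illustration, comparing the writing order of the target character with the reference writing order can be sketched as follows. It is assumed, for illustration only, that each written stroke has already been matched to a stroke label of the reference character, so that both orders are lists of the same labels; the labels and the matching-degree formula are hypothetical.

```python
def order_recognition(reference_order, target_order):
    """Compare two stroke orders position by position.

    Returns a matching degree in [0, 1] and the list of mismatched positions as
    (position, expected_stroke, actual_stroke), with positions starting at 1.
    """
    mismatched = [
        (position, expected, actual)
        for position, (expected, actual) in enumerate(
            zip(reference_order, target_order), start=1
        )
        if expected != actual
    ]
    total = max(len(reference_order), len(target_order))
    # Strokes missing from, or extra in, the target also count as mismatches.
    errors = len(mismatched) + abs(len(reference_order) - len(target_order))
    matching_degree = 1.0 - errors / total if total else 1.0
    return matching_degree, mismatched


# Hypothetical stroke labels for a three-stroke component.
reference_order = ["horizontal", "box", "hook"]
target_order = ["box", "horizontal", "hook"]

degree, mismatched = order_recognition(reference_order, target_order)
print(degree)
print(mismatched)
```

The mismatched positions returned here are the kind of information from which sequence correction information for individual strokes could be derived.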
In yet another embodiment, the processing unit 1002, when performing the character recognition processing on the target character based on the sequence data of the stroke track points of the respective strokes, is further configured to perform:
carrying out direction characteristic extraction processing on the sequence data of the stroke track points of each stroke to obtain direction track characteristics of the target character in a plurality of preset directions;
splicing the plurality of direction track features to obtain target track features;
searching characters with track characteristics matched with the target track characteristics in a preset database to obtain the reference characters.
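As a non-limiting illustration, direction feature extraction, splicing, and database lookup can be sketched as follows. The choice of 8 preset directions, the 2x2 spatial grid, cosine similarity, the 0.9 matching threshold, and the database entries "A" and "B" are all assumptions made for illustration; stroke coordinates are assumed to be normalised into the unit square.

```python
import math

NUM_DIRECTIONS = 8  # assumed number of preset directions
GRID = 2            # assumed 2x2 spatial grid per direction


def direction_features(strokes):
    """One GRID*GRID feature per direction, accumulating segment lengths."""
    features = [[0.0] * (GRID * GRID) for _ in range(NUM_DIRECTIONS)]
    step = 2 * math.pi / NUM_DIRECTIONS
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            length = math.hypot(x1 - x0, y1 - y0)
            if length == 0:
                continue
            angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
            direction = int(round(angle / step)) % NUM_DIRECTIONS
            # Locate the segment midpoint in the grid.
            col = min(int((x0 + x1) / 2 * GRID), GRID - 1)
            row = min(int((y0 + y1) / 2 * GRID), GRID - 1)
            features[direction][row * GRID + col] += length
    return features


def splice(features):
    """Concatenate the per-direction features into one target track feature."""
    return [value for feature in features for value in feature]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_reference(target_feature, database, min_similarity=0.9):
    """Return the best-matching character, or None when nothing matches."""
    best_char, best_score = None, min_similarity
    for char, feature in database.items():
        score = cosine(target_feature, feature)
        if score >= best_score:
            best_char, best_score = char, score
    return best_char


# Hypothetical preset database with two entries.
database = {
    "A": splice(direction_features([[(0.1, 0.2), (0.7, 0.2)], [(0.3, 0.2), (0.3, 0.9)]])),
    "B": splice(direction_features([[(0.1, 0.2), (0.7, 0.2)]])),
}

written = [[(0.12, 0.22), (0.7, 0.2)], [(0.3, 0.2), (0.32, 0.88)]]
target_feature = splice(direction_features(written))
print(find_reference(target_feature, database))
```

When `find_reference` returns `None`, no character in the preset database matches the target track feature, which corresponds to the case in which a character recognition result indicating a writing error is generated.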
In yet another embodiment, the processing unit 1002 is specifically further configured to perform:
when no character with the track characteristic matched with the target track characteristic is found in a preset database, generating a character recognition result for indicating the writing error of the target character;
Outputting the character recognition result to prompt the target object to write again;
and taking the character rewritten by the target object as the target character.
In yet another embodiment, the obtaining unit 1001 may be further configured to, when determining the target character written by the target object, perform:
outputting prompt information, wherein the prompt information is used for prompting a target object to write preset characters on a writing interface;
responding to writing operation on the writing interface, and acquiring a target character obtained by writing the target object;
the processing unit 1002 may be further configured to perform: if the similarity between the preset character and the reference character is greater than a preset threshold, triggering execution of the steps of performing feature extraction processing on the target character according to the sequence data of the stroke track points of the respective strokes and the writing order of the respective strokes to obtain morphological features between every two strokes in the at least one stroke forming the target character, obtaining morphological features between every two strokes in the at least one stroke forming the reference character with an accurate body structure, and comparing the obtained morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character to obtain a body recognition result of the target character.
In yet another embodiment, the processing unit 1002, when acquiring morphological features between every two strokes in the at least one stroke forming a reference character with an accurate body structure, is specifically configured to perform:
acquiring sequence data of stroke track points of each stroke in the at least one stroke forming a reference character whose body structure is the reference body structure, and the writing order of each stroke of the reference character;
carrying out change processing on the reference character to obtain a deformed character, wherein the deformed character is different from the reference character in size or shape;
according to the sequence data of the stroke track points of all the strokes of the reference character and the writing sequence of all the strokes of the reference character, carrying out feature extraction processing on the reference character to obtain original morphological features between every two strokes in at least one stroke forming the reference character;
according to the sequence data of the stroke track points of all strokes of the deformed character and the writing sequence of all strokes of the deformed character, carrying out feature extraction processing on the deformed character to obtain the change morphological features between every two strokes in at least one stroke forming the deformed character;
And carrying out weighted addition on the original morphological characteristics between every two strokes in the reference character and the changed morphological characteristics between every two strokes corresponding to the deformed character to obtain the morphological characteristics between every two strokes in at least one stroke forming the reference character.
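As a non-limiting illustration, the weighted addition of the original morphological features and the changed morphological features can be sketched as follows. The weight of 0.5, the feature layout (a tuple of numbers per stroke pair), and the function name are assumptions made for illustration; the embodiments do not specify the weights.

```python
def fuse_features(original, changed, weight_original=0.5):
    """Weighted sum of original and changed features for each stroke pair."""
    weight_changed = 1.0 - weight_original
    return {
        pair: tuple(
            weight_original * o + weight_changed * c
            for o, c in zip(original[pair], changed[pair])
        )
        for pair in original
    }


# Hypothetical features for the stroke pair (0, 1) of a reference character
# and of its deformed version.
original_features = {(0, 1): (1.0, 2.0)}
changed_features = {(0, 1): (3.0, 4.0)}

fused = fuse_features(original_features, changed_features)
print(fused)
```

Blending in features of deformed versions in this way makes the reference features less sensitive to size and shape variation between writers, which is one plausible motivation for the step.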
In yet another embodiment, the processing unit 1002 may be further configured to, when performing the change processing on the reference character to obtain a deformed character, perform:
adjusting the spatial characteristics of the reference character to obtain an adjusted reference character, wherein the spatial characteristics of the adjusted reference character are preset spatial characteristics;
and adjusting the size or the shape of the adjusted reference character to obtain the deformed character.
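As a non-limiting illustration, the two adjustments above can be sketched as follows. It is assumed, for illustration only, that the preset spatial characteristics are a bounding box anchored at the origin with its longer side scaled to 1, and that the size or shape change is an independent scaling of the two axes; the function names are hypothetical.

```python
def normalize(strokes, size=1.0):
    """Move the bounding box to the origin and scale its longer side to `size`."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [
        [((x - min_x) / extent * size, (y - min_y) / extent * size) for x, y in stroke]
        for stroke in strokes
    ]


def deform(strokes, scale_x=1.0, scale_y=1.0):
    """Change size (equal factors) or shape (unequal factors) by axis scaling."""
    return [[(x * scale_x, y * scale_y) for x, y in stroke] for stroke in strokes]


# Hypothetical reference character given in screen coordinates.
reference = [[(10, 10), (30, 10)], [(20, 0), (20, 40)]]

adjusted = normalize(reference)
deformed = deform(adjusted, scale_x=2.0)
print(adjusted)
print(deformed)
```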
According to one embodiment of the present application, the steps involved in the methods shown in fig. 2 and 5 may be performed by respective units in the data processing apparatus shown in fig. 10. For example, step S201 shown in fig. 2 may be performed by the acquisition unit 1001 in the data processing apparatus shown in fig. 10; steps S202 to S204 may be performed by the processing unit 1002 in the data processing apparatus shown in fig. 10. For another example, steps S502 to S503 shown in fig. 5 may be performed by the acquisition unit 1001 in the data processing apparatus shown in fig. 10; step S501, step S504 to step S507 may be performed by the processing unit 1002 in the data processing apparatus shown in fig. 10.
According to another embodiment of the present application, each unit in the data processing apparatus shown in fig. 10 is divided based on a logic function, and each unit may be separately or completely combined into one or several other units to form the data processing apparatus, or some unit(s) thereof may be further split into a plurality of units with smaller functions to form the data processing apparatus, which may achieve the same operation without affecting the implementation of the technical effects of the embodiments of the present application. In other embodiments of the present application, the data processing apparatus may also include other units, and in practical applications, these functions may also be implemented with assistance of other units, and may be implemented by cooperation of multiple units.
According to another embodiment of the present application, the data processing apparatus shown in FIG. 10 may be constructed, and the data processing method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps involved in the method shown in FIG. 2 or FIG. 5 on a general-purpose computing device, such as a computer device that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer storage medium, and loaded into and run in the above-described computer device through the computer storage medium.
In the embodiment of the application, because the sequence data of the stroke track points of any stroke comprises the position information of each stroke track point in any stroke, morphological characteristics which can represent the relationship of the position, the length, the direction and the like between two strokes can be extracted through the obtained sequence data of the stroke track points of each stroke in at least one stroke forming the target character and the writing sequence of each stroke. Since morphological features can characterize the relationship in terms of position, length, direction, etc. between two strokes, and a character is made up of multiple strokes, if the morphology between every two strokes in the multiple strokes making up the character is in accordance with the writing standard, the morphological structure of the character should also be in accordance with the writing standard. Therefore, the reference character which is the same as the target character and has the body structure of the reference body structure can be identified by performing character identification processing on the target character, and then the body identification result for indicating the matching degree of the body structure of the target character and the reference body structure can be obtained by further comparing the morphological characteristics between every two strokes of the reference character with the morphological characteristics between the corresponding two strokes of the target character. Since the reference body structure refers to a body structure conforming to the writing standard, and the writing quality can be judged from the matching degree of the body structure of the character obtained by writing and the body structure conforming to the writing standard, the writing quality of the target character can be monitored through the obtained body recognition result of the target character.
Based on the method embodiment and the device embodiment, the application also provides electronic equipment. Referring to fig. 11, a schematic structural diagram of an electronic device according to an embodiment of the present application is provided. The electronic device shown in fig. 11 may include at least a processor 1101, an input interface 1102, an output interface 1103, and a computer storage medium 1104. Wherein the processor 1101, the input interface 1102, the output interface 1103, and the computer storage medium 1104 may be connected by a bus or other means.
The computer storage medium 1104 may be stored in a memory of the electronic device. The computer storage medium 1104 is used for storing a computer program comprising program instructions, and the processor 1101 is used for executing the program instructions stored in the computer storage medium 1104. The processor 1101 (or central processing unit, CPU) is the computing core and control core of the electronic device and is adapted to implement one or more instructions, in particular to load and execute one or more instructions so as to implement the above-described data processing method flow or corresponding functions.
The embodiment of the application also provides a computer storage medium (Memory), which is a Memory device in the electronic device and is used for storing programs and data. It will be appreciated that the computer storage medium herein may include both a built-in storage medium in the terminal and an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores an operating system of the terminal. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor 1101. Note that the computer storage medium may be a high-speed random access memory (random access memory, RAM) or a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory; optionally, at least one computer storage medium remote from the processor may be present.
In one embodiment, one or more instructions stored in the computer storage medium may be loaded and executed by the processor 1101 to implement the corresponding steps of the methods described above in connection with the data processing method embodiments of FIG. 2 and FIG. 5. In a specific implementation, the one or more instructions in the computer storage medium are loaded and executed by the processor 1101 to perform the following:
the processor 1101 determines a target character obtained by writing, and obtains sequence data of stroke track points of each stroke in at least one stroke composing the target character, and writing order of each stroke, wherein any stroke is composed of a plurality of stroke track points, and the sequence data of the stroke track points of any stroke comprises position information of each stroke track point in any stroke;
the processor 1101 performs character recognition processing on the target character based on the sequence data of the stroke track points of the strokes to obtain a reference character;
the processor 1101 performs feature extraction processing on the target character according to the sequence data of the stroke track points of the strokes and the writing order of the strokes, so as to obtain morphological features between every two strokes in at least one stroke forming the target character;
the processor 1101 obtains morphological features between every two strokes in the at least one stroke forming a reference character whose body structure is the reference body structure, and compares the obtained morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character to obtain a body recognition result of the target character, wherein the body recognition result is used for indicating the degree of matching between the body structure of the target character and the reference body structure.
In one embodiment, the processor 1101 may be further configured to perform:
acquiring the writing sequence of each stroke of a reference character with the writing sequence being the reference writing sequence;
comparing the writing sequence of each stroke of the reference character with the writing sequence of the corresponding stroke of the target character to obtain a sequence recognition result of the target character, wherein the sequence recognition result is used for indicating the matching degree of the writing sequence of each stroke of the target character and the reference writing sequence.
In one embodiment, the processor 1101 is further configured to, when performing a character recognition process on the target character based on the sequence data of the stroke track points of the respective strokes, obtain a reference character, perform:
Carrying out direction characteristic extraction processing on the sequence data of the stroke track points of each stroke to obtain direction track characteristics of the target character in a plurality of preset directions;
splicing the plurality of direction track features to obtain target track features;
searching characters with track characteristics matched with the target track characteristics in a preset database to obtain the reference characters.
In one embodiment, the processor 1101 is specifically further configured to perform:
when no character with the track characteristic matched with the target track characteristic is found in a preset database, generating a character recognition result for indicating the writing error of the target character;
outputting the character recognition result to prompt the target object to write again;
and taking the character rewritten by the target object as the target character.
In one embodiment, the processor 1101, when determining the target character written by the target object, is further operable to perform:
outputting prompt information, wherein the prompt information is used for prompting a target object to write preset characters on a writing interface;
responding to writing operation on the writing interface, and acquiring a target character obtained by writing the target object;
The processor 1101 may also be configured to perform:
if the similarity between the preset character and the reference character is greater than a preset threshold, triggering execution of the steps of performing feature extraction processing on the target character according to the sequence data of the stroke track points of the respective strokes and the writing order of the respective strokes to obtain morphological features between every two strokes in the at least one stroke forming the target character, obtaining morphological features between every two strokes in the at least one stroke forming the reference character with an accurate body structure, and comparing the obtained morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character to obtain a body recognition result of the target character.
In one embodiment, the processor 1101, when acquiring morphological features between every two strokes in the at least one stroke forming a reference character with an accurate body structure, is specifically configured to perform:
acquiring sequence data of stroke track points of each stroke in the at least one stroke forming a reference character whose body structure is the reference body structure, and the writing order of each stroke of the reference character;
Carrying out change processing on the reference character to obtain a deformed character, wherein the deformed character is different from the reference character in size or shape;
according to the sequence data of the stroke track points of all the strokes of the reference character and the writing sequence of all the strokes of the reference character, carrying out feature extraction processing on the reference character to obtain original morphological features between every two strokes in at least one stroke forming the reference character;
according to the sequence data of the stroke track points of all strokes of the deformed character and the writing sequence of all strokes of the deformed character, carrying out feature extraction processing on the deformed character to obtain the change morphological features between every two strokes in at least one stroke forming the deformed character;
and carrying out weighted addition on the original morphological characteristics between every two strokes in the reference character and the changed morphological characteristics between every two strokes corresponding to the deformed character to obtain the morphological characteristics between every two strokes in at least one stroke forming the reference character.
In one embodiment, when performing the change processing on the reference character to obtain the deformed character, the processor 1101 may be further configured to perform:
adjusting the spatial characteristics of the reference character to obtain an adjusted reference character, wherein the spatial characteristics of the adjusted reference character are preset spatial characteristics;
and adjusting the size or the shape of the adjusted reference character to obtain the deformed character.
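The two adjustment steps can be sketched in Python as follows. The sketch assumes that "spatial characteristics" refers to the position and size of the character's bounding box, and that adjusting size or shape means scaling along each axis; the text does not confirm either reading, and the names `normalize` and `deform` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]  # track points of one stroke, in sampling order


def normalize(strokes: List[Stroke], box: float = 1.0) -> List[Stroke]:
    """Move the character to the origin and scale it uniformly so that the
    longer side of its bounding box equals `box` (the assumed preset)."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0  # avoid dividing by zero
    return [
        [((x - min_x) / extent * box, (y - min_y) / extent * box) for x, y in stroke]
        for stroke in strokes
    ]


def deform(strokes: List[Stroke], sx: float = 1.0, sy: float = 1.0) -> List[Stroke]:
    """Scale each axis independently: sx == sy changes only the size,
    sx != sy also changes the shape (aspect ratio)."""
    return [[(x * sx, y * sy) for x, y in stroke] for stroke in strokes]
```

For example, `deform(normalize(strokes), sx=1.0, sy=0.5)` would yield a flattened variant of the normalized reference character.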
In the embodiments of the present application, the sequence data of the stroke track points of any stroke includes the position information of each stroke track point in that stroke. Morphological features that characterize the relationship between two strokes in terms of position, length, direction and the like can therefore be extracted from the acquired sequence data of the stroke track points of each stroke in the at least one stroke forming the target character, together with the writing sequence of each stroke. Because a character is made up of multiple strokes, if the morphology between every two of those strokes conforms to the writing standard, the body structure of the character should also conform to the writing standard. Accordingly, character recognition processing is performed on the target character to identify the reference character, that is, the character that is the same as the target character and whose body structure is the reference body structure; the morphological features between every two strokes of the reference character are then compared with the morphological features between the corresponding two strokes of the target character to obtain a body recognition result indicating the degree of matching between the body structure of the target character and the reference body structure. Since the reference body structure is a body structure that conforms to the writing standard, and writing quality can be judged from how well the body structure of a written character matches a body structure that conforms to the writing standard, the writing quality of the target character can be monitored through the body recognition result.
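As a rough illustration of the reasoning above, the following Python sketch derives position, length and direction relationships for every pair of strokes from their track points, then compares a target character with a reference character. The specific features (centroid offset, length ratio, direction difference), the similarity formula, and all function names are assumptions for illustration; the application does not specify them.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]  # track points of one stroke, in sampling order
PairKey = Tuple[int, int]
Features = Tuple[float, float, float, float]


def stroke_length(stroke: Stroke) -> float:
    """Total path length along consecutive track points."""
    return sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))


def centroid(stroke: Stroke) -> Point:
    n = len(stroke)
    return (sum(x for x, _ in stroke) / n, sum(y for _, y in stroke) / n)


def direction(stroke: Stroke) -> float:
    """Angle of the vector from the first track point to the last."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    return math.atan2(y1 - y0, x1 - x0)


def pair_features(a: Stroke, b: Stroke) -> Features:
    """Assumed features: centroid offset (position), length ratio (length),
    and wrapped difference of overall directions (direction)."""
    ca, cb = centroid(a), centroid(b)
    la, lb = stroke_length(a), stroke_length(b)
    turn = direction(b) - direction(a)
    turn = math.atan2(math.sin(turn), math.cos(turn))  # wrap into [-pi, pi]
    return (cb[0] - ca[0], cb[1] - ca[1], lb / la if la else 0.0, turn)


def character_features(strokes: List[Stroke]) -> Dict[PairKey, Features]:
    """Features for every two strokes; `strokes` is listed in writing order."""
    return {
        (i, j): pair_features(strokes[i], strokes[j])
        for i, j in combinations(range(len(strokes)), 2)
    }


def match_degree(
    target: Dict[PairKey, Features], reference: Dict[PairKey, Features]
) -> float:
    """Mean of 1 / (1 + Euclidean distance) over shared stroke pairs, so
    identical features score 1.0. The formula is a placeholder."""
    shared = target.keys() & reference.keys()
    if not shared:
        return 0.0
    scores = [1.0 / (1.0 + math.dist(target[k], reference[k])) for k in shared]
    return sum(scores) / len(scores)
```

In this sketch the stroke coordinates are assumed to be already normalized to a common scale, since the centroid offset is not scale-invariant.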
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to perform the method embodiments described above and illustrated in FIG. 2 and FIG. 5. The computer-readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The data processing method of the embodiments of the present application can also be applied to characters of languages other than Chinese and English, or to other scenarios that require writing, which is not limited herein.
The foregoing describes merely specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method of data processing, comprising:
determining a target character obtained by writing, and acquiring sequence data of stroke track points of all strokes in at least one stroke forming the target character and writing sequence of all strokes, wherein any stroke consists of a plurality of stroke track points, and the sequence data of the stroke track points of any stroke comprises position information of all stroke track points in any stroke;
performing character recognition processing on the target character based on the sequence data of the stroke track points of each stroke to obtain a reference character;
according to the sequence data of the stroke track points of each stroke and the writing sequence of each stroke, carrying out feature extraction processing on the target character to obtain morphological features between every two strokes in at least one stroke forming the target character;
acquiring morphological features between every two strokes in at least one stroke forming the reference character whose body structure is a reference body structure, and comparing the acquired morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character to obtain a body recognition result of the target character, wherein the body recognition result is used for indicating the matching degree of the body structure of the target character and the reference body structure.
2. The method according to claim 1, wherein the method further comprises:
acquiring the writing sequence of each stroke of the reference character whose writing sequence is a reference writing sequence;
comparing the writing sequence of each stroke of the reference character with the writing sequence of the corresponding stroke of the target character to obtain a sequence recognition result of the target character, wherein the sequence recognition result is used for indicating the matching degree of the writing sequence of each stroke of the target character and the reference writing sequence.
3. The method according to claim 1, wherein the performing character recognition processing on the target character based on the sequence data of the stroke track points of the respective strokes to obtain a reference character includes:
carrying out direction characteristic extraction processing on the sequence data of the stroke track points of each stroke to obtain direction track characteristics of the target character in a plurality of preset directions;
splicing the plurality of direction track features to obtain target track features;
searching characters with track characteristics matched with the target track characteristics in a preset database to obtain the reference characters.
4. A method according to claim 3, characterized in that the method further comprises:
when no character whose track features match the target track features is found in the preset database, generating a character recognition result for indicating that the target character is written incorrectly;
outputting the character recognition result to prompt the target object to write again;
and taking the character obtained when the target object writes again as the target character.
5. The method of claim 1, wherein the determining the written target character comprises:
outputting prompt information, wherein the prompt information is used for prompting a target object to write preset characters on a writing interface;
responding to writing operation on the writing interface, and acquiring a target character obtained by writing the target object;
the method further comprises:
if the similarity between the preset character and the reference character is greater than a preset threshold, triggering execution of the steps of: performing feature extraction processing on the target character according to the sequence data of the stroke track points of each stroke and the writing sequence of each stroke, to obtain morphological features between every two strokes in the at least one stroke forming the target character; acquiring morphological features between every two strokes in at least one stroke forming the reference character whose body structure is the reference body structure; and comparing the acquired morphological features between every two strokes of the reference character with the morphological features between the corresponding two strokes of the target character, to obtain a body recognition result of the target character.
6. The method of any of claims 1-5, wherein the acquiring morphological features between every two strokes in at least one stroke forming the reference character whose body structure is a reference body structure comprises:
acquiring sequence data of the stroke track points of each stroke in at least one stroke forming the reference character whose body structure is the reference body structure, and the writing sequence of each stroke of the reference character;
carrying out change processing on the reference character to obtain a deformed character, wherein the deformed character differs from the reference character in size or shape;
according to the sequence data of the stroke track points of each stroke of the reference character and the writing sequence of each stroke of the reference character, carrying out feature extraction processing on the reference character to obtain original morphological features between every two strokes in at least one stroke forming the reference character;
according to the sequence data of the stroke track points of each stroke of the deformed character and the writing sequence of each stroke of the deformed character, carrying out feature extraction processing on the deformed character to obtain changed morphological features between every two strokes in at least one stroke forming the deformed character;
and carrying out weighted addition on the original morphological features between every two strokes in the reference character and the changed morphological features between the corresponding two strokes in the deformed character to obtain the morphological features between every two strokes in at least one stroke forming the reference character.
7. The method of claim 6, wherein the performing a change process on the reference character to obtain a deformed character comprises:
adjusting the spatial characteristics of the reference character to obtain an adjusted reference character, wherein the spatial characteristics of the adjusted reference character are preset spatial characteristics;
and adjusting the size or the shape of the adjusted reference character to obtain the deformed character.
8. A data processing apparatus, characterized in that the data processing apparatus comprises an acquisition unit and a processing unit, wherein:
the acquisition unit is used for determining a target character written by a target object, and acquiring sequence data of the stroke track points of each stroke in at least one stroke forming the target character and the writing sequence of each stroke, wherein any stroke consists of a plurality of stroke track points, and the sequence data of the stroke track points of any stroke comprises position information of each stroke track point in that stroke;
the processing unit is used for carrying out character recognition processing on the target character based on the sequence data of the stroke track points of each stroke to obtain a reference character;
the processing unit is further used for carrying out feature extraction processing on the target character according to the sequence data of the stroke track points of each stroke and the writing sequence of each stroke to obtain morphological features between every two strokes in at least one stroke forming the target character;
the processing unit is further configured to obtain morphological features between every two strokes in at least one stroke of a reference character with a body structure being a reference body structure, and compare the obtained morphological features between every two strokes of the reference character with morphological features between two corresponding strokes of the target character to obtain a body recognition result of the target character, where the body recognition result is used to indicate a matching degree of the body structure of the target character and the reference body structure.
9. A computer device, comprising:
a processor adapted to implement one or more computer programs;
a computer storage medium storing one or more computer programs adapted to be loaded by the processor and to perform the data processing method according to any of claims 1-7.
10. A computer storage medium, characterized in that it stores one or more computer programs adapted to be loaded by a processor and to perform the data processing method according to any of claims 1-7.
11. A computer program product, characterized in that the computer program product comprises a computer program adapted to be loaded by a processor and to perform the data processing method according to any of claims 1-7.
CN202211161992.7A 2022-09-22 2022-09-22 Data processing method, related device, storage medium and computer product Pending CN117789227A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211161992.7A CN117789227A (en) 2022-09-22 2022-09-22 Data processing method, related device, storage medium and computer product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211161992.7A CN117789227A (en) 2022-09-22 2022-09-22 Data processing method, related device, storage medium and computer product

Publications (1)

Publication Number Publication Date
CN117789227A true CN117789227A (en) 2024-03-29

Family

ID=90378592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211161992.7A Pending CN117789227A (en) 2022-09-22 2022-09-22 Data processing method, related device, storage medium and computer product

Country Status (1)

Country Link
CN (1) CN117789227A (en)

Similar Documents

Publication Publication Date Title
CN110795543B (en) Unstructured data extraction method, device and storage medium based on deep learning
US10769487B2 (en) Method and device for extracting information from pie chart
RU2661750C1 (en) Symbols recognition with the use of artificial intelligence
US10115215B2 (en) Pairing fonts for presentation
CN110009027B (en) Image comparison method and device, storage medium and electronic device
CN107169485B (en) Mathematical formula identification method and device
US20210406266A1 (en) Computerized information extraction from tables
CN109284355B (en) Method and device for correcting oral arithmetic questions in test paper
US11461638B2 (en) Figure captioning system and related methods
CN112819686B (en) Image style processing method and device based on artificial intelligence and electronic equipment
CN111507330B (en) Problem recognition method and device, electronic equipment and storage medium
CN109189895B (en) Question correcting method and device for oral calculation questions
CN113722474A (en) Text classification method, device, equipment and storage medium
CN112347997A (en) Test question detection and identification method and device, electronic equipment and medium
CN115641308A (en) Calligraphy character copying evaluation system
CN113657098B (en) Text error correction method, device, equipment and storage medium
CN115984875B (en) Stroke similarity evaluation method and system for hard-tipped pen regular script copy work
CN113505786A (en) Test question photographing and judging method and device and electronic equipment
CN113053395A (en) Pronunciation error correction learning method and device, storage medium and electronic equipment
JP7293658B2 (en) Information processing device, information processing method and program
Chu et al. Supporting Chinese Character Educational Interfaces with Richer Assessment Feedback through Sketch Recognition.
CN117789227A (en) Data processing method, related device, storage medium and computer product
US20230084641A1 (en) Math detection in handwriting
WO2022126917A1 (en) Deep learning-based face image evaluation method and apparatus, device, and medium
CN110232847B (en) Copybook information generation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination