WO2022206404A1 - Character practice grid detection method, apparatus, readable medium, and electronic device - Google Patents
- Publication number: WO2022206404A1 (application PCT/CN2022/081444)
- Authority: WO (WIPO/PCT)
Classifications
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING > G06V10/00—Arrangements for image or video recognition or understanding > G06V10/20—Image preprocessing > G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F18/00—Pattern recognition > G06F18/20—Analysing > G06F18/24—Classification techniques
Definitions
- the present disclosure relates to the field of image processing, and in particular, to a character practice grid detection method, apparatus, readable medium, and electronic device.
- the position detection and classification of the character practice grid are very important for the entire intelligent review process.
- the character practice grid can assist in judging the inclination and size of characters, in judging and classifying whether a grid contains a character, and in correcting the perspective distortion introduced by photographing, making subsequent review more accurate.
- related technologies are mainly based on straight-line detection such as the Hough transform, contour detection, or corner detection, combined with rectangular geometric features to filter noise and obtain the final character practice grid detection result.
- the existing character practice grid detection algorithms have poor robustness to different backgrounds and illuminations, are sensitive to the parameter selection in rectangle detection, and have low detection accuracy and slow detection speed; moreover, they do not directly support judging and classifying whether a detection frame contains handwritten characters or whether it is a complete grid.
- the present disclosure provides a method for detecting a character practice grid, the method comprising:
- the position information and character practice frame type of each character practice frame in the target image are determined, the character practice frame type being any one of: a complete character practice frame containing characters, a complete character practice frame without characters, an incomplete character practice frame containing characters, and an incomplete character practice frame without characters.
- the present disclosure provides a character practice grid detection device, the device comprising:
- the acquisition module is used to acquire the target image
- the processing module is used to determine, according to the pre-trained character practice frame detection model, the position information and character practice frame type of each character practice frame in the target image, the character practice frame type being any one of: a complete character practice frame containing characters, a complete character practice frame without characters, an incomplete character practice frame containing characters, and an incomplete character practice frame without characters.
- the present disclosure further provides a non-transitory computer-readable medium, on which a computer program is stored, and when the program is executed by a processing apparatus, implements the steps of the above-described method.
- the present disclosure also provides an electronic device, comprising:
- a processing device is configured to execute the computer program in the storage device to implement the steps of the above method.
- the present disclosure also provides a computer program that implements the steps of the above-described method when the program is executed by a processing device.
- the present disclosure also provides a computer program product having a computer program stored thereon, which implements the steps of the above-described method when the program is executed by a processing device.
- any uploaded image can be detected through the pre-trained character practice frame detection model, and the position and type of each character practice frame in the target image can be determined from the image information alone, without requiring any additional input from the user. This not only enables the user to clearly see the character practice state in the target image, but also facilitates subsequent operations such as cropping the user's writing content. The type judgment result can also be used to intelligently detect the completion rate of the practice, or to remind the user to re-shoot the image, thereby greatly improving the user experience.
- FIG. 1 is a flow chart of a method for detecting character practice grids according to an exemplary embodiment of the present disclosure.
- FIG. 2 is a schematic diagram of four different types of character practice frames in a method for detecting character practice frames according to yet another exemplary embodiment of the present disclosure.
- FIG. 3 is a flow chart of a method for detecting character practice grids according to another exemplary embodiment of the present disclosure.
- Fig. 4 is a flow chart of a method for detecting character practice grids according to another exemplary embodiment of the present disclosure.
- Fig. 5 is a flow chart of a method for detecting a character practice grid according to another exemplary embodiment of the present disclosure.
- FIG. 6a is a schematic diagram of a labeling manner of four corner points of a character practice grid in training data during training of a character practice grid detection model in a character practice grid detection method according to an exemplary embodiment of the present disclosure.
- Fig. 6b is a schematic diagram showing the sequence of four corner points of a character practice frame detected by a character practice frame detection model in a character practice frame detection method according to an exemplary embodiment of the present disclosure.
- Fig. 7 is a structural block diagram of an apparatus for detecting character practice grids according to an exemplary embodiment of the present disclosure.
- FIG. 8 shows a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure.
- the term “including” and variations thereof are open-ended inclusions, i.e., “including but not limited to”.
- the term “based on” is “based at least in part on.”
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
- FIG. 1 is a flow chart of a method for detecting character practice grids according to an exemplary embodiment of the present disclosure. As shown in FIG. 1 , the method includes step 101 and step 102 .
- a target image is acquired.
- the target image may be an image in any format captured by the user with any device. For example, in the scene of handwriting practice, most of the students need to write on the grid paper, and then shoot and upload the writing content to the teacher for comment. At this time, the image obtained by the student photographing the writing content on the grid paper can be the target image.
- step 102: according to the pre-trained character practice frame detection model, the position information and character practice frame type of each character practice frame in the target image are determined, the character practice frame type being any one of: a complete character practice frame containing characters, a complete character practice frame without characters, an incomplete character practice frame containing characters, and an incomplete character practice frame without characters.
- the character practice grid detection model can be used to detect any type of character practice grid; for example, the character practice grid can be a tian-shaped grid (田字格), a square grid, or a mi-shaped grid (米字格).
- the position information of each character practice frame detected by the character practice frame detection model can be represented, for example, by the coordinates of the four corners of the frame, or by the coordinates of the frame's center point together with the offsets of the four corners relative to that center point.
- the type of the character practice grid can be any one of the above four types.
- examples of the above-mentioned four character practice grid types are given in Fig. 2: a complete character practice frame containing characters (1), a complete character practice frame without characters (3), an incomplete character practice frame containing characters (2), and an incomplete character practice frame without characters (4).
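- for illustration only, the four character practice frame types can be represented as a simple enumeration; the names and the 0-3 numbering below are illustrative assumptions chosen to match the channel order described later, and are not taken from the disclosure:

```python
from enum import Enum

class PracticeFrameType(Enum):
    # illustrative names; the 0-3 order mirrors the four-channel output order
    COMPLETE_WITH_CHARACTER = 0       # complete frame containing a character
    COMPLETE_WITHOUT_CHARACTER = 1    # complete frame, empty
    INCOMPLETE_WITH_CHARACTER = 2     # partially visible frame with a character
    INCOMPLETE_WITHOUT_CHARACTER = 3  # partially visible frame, empty
```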
- the character practice grid detection model can be a deep learning model based on a convolutional neural network, obtained through supervised training on annotated training samples.
- any uploaded image can be detected through the pre-trained character practice frame detection model, and the position and type of each character practice frame in the target image can be determined from the image information alone, without requiring any additional input from the user. This not only enables the user to clearly see the character practice state in the target image, but also facilitates subsequent operations such as cropping the user's writing content. The type judgment result can also be used to intelligently detect the completion rate of the practice, or to remind the user to re-shoot the image, thereby greatly improving the user experience.
- FIG. 3 is a flow chart of a method for detecting character practice grids according to another exemplary embodiment of the present disclosure. As shown in FIG. 3 , the method includes step 301 and step 302 .
- step 301: the position information of the detected character practice frames and the character practice frame type to which each belongs are obtained through the character practice frame detection model; the detected character practice frames are one or more character practice frames detected by the model.
- step 302 according to the position information, the detected character practice frames whose overlapping degree satisfies the preset condition are deleted, and the retained detected character practice frames are determined as the character practice frames existing in the target image.
- in step 301, the process of obtaining the position information of a detected character practice frame and the character practice frame type to which it belongs through the character practice frame detection model may be as shown in steps 401 to 404 of FIG. 4.
- Fig. 4 is a flow chart of a method for detecting character practice grids according to another exemplary embodiment of the present disclosure. As shown in FIG. 4 , the method includes steps 401 to 404 .
- step 401: first feature maps of four channels are output through the character practice grid detection model; the four channels are used to distinguish the character practice frame types, and the first feature map of each channel represents, for each pixel in the target image, the confidence that the pixel is the center point of a character practice grid of the type corresponding to that channel.
- for example, the first channel may output the first feature map for complete character practice frames containing characters; the second channel, for complete character practice frames without characters; the third channel, for incomplete character practice frames containing characters; and the fourth channel, for incomplete character practice frames without characters.
- then, for any pixel in the target image, the confidences on the first feature maps output by the four channels form a four-dimensional classification vector, for example (0.5, 0.3, 0.1, 0.1), in the same order as the channels: 0.5 is the probability that the pixel is the center point of a complete character practice frame containing characters, 0.3 is the probability that it is the center point of a complete character practice frame without characters, and so on.
- step 402: according to the confidences, local maxima are found among all pixels in the first feature maps of the four channels; all local maxima in the first feature maps are sorted in descending order, and the pixels corresponding to the first N local maxima are retained as center points of detected character practice grids, where N is a first preset threshold and an integer greater than 1.
- after the model outputs the feature maps, a corresponding decoding operation needs to be performed to produce the position information of the detected character practice frames. This decoding operation is the above-mentioned search for local maxima: local maxima are searched according to the confidences in the four first feature maps output by the four channels, all local maxima found in the four maps are sorted in descending order, and the largest N among them are taken as the center points of the detected character practice grids.
- the value of N may vary according to the size of the target image, for example, it may be 300.
- since, in the four first feature maps output by the channels corresponding to the different character practice grid types, a higher confidence represents a higher probability that a point is a center point, the above local-maximum search can find the center points of the different types of character practice grids respectively.
- center points whose confidence is less than a center point threshold can be removed, so that among the first N local maxima, only those whose confidence is not less than the center point threshold are used as center points of detected character practice grids.
- the center point threshold may be, for example, 0.5.
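- as a non-authoritative sketch of the decoding in step 402, the following NumPy function finds local maxima in four center-point confidence maps, discards those below the center point threshold, and keeps the top N; the function name, the (4, H, W) array layout, and the tie handling are illustrative assumptions rather than details from the disclosure:

```python
import numpy as np

def decode_centers(heatmaps, top_n=300, center_threshold=0.5):
    """Find candidate practice-grid center points in (4, H, W) confidence maps.

    Returns at most top_n tuples (confidence, type_index, row, col),
    sorted by confidence in descending order.
    """
    candidates = []
    for c, hm in enumerate(heatmaps):
        # a pixel is a local maximum if it equals the max of its 3x3 window
        padded = np.pad(hm, 1, constant_values=-np.inf)
        windows = [padded[dr:dr + hm.shape[0], dc:dc + hm.shape[1]]
                   for dr in range(3) for dc in range(3)]
        neighborhood = np.max(windows, axis=0)
        mask = (hm >= neighborhood) & (hm >= center_threshold)
        rows, cols = np.nonzero(mask)
        for r, col in zip(rows, cols):
            candidates.append((float(hm[r, col]), c, int(r), int(col)))
    # keep the N strongest local maxima across all four maps
    candidates.sort(key=lambda t: t[0], reverse=True)
    return candidates[:top_n]
```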
- step 403: second feature maps of eight channels are output through the character practice grid detection model; the eight channels represent, for each pixel in the target image taken as the center point of a character practice frame, the offsets of the four corner points relative to the coordinates of that pixel.
- that is, the character practice grid detection model is a multi-task neural network model: it not only outputs the above four feature maps to determine the center point coordinates and type of each detected character practice grid, but also outputs, for each pixel taken as a center point, the offsets of the four corners of the character practice grid relative to the coordinates of that pixel.
- for example, if the coordinates of the k-th pixel in the target image are [x_km, y_km], and the coordinates of the four corners of the corresponding character practice grid are [x_k0, y_k0, x_k1, y_k1, x_k2, y_k2, x_k3, y_k3], then the values of the eight channels at that pixel are the offsets [x_k0 - x_km, y_k0 - y_km, x_k1 - x_km, y_k1 - y_km, x_k2 - x_km, y_k2 - y_km, x_k3 - x_km, y_k3 - y_km].
- step 404: the position information of a detected character practice grid is determined according to the second feature maps of the eight channels and the center point of that grid, and the character practice frame type to which the detected grid belongs is determined according to the confidences of its center point in the first feature maps of the four channels.
- that is, from the offset information contained in the second feature maps of the eight channels output in step 403, the coordinates of the four corners of the detected character practice grid corresponding to each center point can be determined.
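- a minimal sketch of this corner decoding, assuming the eight offset channels are stored as an (8, H, W) array in the label order (dx0, dy0, ..., dx3, dy3); the function name and array layout are assumptions for illustration, not details from the disclosure:

```python
import numpy as np

def decode_corners(offset_maps, row, col):
    """Recover the four corner points of a detected practice frame.

    offset_maps: array of shape (8, H, W); the channels hold, in label
    order, the x/y offsets of the four corners relative to the pixel
    taken as the frame's center point. Returns four (x, y) corners.
    """
    off = offset_maps[:, row, col]
    cx, cy = float(col), float(row)  # pixel coordinates of the center
    # corner i = center + (dx_i, dy_i)
    return [(cx + off[2 * i], cy + off[2 * i + 1]) for i in range(4)]
```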
- the confidence values of the center point in the first feature maps are used to determine the type to which the center point belongs, that is, the type of the detected character practice grid corresponding to that center point.
- for example, when the preset classification threshold is 0.5 and the confidence of a center point on the first channel is not less than this threshold, the character practice frame type of the detected character practice frame corresponding to that center point is determined to be a complete character practice frame containing characters.
- when the confidences of multiple categories in the four-dimensional classification vector formed by a center point's confidences on the first feature maps of the four channels are greater than the above classification threshold, the category with the highest confidence is determined as the character practice frame type of the detected character practice frame corresponding to that center point.
- if none of a point's confidences exceeds the classification threshold, the point can be identified as a background point.
- here, c_k ∈ {0, 1, 2, 3} denotes which of the above four channels (i.e., which character practice frame type) the k-th point belongs to; the value of the k-th point in the first feature map generated by channel c_k, together with the corresponding values of all other points, constitutes the first feature map generated by that channel.
- r_k is the length of the shorter side of the smallest circumscribed rectangle of the character practice grid corresponding to the k-th point.
- in addition, the character practice grid detection model can ensure the accuracy of the output feature maps by also outputting a quantization-error prediction feature map.
- Fig. 5 is a flow chart of a method for detecting a character practice grid according to another exemplary embodiment of the present disclosure. As shown in FIG. 5 , the method includes step 501 and step 502 .
- step 501: the intersection-over-union between the detected character practice frames is calculated.
- step 502: when the intersection-over-union of two detected character practice frames is greater than a second preset threshold, the detected character practice frame corresponding to the center point with the lower confidence is deleted.
- that is, the detected character practice frames can be screened by calculating the intersection-over-union between the detection frames.
- each suppressed detected character practice grid is deleted from the candidate list, and the process then returns to the above step of selecting the detected character practice grid with the highest confidence from the candidate list and adding it to the output list; detected character practice grids in the candidate list are repeatedly deleted in this way until no detected character practice frame remains in the candidate list.
- all the detected character practice frames in the output list are regarded as the reserved character detection practice frames, and they can be determined as the character practice frames existing in the target image.
- if center points were not filtered by the center point threshold during center point confirmation, then, to ensure the accuracy of the center points before the above intersection-over-union comparison is performed, the detected character practice grids whose center point confidence is less than the center point threshold can first be deleted from all detected grids output by the model, retaining only those whose center point confidence is not less than the threshold; the subsequent intersection-over-union processing is then performed to further refine the detection accuracy.
- the center point threshold may be, for example, 0.5.
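- the screening in steps 501 and 502 is essentially greedy non-maximum suppression. A minimal sketch follows, simplified to axis-aligned boxes (x1, y1, x2, y2) even though the disclosure detects four-corner quadrilaterals; the function names and the default threshold are illustrative assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy suppression: keep the highest-confidence frame, delete any
    remaining frame whose IoU with it exceeds the threshold, and repeat
    until the candidate list is empty. Returns the kept indices."""
    candidates = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while candidates:
        best = candidates.pop(0)
        kept.append(best)
        candidates = [i for i in candidates
                      if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept
```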
- since the center points of the character practice grids in a training sample image are very few compared with the other, non-center pixels, training according to a conventional method would make the proportion of negative samples far too large, which is not conducive to training the model. Therefore, when training the character practice grid detection model, the selection ratio of positive and negative samples in the training samples can be controlled to a target ratio, where a positive sample is a pixel whose value is not zero in the first feature maps output by the model, and a negative sample is a pixel whose value is zero in those feature maps.
- for example, a number of negative samples equal to three times the number of positive samples can be selected, in descending order of loss value.
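- this sampling can be sketched as online hard negative mining: keep all positive pixels and only the hardest negatives, at an assumed 3:1 negative-to-positive ratio. The function name and array shapes below are illustrative assumptions:

```python
import numpy as np

def select_training_pixels(center_labels, per_pixel_loss, neg_pos_ratio=3):
    """Hard negative mining for the center-point heatmap.

    center_labels: (H, W) ground-truth map; non-zero pixels are positives.
    per_pixel_loss: (H, W) loss of each pixel under the current model.
    Keeps every positive and the neg_pos_ratio * n_pos negatives with
    the highest loss, returning a boolean selection mask.
    """
    pos_mask = center_labels != 0
    n_pos = int(pos_mask.sum())
    # exclude positives from the negative ranking
    neg_losses = np.where(pos_mask, -np.inf, per_pixel_loss)
    n_neg = min(neg_pos_ratio * max(n_pos, 1), int((~pos_mask).sum()))
    # flat indices of the hardest negatives, in descending loss order
    hard_neg = np.argsort(neg_losses, axis=None)[::-1][:n_neg]
    select = pos_mask.copy()
    select.flat[hard_neg] = True
    return select
```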
- the loss function used for training the character practice grid detection model to detect the center points of the character practice grids may be the Dice loss.
- the loss function used for training the corner-offset detection can be the smooth L1 loss.
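- hedged NumPy sketches of the two losses mentioned above; the epsilon and beta parameters below are common conventions for these losses, not values given in the disclosure:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Dice loss for the center-point heatmap: 1 - 2*|P.T| / (|P| + |T|)."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 (Huber-style) loss for the corner-offset regression:
    quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    per_elem = np.where(diff < beta,
                        0.5 * diff ** 2 / beta,
                        diff - 0.5 * beta)
    return per_elem.mean()
```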
- the labels in the training samples of the character practice grid detection model are the coordinates of the four corners of all the character practice grids in the training sample, arranged in a fixed sequence: for example, upper-left corner, upper-right corner, lower-right corner, and lower-left corner.
- from these corner labels, the coordinates of the center point of each character practice grid can be obtained indirectly, so the center points can also be learned.
- the offsets of the four corner points relative to the center point detected by the trained character practice grid detection model will also be output in the order of the above labeling.
- the correct orientation of a character practice frame can also be obtained from the detection. As shown in Figure 6a and Figure 6b, if the character practice grid detection model is trained with corner points annotated in the order shown in Figure 6a, then when a target image such as that shown in Figure 6b is detected, the correct order of the four corners can be accurately detected for character practice grids containing text.
- the numbers shown at the four corners of the character practice grid in Figures 6a and 6b are the sequence numbers of the corners. In this way, it is further convenient for subsequent recognition of the characters in the character practice grid, or reminding the user that the image direction is not correct, and the like.
- FIG. 7 is a structural block diagram of a character practice grid detection apparatus according to an exemplary embodiment of the present disclosure.
- the device includes: an acquisition module 10 for acquiring a target image; and a processing module 20 for determining, according to a pre-trained character practice frame detection model, the position information and character practice frame type of each character practice frame in the target image, the character practice frame type being any one of: a complete character practice frame containing characters, a complete character practice frame without characters, an incomplete character practice frame containing characters, and an incomplete character practice frame without characters.
- any uploaded image can be detected through the pre-trained character practice frame detection model, and the position and type of each character practice frame in the target image can be determined from the image information alone, without requiring any additional input from the user. This not only enables the user to clearly see the character practice state in the target image, but also facilitates subsequent operations such as cropping the user's writing content. The type judgment result can also be used to intelligently detect the completion rate of the practice, or to remind the user to re-shoot the image, thereby greatly improving the user experience.
- the processing module 20 includes: a detection submodule, configured to obtain, through the character practice frame detection model, the position information of the detected character practice frames and the character practice frame type to which each belongs, the detected character practice frames being one or more character practice frames detected by the model; and a deduplication submodule, configured to delete, according to the position information, the detected character practice frames whose degree of overlap satisfies a preset condition, and to determine the remaining detected character practice frames as the character practice frames existing in the target image.
- the detection submodule is configured to: output first feature maps of four channels through the character practice grid detection model, the four channels being used to distinguish the character practice grid types, and the first feature map of each channel representing the confidence of each pixel in the target image being the center point of a character practice grid of the type corresponding to that channel; find local maxima among all pixels in the first feature maps, sort all local maxima in descending order, and retain the pixels corresponding to the first N local maxima as center points of detected character practice grids, N being a first preset threshold; output second feature maps of eight channels through the model, the eight channels representing, for each pixel in the target image taken as a center point of a character practice grid, the offsets of the four corner points relative to the coordinates of that pixel; and determine the position information of each detected character practice frame according to the second feature maps of the eight channels and the detected center points, and determine the character practice frame type to which each detected frame belongs according to the confidences of its center point in the first feature maps of the four channels.
- the deduplication submodule is configured to: calculate the intersection-over-union between the detected character practice frames; and, when the intersection-over-union of two detected character practice frames is greater than the second preset threshold, delete the detected character practice grid corresponding to the center point with the lower confidence.
- during training of the character practice grid detection model, the selection ratio of positive and negative samples in the training samples is controlled to a target ratio, where a positive sample is a pixel whose value is not zero in the first feature map output by the model, and a negative sample is a pixel whose value is zero in that feature map.
- the labels in the training samples of the character practice grid detection model are the coordinates, arranged in a fixed order, of the four corners of all the character practice grids in the training sample.
- Terminal devices in the embodiments of the present disclosure may include, but are not limited to, such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle-mounted terminals (eg, mobile terminals such as in-vehicle navigation terminals), etc., and stationary terminals such as digital TVs, desktop computers, and the like.
- the electronic device shown in FIG. 8 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
- an electronic device 800 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 801, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800.
- the processing device 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
- An input/output (I/O) interface 805 is also connected to bus 804 .
- the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 807 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 808 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 809.
- Communication means 809 may allow electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 8 shows an electronic device 800 having various means, it should be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
- the computer program may be downloaded and installed from the network via the communication device 809, or from the storage device 808, or from the ROM 802.
- when the computer program is executed by the processing device 801, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
- the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
- the computer-readable storage medium can be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
- the client and server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network).
- Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
- the above-mentioned computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire a target image; and determine, according to a pre-trained character practice grid detection model, the position information and grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
- Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
- each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
- the modules involved in the embodiments of the present disclosure may be implemented in software or hardware. The name of a module does not, in some cases, limit the module itself; for example, the acquisition module may also be described as "a module for acquiring a target image".
- exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
- machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
- Example 1 provides a character practice grid detection method, the method comprising: acquiring a target image; and determining, according to a pre-trained character practice grid detection model, the position information and grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
- Example 2 provides the method of Example 1, wherein determining the position information and grid type of each character practice grid in the target image according to a pre-trained character practice grid detection model includes:
- obtaining, through the detection model, the position information of detected practice grids and the grid types to which the detected grids belong, the detected grids being one or more practice grids detected by the detection model;
- deleting, according to the position information, the detected grids whose degree of overlap satisfies a preset condition, and determining the retained detected grids as the practice grids present in the target image.
- Example 3 provides the method of Example 2, wherein obtaining, through the detection model, the position information of detected practice grids and the grid types to which they belong includes:
- outputting, through the detection model, first feature maps of four channels, the four channels being used to distinguish the grid types, and the first feature map of each channel representing the confidence that each pixel in the target image is the center point of a practice grid of the type corresponding to that channel;
- finding local maxima according to the confidences among all pixels in the first feature maps of the four channels, sorting all local maxima in each channel's first feature map in descending order, and retaining the pixels corresponding to the top N local maxima as center points of the detected grids, where N is a first preset threshold and N is an integer greater than 1;
- outputting, through the detection model, second feature maps of eight channels, the eight channels respectively representing, for each pixel of the target image taken as a grid center point, the offsets of the four corner points relative to the coordinates of that pixel;
- determining the position information of the detected grids according to the eight-channel second feature maps and the center points of the detected grids, and determining the grid type to which each detected grid belongs according to the confidences of its center point in the four channels' first feature maps.
- Example 4 provides the method of Example 2, wherein the preset condition is that the intersection-over-union between any two of the detected grids is greater than a second preset threshold, and deleting, according to the position information, the detected grids whose degree of overlap satisfies the preset condition includes:
- when the intersection-over-union of two detected grids is greater than the second preset threshold, determining to delete the detected grid corresponding to the center point with the lower confidence.
- Example 5 provides the method of Example 3, wherein, when the character practice grid detection model is trained, the selection ratio of positive to negative samples in the training samples is controlled to be a target ratio, where the positive samples are pixels whose values are not zero in the first feature maps output by the detection model, and the negative samples are pixels whose values are zero in the first feature maps output by the detection model.
- Example 6 provides the method of Example 1, wherein the labels in the training samples of the detection model are the sequentially ordered coordinates of the four corner points of all the practice grids in the training sample.
- Example 7 provides a character practice grid detection device, the device comprising:
- the acquisition module is used to acquire the target image
- the processing module is used to determine, according to the pre-trained character practice grid detection model, the position information and grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
- Example 8 provides the apparatus of Example 7, the processing module comprising:
- the detection sub-module is used to obtain, through the detection model, the position information of detected practice grids and the grid types to which the detected grids belong, the detected grids being one or more practice grids detected by the detection model;
- the deduplication submodule is configured to delete the detected character practice grids whose overlapping degree satisfies a preset condition according to the position information, and determine the retained detected character practice grids as the character practice grids existing in the target image.
- Example 9 provides a non-transitory computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1-6.
- Example 10 provides an electronic device comprising:
- a storage device having a computer program stored thereon; and a processing device configured to execute the computer program in the storage device, to implement the steps of the method of any one of Examples 1-6.
- Example 11 provides a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1-6.
- Example 12 provides a computer program product having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1-6.
Abstract
The present disclosure relates to a character practice grid detection method and apparatus, a readable medium, and an electronic device. The method includes: acquiring a target image; and determining, according to a pre-trained character practice grid detection model, the position information and grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character. With the above technical solution, the position and type of the practice grids on any practice-grid paper can be determined, so that the user can clearly learn the practice status in the target image, subsequent operations such as cropping the user's written content are facilitated, and the grid-type results can further be used to automatically measure the practice completion rate or to remind the user to retake the photo, thereby greatly improving the user experience.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 202110341076.0, filed on March 30, 2021 and entitled "Character Practice Grid Detection Method and Apparatus, Readable Medium, and Electronic Device", the disclosure of which is incorporated herein by reference in its entirety.
The present disclosure relates to the field of image processing, and in particular, to a character practice grid detection method and apparatus, a readable medium, and an electronic device.
In intelligent calligraphy-review technology, detecting the position of practice grids and classifying them is crucial to the whole review pipeline. Using the practice grid as a reference, the tilt and size of a character can be estimated, whether a tian-zi grid contains a character can be judged and classified, and the perspective distortion introduced by photographing can be corrected, making subsequent reviews more accurate. For copybook practice-grid detection, the related art mainly relies on line detection such as the Hough transform, contour detection, or corner detection, combined with the geometric features of rectangles for noise filtering, to obtain the final detection result. Consequently, existing practice-grid detection algorithms are poorly robust to different backgrounds and lighting conditions, sensitive to the parameter choices of rectangle detection, low in detection accuracy, and slow in detection speed; moreover, they do not directly support judging and classifying whether a detected box contains handwriting or constitutes a complete tian-zi grid.
SUMMARY
This Summary is provided to introduce concepts in a simplified form that are described in detail in the Detailed Description below. It is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a character practice grid detection method, the method including:
acquiring a target image;
determining, according to a pre-trained character practice grid detection model, the position information and grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
In a second aspect, the present disclosure provides a character practice grid detection apparatus, the apparatus including:
an acquisition module configured to acquire a target image;
a processing module configured to determine, according to a pre-trained character practice grid detection model, the position information and grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
In a third aspect, the present disclosure further provides a non-transitory computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the above method.
In a fourth aspect, the present disclosure further provides an electronic device, including:
a storage apparatus having a computer program stored thereon;
a processing apparatus configured to execute the computer program in the storage apparatus, to implement the steps of the above method.
In a fifth aspect, the present disclosure further provides a computer program that, when executed by a processing apparatus, implements the steps of the above method.
In a sixth aspect, the present disclosure further provides a computer program product having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the above method.
With the above technical solution, any uploaded image can be detected by the pre-trained character practice grid detection model, and the position and type of each practice grid in the target image can be obtained from the image information alone. The user therefore does not need to practice on a writing device with special functions capable of capturing the user's writing information, nor is it necessary to rely entirely on manual inspection of the images uploaded by users. Moreover, the grids on any kind of practice-grid paper, such as tian-zi grids, mi-zi grids, or plain squares, can be located and classified. This not only lets the user clearly learn the practice status in the target image, but also facilitates subsequent operations such as cropping the user's written content; in addition, the grid-type results can be used to automatically measure the practice completion rate or to remind the user whether the photo needs to be retaken, thereby greatly improving the user experience.
Other features and advantages of the present disclosure will be described in detail in the Detailed Description below.
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, identical or similar reference numerals denote identical or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flowchart of a character practice grid detection method according to an exemplary embodiment of the present disclosure.
FIG. 2 is a schematic diagram of four different grid types in a character practice grid detection method according to another exemplary embodiment of the present disclosure.
FIG. 3 is a flowchart of a character practice grid detection method according to another exemplary embodiment of the present disclosure.
FIG. 4 is a flowchart of a character practice grid detection method according to another exemplary embodiment of the present disclosure.
FIG. 5 is a flowchart of a character practice grid detection method according to another exemplary embodiment of the present disclosure.
FIG. 6a is a schematic diagram of the annotation scheme for the four corner points of a practice grid in the training data used to train the detection model, in a character practice grid detection method according to an exemplary embodiment of the present disclosure.
FIG. 6b is a schematic diagram of the order of the four corner points of a practice grid detected by the detection model, in a character practice grid detection method according to an exemplary embodiment of the present disclosure.
FIG. 7 is a block diagram of a character practice grid detection apparatus according to an exemplary embodiment of the present disclosure.
FIG. 8 is a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps recited in the method implementations of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method implementations may include additional steps and/or omit performing the steps shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "including" and variations thereof are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order of, or interdependence between, the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple apparatuses in the implementations of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
FIG. 1 is a flowchart of a character practice grid detection method according to an exemplary embodiment of the present disclosure. As shown in FIG. 1, the method includes step 101 and step 102.
In step 101, a target image is acquired. The target image may be an image in any format captured by the user with any device. For example, in handwriting-practice scenarios, students mostly need to write on grid paper, then photograph the written content and upload it to a teacher for review. In this case, the image obtained when the student photographs the written content on the grid paper can serve as the target image.
In step 102, the position information and grid type of each character practice grid in the target image are determined according to a pre-trained character practice grid detection model, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
The detection model can be used to detect practice grids of any kind; for example, the practice grid may be a tian-zi grid, a plain square, or a mi-zi grid. The position information of each practice grid detected by the model may be represented, for example, by the coordinates of the four corner points of the grid, or by the coordinates of the center point of the grid together with the offsets of the coordinates of the four corner points relative to the center-point coordinates.
The grid type may be any one of the four types above. Examples of the four grid types are given in FIG. 2: a complete grid containing a character (1), a complete grid containing no character (3), an incomplete grid containing a character (2), and an incomplete grid containing no character (4), four types in total.
The detection model may be a deep learning model based on a convolutional neural network, obtained through a supervised training method on annotated training samples.
With the above technical solution, any uploaded image can be detected by the pre-trained character practice grid detection model, and the position and type of each practice grid in the target image can be obtained from the image information alone. The user therefore does not need to practice on a writing device with special functions capable of capturing the user's writing information, nor is it necessary to rely entirely on manual inspection of the images uploaded by users. Moreover, the grids on any kind of practice-grid paper, such as tian-zi grids, mi-zi grids, or plain squares, can be located and classified. This not only lets the user clearly learn the practice status in the target image, but also facilitates subsequent operations such as cropping the user's written content; in addition, the grid-type results can be used to automatically measure the practice completion rate or to remind the user whether the photo needs to be retaken, thereby greatly improving the user experience.
FIG. 3 is a flowchart of a character practice grid detection method according to another exemplary embodiment of the present disclosure. As shown in FIG. 3, the method includes step 301 and step 302.
In step 301, the position information of detected practice grids and the grid types to which the detected grids belong are obtained through the detection model, the detected practice grids being one or more practice grids detected by the detection model.
In step 302, the detected grids whose degree of overlap satisfies a preset condition are deleted according to the position information, and the retained detected grids are determined as the practice grids present in the target image.
That is, not all of the detected grids directly obtained through the detection model will be taken as practice grids present in the target image. After the position information and grid types of the detected grids output by the model are obtained, all detected grids are further filtered so that the detection result matches the practice grids actually present in the target image.
The process of obtaining the position information and grid types of the detected grids through the detection model in step 301 may be as shown in steps 401 to 404 of FIG. 4.
FIG. 4 is a flowchart of a character practice grid detection method according to another exemplary embodiment of the present disclosure. As shown in FIG. 4, the method includes steps 401 to 404.
In step 401, first feature maps of four channels are output by the detection model; the four channels are used to distinguish the grid types, and the first feature map of each channel represents, for each pixel in the target image, the confidence that the pixel is the center point of a practice grid of the type corresponding to that channel.
For example, the first channel may output the first feature map for complete grids containing a character, the second channel that for complete grids containing no character, the third channel that for incomplete grids containing a character, and the fourth channel that for incomplete grids containing no character. The confidences of any pixel of the target image across the first feature maps output by the four channels may then form a four-dimensional classification vector such as (0.5, 0.3, 0.1, 0.1), ordered consistently with the channels; here 0.5 means the probability that this pixel is the center point of a complete grid containing a character is 0.5, 0.3 means the probability that it is the center point of a complete grid containing no character is 0.3, and so on.
In step 402, local maxima are found according to the confidences among all pixels in the first feature maps of the four channels, all local maxima in each channel's first feature map are sorted in descending order, and the pixels corresponding to the top N local maxima are retained as center points of the detected grids, where N is a first preset threshold and N is an integer greater than 1.
After the first feature maps of the four channels are obtained, a corresponding decoding operation is needed to determine the concrete center points in the target image and to output the position information of the detected grids. The decoding operation is the local-maximum search described above: in the four first feature maps respectively output by the four channels, local maxima are searched for according to the confidences, all local maxima found in the four first feature maps are sorted in descending order, and the largest N among them are taken as the center points of the detected grids.
The value of N may vary with the size of the target image and may be, for example, 300.
Since all four first feature maps, corresponding to the different grid types, use a higher confidence to indicate a higher probability that a point is a center point, the local-maximum search above finds the center points of grids of all the different types.
To guarantee the precision of the center points, after the pixels corresponding to the top N local maxima are determined, the center points whose confidence is smaller than a center-point threshold may further be removed, so that only those of the top N local maxima whose confidence is not smaller than the center-point threshold are taken as the center points of the detected grids. The center-point threshold may be, for example, 0.5.
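The peak search and thresholding described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name and array layout are assumptions, the 8-neighbour definition of a local maximum is one common choice, and the defaults mirror the example values in the text (N = 300, center-point threshold 0.5):

```python
import numpy as np

def find_centers(heatmaps, top_n=300, center_thresh=0.5):
    """Pick candidate grid centers from per-type confidence heatmaps.

    heatmaps: float array of shape (4, H, W); one channel per grid type.
    Returns a list of (channel, row, col, confidence) tuples.
    """
    peaks = []
    for c, hm in enumerate(heatmaps):
        # A pixel is a local maximum if it is >= all of its 8 neighbours.
        padded = np.pad(hm, 1, mode="constant", constant_values=-np.inf)
        neigh = np.stack([
            padded[dy:dy + hm.shape[0], dx:dx + hm.shape[1]]
            for dy in range(3) for dx in range(3)
            if not (dy == 1 and dx == 1)
        ])
        is_peak = hm >= neigh.max(axis=0)
        ys, xs = np.nonzero(is_peak)
        for y, x in zip(ys, xs):
            peaks.append((c, int(y), int(x), float(hm[y, x])))
    # Keep the top-N peaks overall, then drop low-confidence ones.
    peaks.sort(key=lambda p: p[3], reverse=True)
    return [p for p in peaks[:top_n] if p[3] >= center_thresh]
```

In use, each surviving tuple names a detected-grid center together with the channel (grid type) it was found in.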
In step 403, second feature maps of eight channels are output by the detection model; the eight channels are respectively used to represent, for each pixel in the target image taken as a grid center point, the offsets of the four corner points relative to the coordinates of that pixel.
That is, the detection model is a multi-task neural network model: it outputs not only the above four feature maps to determine the center-point coordinates and types of the detected grids, but also, for each pixel taken as a center point, the offsets of the four corner points of the grid relative to the coordinates of that pixel.
For example, if the coordinates of the k-th pixel in the target image are [x_km, y_km], then when this pixel is taken as the center point of a practice grid, the coordinates of the four corner points of the grid may be [x_k0, y_k0, x_k1, y_k1, x_k2, y_k2, x_k3, y_k3]. In the second feature maps respectively output on the eight channels, the values at this pixel are then respectively:
[x_k0 − x_km, y_k0 − y_km, x_k1 − x_km, y_k1 − y_km, x_k2 − x_km, y_k2 − y_km, x_k3 − x_km, y_k3 − y_km].
In step 404, the position information of the detected grids is determined according to the eight-channel second feature maps and the center points of the detected grids, and the grid type to which each detected grid belongs is determined according to the confidences of its center point in the first feature maps of the four channels.
After the center points of the detected grids are determined in steps 401 and 402, the coordinates of the four corner points of the detected grid corresponding to each center point can be obtained from the offset information contained in the eight-channel second feature maps output in step 403.
Furthermore, since the four-channel feature maps used to determine the center-point coordinates also distinguish the different types of detected grids, the type to which a center point belongs, i.e., the type of the detected grid corresponding to that center point, can be determined from the concrete confidence values of the previously determined center point in the feature maps of the four channels. For example, if the confidences of a detected grid's center point across the first feature maps output by the four channels form the four-dimensional classification vector (0.5, 0.3, 0.1, 0.1), then with a preset classification threshold of 0.5, the grid type of this center point and of the detected grid corresponding to it is a complete grid containing a character. In addition, if multiple components of the four-dimensional classification vector formed by a center point's confidences across the four channels' first feature maps exceed the classification threshold, the class corresponding to the largest component is determined as the grid type of that center point and of its corresponding detected grid.
Moreover, if, for some pixel of the target image, none of the components of the four-dimensional classification vector formed by its confidences across the four channels' first feature maps exceeds the classification threshold, the point may be determined to belong to the background.
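A minimal sketch of the decoding in step 404, under assumed array layouts: the 8-channel offset order (dx0, dy0, …, dx3, dy3) and the pick-the-most-confident-channel type rule follow the description above, while the function name is hypothetical:

```python
import numpy as np

def decode_box(offset_maps, channel_confs, y, x):
    """Decode one detected grid from a chosen center pixel (y, x).

    offset_maps: (8, H, W) array; channels are (dx0, dy0, ..., dx3, dy3),
    the offsets of the four corners relative to the center pixel.
    channel_confs: length-4 confidences of this pixel in the four
    first-feature-map channels.
    Returns (corners, grid_type), where corners is a (4, 2) array of
    (x, y) points and grid_type is the index of the most confident channel.
    """
    offs = offset_maps[:, y, x].reshape(4, 2)        # (dx, dy) per corner
    corners = offs + np.array([x, y], dtype=float)   # absolute coordinates
    grid_type = int(np.argmax(channel_confs))
    return corners, grid_type
```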
In a possible implementation, in order to reduce the amount of computation of the detection model for target images of large size, after the target image is input into the model, every pixel that needs to be processed individually may be mapped onto a feature map downscaled by a factor of R, and the outputs of the above four channels are obtained by processing the downscaled feature map. Concretely, a point p is mapped to the downscaled coordinates p̃ = ⌊p/R⌋, where ⌊·⌋ denotes rounding down; c_k denotes the one of the above four channels to which the k-th point belongs, with c_k ∈ {0, 1, 2, 3}; the value of the k-th point in the first feature map generated for channel c_k may follow a Gaussian of the form exp(−((x − p̃_x)² + (y − p̃_y)²) / (2σ²)), and the values corresponding to all points constitute the first feature map generated for channel c_k. Here σ is an adaptive parameter related to the size of the practice grid, and r_k is the length of the shorter side of the minimum bounding rectangle of the practice grid corresponding to the k-th point.
After the above downscaling mapping of the pixels, in order to guarantee the precision of the model's output results, the detection model may additionally output a quantization-error prediction feature map so as to preserve the precision of the output feature maps.
FIG. 5 is a flowchart of a character practice grid detection method according to another exemplary embodiment of the present disclosure. As shown in FIG. 5, the method includes step 501 and step 502.
In step 501, the intersection-over-union between the detected grids is calculated.
In step 502, when the intersection-over-union of two detected grids is greater than the second preset threshold, it is determined that the detected grid corresponding to the center point with the lower confidence is deleted.
When the preset condition is that the intersection-over-union between any two detected grids is greater than a second preset threshold, the detected grids can be filtered by calculating the intersection-over-union between them as above. In a concrete implementation, all detected grids may first be arranged in a candidate list in descending order of the confidence of their center points; the detected grid with the highest confidence is then taken from the candidate list, added to an output list, and removed from the candidate list; next, the intersection-over-union between the grid just added to the output list and every detected grid remaining in the candidate list is computed, and all candidates whose intersection-over-union exceeds the second preset threshold are removed from the candidate list; the procedure then returns to the step of moving the most confident remaining candidate to the output list, repeatedly removing candidates until the candidate list is empty. Finally, all detected grids in the output list are taken as the retained detected grids and determined as the practice grids present in the target image.
In a possible implementation, if the center points were not filtered with the above center-point threshold when the center points were determined, then, in order to guarantee the precision of the center points, before the above intersection-over-union comparison, the detected grids whose center-point confidence is smaller than the center-point threshold may be deleted from all detected grids obtained by the model, and only the detected grids corresponding to center points whose confidence is not smaller than the threshold are retained for the subsequent intersection-over-union processing, further improving the precision of the practice grid detection. The center-point threshold may be, for example, 0.5.
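The candidate-list/output-list procedure described above is the classic greedy non-maximum suppression. A sketch under the simplifying assumption of axis-aligned boxes (the patent's grids are general quadrilaterals, for which the IoU computation would be polygon-based):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy de-duplication: keep the highest-scoring box, drop every
    remaining candidate whose IoU with it exceeds iou_thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return kept
```

The returned indices correspond to the output list; all other candidates are the "overlap satisfies the preset condition" deletions.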
In a possible implementation, since during training the center-point pixels of the practice grids in a training sample image are far fewer than the other, non-center pixels, the actual ratio of positive to negative samples under a conventional training method is excessively imbalanced, which is unfavorable for training the model. Therefore, when training the detection model, the selection ratio of positive to negative samples in the training samples may be controlled to be a target ratio, where the positive samples are pixels whose values are not zero in the first feature maps output by the detection model, and the negative samples are pixels whose values are zero in the first feature maps output by the detection model. For example, when positive samples are scarce, all positive samples may be selected, and, among all non-positive samples sorted by loss value from high to low, three times the number of positive samples may be selected as negative samples.
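The 3:1 hard-negative selection described above can be sketched as follows; this is an illustrative online-hard-example-mining step with assumed array shapes and a hypothetical function name (ties at the loss threshold may admit a few extra negatives):

```python
import numpy as np

def select_samples(loss_map, pos_mask, neg_ratio=3):
    """Select which pixels contribute to the center-point loss.

    loss_map: per-pixel loss values, shape (H, W).
    pos_mask: boolean (H, W); True where the target heatmap is non-zero.
    Keeps all positives plus the neg_ratio * n_pos hardest negatives
    (largest loss), mirroring the 3:1 ratio described in the text.
    """
    n_pos = int(pos_mask.sum())
    neg_losses = loss_map[~pos_mask]
    n_neg = min(neg_ratio * n_pos, neg_losses.size)
    if n_neg > 0:
        # Loss value of the n_neg-th hardest negative.
        thresh = np.sort(neg_losses)[::-1][n_neg - 1]
    else:
        thresh = np.inf
    return pos_mask | ((loss_map >= thresh) & ~pos_mask)
```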
In a possible implementation, the loss function used in the detection model when training the detection of the above grid center points may be the Dice loss, and the loss function used when training the detection of the offsets of the four corner points relative to the center point may be the smooth L1 loss.
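For reference, minimal NumPy forms of the two losses named above; in practice these would be differentiable tensor operations in a deep-learning framework. The Dice loss here is the common 1 − Dice-coefficient form, and the smooth L1 is the standard Huber-style form with β = 1 (both choices are assumptions, as the patent does not spell out the exact variants):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient between predicted and target heatmaps."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def smooth_l1(pred, target, beta=1.0):
    """Huber-style loss used for the corner-offset regression."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return loss.mean()
```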
In a possible implementation, the labels in the training samples of the detection model are the sequentially ordered coordinates of the four corner points of all practice grids in the training sample, for example in the order top-left, top-right, bottom-right, bottom-left. From the annotated coordinates of the four corner points, the center-point coordinates of each practice grid can be obtained indirectly, so the center points of the practice grids can also be learned. With this ordered annotation, the offsets of the four corner points relative to the center point detected by the trained detection model are also output in the annotated order. In this way, even when a practice grid in the target image is not upright, the correct orientation of the grid can still be obtained from its detection. As shown in FIGS. 6a and 6b, if the detection model is trained with annotations ordered as in FIG. 6a, then when a target image as in FIG. 6b is detected, the correct order of the four corner points can be detected accurately for grids containing characters. The numbers shown at the four corners of the grids in FIGS. 6a and 6b are the sequence numbers of those corner points. This further facilitates subsequent recognition of the characters in the grids, or reminding the user that the image orientation is incorrect, and the like.
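One common way to produce the annotated top-left → top-right → bottom-right → bottom-left order from an unordered set of four corner points is to sort them by angle around their centroid; this is an illustrative annotation-time utility, not part of the patent text:

```python
import numpy as np

def order_corners(pts):
    """Order four (x, y) points as top-left, top-right,
    bottom-right, bottom-left, matching the annotation order."""
    pts = np.asarray(pts, dtype=float)
    center = pts.mean(axis=0)
    # Angle of each point around the centroid; in image coordinates
    # (y grows downward) sorting by angle yields a clockwise order.
    ang = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
    cw = pts[np.argsort(ang)]
    # Rotate so the corner with the smallest x + y (top-left) comes first.
    start = int(np.argmin(cw.sum(axis=1)))
    return np.roll(cw, -start, axis=0)
```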
FIG. 7 is a structural block diagram of a character practice grid detection apparatus according to an exemplary embodiment of the present disclosure. As shown in FIG. 7, the apparatus includes: an acquisition module 10 configured to acquire a target image; and a processing module 20 configured to determine, according to a pre-trained character practice grid detection model, the position information and grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
With the above technical solution, any uploaded image can be detected by the pre-trained character practice grid detection model, and the position and type of each practice grid in the target image can be obtained from the image information alone; the user thus needs neither a writing device with special functions capable of capturing the user's writing information nor fully manual inspection of uploaded images, and grids on any kind of practice-grid paper, such as tian-zi grids, mi-zi grids, or plain squares, can be located and classified. This not only lets the user clearly learn the practice status in the target image and facilitates subsequent operations such as cropping the user's written content, but the grid-type results can also be used to automatically measure the practice completion rate or to remind the user whether the photo needs to be retaken, thereby greatly improving the user experience.
In a possible implementation, the processing module 20 includes: a detection sub-module configured to obtain, through the detection model, the position information of detected practice grids and the grid types to which the detected grids belong, the detected grids being one or more practice grids detected by the detection model; and a de-duplication sub-module configured to delete, according to the position information, the detected grids whose degree of overlap satisfies a preset condition, and to determine the retained detected grids as the practice grids present in the target image.
In a possible implementation, the detection sub-module is configured to: output, through the detection model, first feature maps of four channels, the four channels being used to distinguish the grid types, and the first feature map of each channel representing the confidence that each pixel in the target image is the center point of a practice grid of the type corresponding to that channel; find local maxima according to the confidences among all pixels in the first feature maps of the four channels, sort all local maxima in each channel's first feature map in descending order, and retain the pixels corresponding to the top N local maxima as center points of the detected grids, N being a first preset threshold; output, through the detection model, second feature maps of eight channels, the eight channels respectively representing, for each pixel of the target image taken as a grid center point, the offsets of the four corner points relative to the coordinates of that pixel; and determine the position information of the detected grids according to the eight-channel second feature maps and the center points of the detected grids, and determine the grid type to which each detected grid belongs according to the confidences of its center point in the feature maps of the four channels.
In a possible implementation, the de-duplication sub-module is configured to: calculate the intersection-over-union between the detected grids; and, when the intersection-over-union of two detected grids is greater than the second preset threshold, determine that the detected grid corresponding to the center point with the lower confidence is deleted.
In a possible implementation, when the detection model is trained, the selection ratio of positive to negative samples in the training samples is controlled to be a target ratio, where the positive samples are pixels whose values are not zero in the first feature maps output by the detection model, and the negative samples are pixels whose values are zero in the first feature maps output by the detection model.
In a possible implementation, the labels in the training samples of the detection model are the sequentially ordered coordinates of the four corner points of all practice grids in the training sample.
Referring now to FIG. 8, a schematic structural diagram of an electronic device 800 suitable for implementing embodiments of the present disclosure is shown. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., in-vehicle navigation terminals), as well as stationary terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 8 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
As shown in FIG. 8, the electronic device 800 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 801 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processing device 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 807 including, for example, a liquid crystal display (LCD), speakers, and vibrators; storage devices 808 including, for example, magnetic tape and hard disks; and a communication device 809. The communication device 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 8 shows the electronic device 800 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 809, or installed from the storage device 808, or installed from the ROM 802. When the computer program is executed by the processing device 801, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it may send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to electrical wire, optical cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire a target image; and determine, according to a pre-trained character practice grid detection model, the position information and grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings; for example, two successive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module does not, in some cases, limit the module itself; for example, the acquisition module may also be described as "a module for acquiring a target image".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), and the like.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, Example 1 provides a character practice grid detection method, the method comprising: acquiring a target image; and determining, according to a pre-trained character practice grid detection model, the position information and grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, wherein determining the position information and grid type of each character practice grid in the target image according to a pre-trained character practice grid detection model includes:
obtaining, through the detection model, the position information of detected practice grids and the grid types to which the detected grids belong, the detected grids being one or more practice grids detected by the detection model;
deleting, according to the position information, the detected grids whose degree of overlap satisfies a preset condition, and determining the retained detected grids as the practice grids present in the target image.
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 2, wherein obtaining, through the detection model, the position information of detected practice grids and the grid types to which they belong includes:
outputting, through the detection model, first feature maps of four channels, the four channels being used to distinguish the grid types, and the first feature map of each channel representing the confidence that each pixel in the target image is the center point of a practice grid of the type corresponding to that channel;
finding local maxima according to the confidences among all pixels in the first feature maps of the four channels, sorting all local maxima in each channel's first feature map in descending order, and retaining the pixels corresponding to the top N local maxima as center points of the detected grids, where N is a first preset threshold and N is an integer greater than 1;
outputting, through the detection model, second feature maps of eight channels, the eight channels respectively representing, for each pixel of the target image taken as a grid center point, the offsets of the four corner points relative to the coordinates of that pixel;
determining the position information of the detected grids according to the eight-channel second feature maps and the center points of the detected grids, and determining the grid type to which each detected grid belongs according to the confidences of its center point in the first feature maps of the four channels.
According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 2, wherein the preset condition is that the intersection-over-union between any two of the detected grids is greater than a second preset threshold, and deleting, according to the position information, the detected grids whose degree of overlap satisfies the preset condition includes:
calculating the intersection-over-union between the detected grids;
when the intersection-over-union of two detected grids is greater than the second preset threshold, determining to delete the detected grid corresponding to the center point with the lower confidence.
According to one or more embodiments of the present disclosure, Example 5 provides the method of Example 3, wherein, when the detection model is trained, the selection ratio of positive to negative samples in the training samples is controlled to be a target ratio, where the positive samples are pixels whose values are not zero in the first feature maps output by the detection model, and the negative samples are pixels whose values are zero in the first feature maps output by the detection model.
According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 1, wherein the labels in the training samples of the detection model are the sequentially ordered coordinates of the four corner points of all practice grids in the training sample.
According to one or more embodiments of the present disclosure, Example 7 provides a character practice grid detection apparatus, the apparatus comprising:
an acquisition module configured to acquire a target image;
a processing module configured to determine, according to a pre-trained character practice grid detection model, the position information and grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
According to one or more embodiments of the present disclosure, Example 8 provides the apparatus of Example 7, wherein the processing module includes:
a detection sub-module configured to obtain, through the detection model, the position information of detected practice grids and the grid types to which the detected grids belong, the detected grids being one or more practice grids detected by the detection model;
a de-duplication sub-module configured to delete, according to the position information, the detected grids whose degree of overlap satisfies a preset condition, and to determine the retained detected grids as the practice grids present in the target image.
According to one or more embodiments of the present disclosure, Example 9 provides a non-transitory computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1-6.
According to one or more embodiments of the present disclosure, Example 10 provides an electronic device, comprising:
a storage apparatus having a computer program stored thereon;
a processing apparatus configured to execute the computer program in the storage apparatus, to implement the steps of the method of any one of Examples 1-6.
According to one or more embodiments of the present disclosure, Example 11 provides a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1-6.
According to one or more embodiments of the present disclosure, Example 12 provides a computer program product having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1-6.
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure involved herein is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments individually or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or method logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims. Regarding the apparatus in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments concerning the method, and will not be elaborated here.
Claims (12)
- A character practice grid detection method, the method comprising: acquiring a target image; and determining, according to a pre-trained character practice grid detection model, position information and a grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
- The method according to claim 1, wherein determining the position information and grid type of each character practice grid in the target image according to the pre-trained character practice grid detection model comprises: obtaining, through the detection model, position information of detected practice grids and the grid types to which the detected grids belong, the detected grids being one or more practice grids detected by the detection model; and deleting, according to the position information, the detected grids whose degree of overlap satisfies a preset condition, and determining the retained detected grids as the practice grids present in the target image.
- The method according to claim 2, wherein obtaining, through the detection model, the position information of detected practice grids and the grid types to which they belong comprises: outputting, through the detection model, first feature maps of four channels, the four channels being used to distinguish the grid types, and the first feature map of each channel representing the confidence that each pixel in the target image is the center point of a practice grid of the type corresponding to that channel; finding local maxima according to the confidences among all pixels in the first feature maps of the four channels, sorting all local maxima in each channel's first feature map in descending order, and retaining the pixels corresponding to the top N local maxima as center points of the detected grids, N being a first preset threshold and an integer greater than 1; outputting, through the detection model, second feature maps of eight channels, the eight channels respectively representing, for each pixel of the target image taken as a grid center point, the offsets of the four corner points relative to the coordinates of that pixel; and determining the position information of the detected grids according to the eight-channel second feature maps and the center points of the detected grids, and determining the grid type to which each detected grid belongs according to the confidences of its center point in the first feature maps of the four channels.
- The method according to claim 2, wherein the preset condition is that the intersection-over-union between any two of the detected grids is greater than a second preset threshold, and deleting, according to the position information, the detected grids whose degree of overlap satisfies the preset condition comprises: calculating the intersection-over-union between the detected grids; and, when the intersection-over-union of two detected grids is greater than the second preset threshold, determining to delete the detected grid corresponding to the center point with the lower confidence.
- The method according to claim 3, wherein, when the detection model is trained, a selection ratio of positive to negative samples in the training samples is controlled to be a target ratio, the positive samples being pixels whose values are not zero in the first feature maps output by the detection model, and the negative samples being pixels whose values are zero in the first feature maps output by the detection model.
- The method according to claim 1, wherein labels in the training samples of the detection model are sequentially ordered coordinates of the four corner points of all practice grids in the training sample.
- A character practice grid detection apparatus, the apparatus comprising: an acquisition module configured to acquire a target image; and a processing module configured to determine, according to a pre-trained character practice grid detection model, position information and a grid type of each character practice grid in the target image, the grid type being any one of a complete practice grid containing a character, a complete practice grid containing no character, an incomplete practice grid containing a character, and an incomplete practice grid containing no character.
- The apparatus according to claim 7, wherein the processing module comprises: a detection sub-module configured to obtain, through the detection model, position information of detected practice grids and the grid types to which the detected grids belong, the detected grids being one or more practice grids detected by the detection model; and a de-duplication sub-module configured to delete, according to the position information, the detected grids whose degree of overlap satisfies a preset condition, and to determine the retained detected grids as the practice grids present in the target image.
- A non-transitory computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-6.
- An electronic device, comprising: a storage apparatus having a computer program stored thereon; and a processing apparatus configured to execute the computer program in the storage apparatus, to implement the steps of the method according to any one of claims 1-6.
- A computer program that, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-6.
- A computer program product having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-6.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110341076.0A CN113033539B (zh) | 2021-03-30 | 2021-03-30 | Character practice grid detection method and apparatus, readable medium, and electronic device |
CN202110341076.0 | 2021-03-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022206404A1 true WO2022206404A1 (zh) | 2022-10-06 |
Family
ID=76453038
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/081444 WO2022206404A1 (zh) | 2021-03-30 | 2022-03-17 | Character practice grid detection method and apparatus, readable medium, and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113033539B (zh) |
WO (1) | WO2022206404A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113033539B (zh) * | 2021-03-30 | 2022-12-06 | 北京有竹居网络技术有限公司 | Character practice grid detection method and apparatus, readable medium, and electronic device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111242088A (zh) * | 2020-01-22 | 2020-06-05 | 上海商汤临港智能科技有限公司 | Target detection method and apparatus, electronic device, and storage medium |
CN111310760A (zh) * | 2020-02-13 | 2020-06-19 | 辽宁师范大学 | Oracle bone inscription character detection method combining local prior features and deep convolutional features |
CN112396032A (zh) * | 2020-12-03 | 2021-02-23 | 北京有竹居网络技术有限公司 | Writing detection method and apparatus, storage medium, and electronic device |
CN113033539A (zh) * | 2021-03-30 | 2021-06-25 | 北京有竹居网络技术有限公司 | Character practice grid detection method and apparatus, readable medium, and electronic device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106339104B (zh) * | 2016-08-24 | 2019-02-15 | 广州市香港科大霍英东研究院 | Text input method and apparatus for a smart watch |
CN109189314B (zh) * | 2018-08-13 | 2022-01-21 | 广东小天才科技有限公司 | Writing guidance method, apparatus, device, and medium for a handwriting device |
CN111079638A (zh) * | 2019-12-13 | 2020-04-28 | 河北爱尔工业互联网科技有限公司 | Target detection model training method, device, and medium based on a convolutional neural network |
CN111540253A (zh) * | 2020-01-07 | 2020-08-14 | 上海奇初教育科技有限公司 | Intelligent hard-pen calligraphy practice system and scoring method |
CN111554149B (zh) * | 2020-05-15 | 2022-04-29 | 黑龙江德亚文化传媒有限公司 | System and method for scoring copybooks |
- 2021
  - 2021-03-30 CN CN202110341076.0A patent/CN113033539B/zh active Active
- 2022
  - 2022-03-17 WO PCT/CN2022/081444 patent/WO2022206404A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN113033539A (zh) | 2021-06-25 |
CN113033539B (zh) | 2022-12-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111340131B (zh) | Image annotation method and apparatus, readable medium, and electronic device | |
US20230394671A1 (en) | Image segmentation method and apparatus, and device, and storage medium | |
CN111476309A (zh) | Image processing method, model training method, apparatus, device, and readable medium | |
CN112712069B (zh) | Question scoring method and apparatus, electronic device, and storage medium | |
CN112883968B (zh) | Image character recognition method and apparatus, medium, and electronic device | |
CN113033682B (zh) | Video classification method and apparatus, readable medium, and electronic device | |
CN112364829B (zh) | Face recognition method, apparatus, device, and storage medium | |
WO2023143016A1 (zh) | Method for generating a feature extraction model, and image feature extraction method and apparatus | |
CN113140012B (zh) | Image processing method and apparatus, medium, and electronic device | |
WO2023142914A1 (zh) | Date recognition method and apparatus, readable medium, and electronic device | |
WO2023078070A1 (zh) | Character recognition method, apparatus, device, medium, and product | |
CN111738316B (zh) | Zero-shot learning image classification method and apparatus, and electronic device | |
CN114463768A (zh) | Table recognition method and apparatus, readable medium, and electronic device | |
CN112232341A (zh) | Text detection method, electronic device, and computer-readable medium | |
CN112712036A (zh) | Traffic sign recognition method and apparatus, electronic device, and computer storage medium | |
WO2022206404A1 (zh) | Character practice grid detection method and apparatus, readable medium, and electronic device | |
CN112396032A (zh) | Writing detection method and apparatus, storage medium, and electronic device | |
CN113191251B (zh) | Stroke order detection method and apparatus, electronic device, and storage medium | |
WO2023130925A1 (zh) | Font recognition method and apparatus, readable medium, and electronic device | |
CN110674813A (zh) | Chinese character recognition method and apparatus, computer-readable medium, and electronic device | |
CN115937888A (zh) | Document comparison method, apparatus, device, and medium | |
CN114612909A (zh) | Character recognition method and apparatus, readable medium, and electronic device | |
CN113128470B (zh) | Stroke recognition method and apparatus, readable medium, and electronic device | |
WO2022052889A1 (zh) | Image recognition method and apparatus, electronic device, and computer-readable medium | |
WO2022089196A1 (zh) | Image processing method and apparatus, electronic device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22778587 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 22778587 Country of ref document: EP Kind code of ref document: A1 |