CN112115933A - Character recognition method, device and storage medium - Google Patents

Info

Publication number
CN112115933A
Authority
CN
China
Prior art keywords
character
target
data structure
characters
processed
Prior art date
Legal status
Pending
Application number
CN202010864604.6A
Other languages
Chinese (zh)
Inventor
刘滨 (Liu Bin)
旷黎明 (Kuang Liming)
林大 (Lin Da)
Current Assignee
Shanghai Weiyi Intelligent Manufacturing Technology Co ltd
Changzhou Weiyizhi Technology Co Ltd
Original Assignee
Shanghai Weiyi Intelligent Manufacturing Technology Co ltd
Changzhou Weiyizhi Technology Co Ltd
Priority date
2020-08-25
Filing date
2020-08-25
Publication date
2020-12-22
Application filed by Shanghai Weiyi Intelligent Manufacturing Technology Co ltd and Changzhou Weiyizhi Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a character recognition method comprising the following steps: acquiring target characters and establishing at least one target character library; constructing a target data structure for each target character library based on that library and a preset data processing structure; acquiring characters to be processed; and recognizing the characters to be processed according to the calling relationship between a microservice and the target data structure, to obtain a recognition result. The purpose is to keep big-data collection in the industrial Internet of Things standardized and legal, save memory space, and achieve high QPS (queries per second).

Description

Character recognition method, device and storage medium
Technical Field
The invention relates to the technical field of character processing for the industrial Internet, and in particular to a character recognition method, a character recognition device and a storage medium.
Background
The industrial Internet results from the convergence of global industrial systems with advanced computing, analytics, sensing technologies and Internet connectivity. Through an open, global, industrial-grade network platform, equipment, production lines, factories, suppliers, products and customers can be tightly connected and integrated, the various resources of the industrial economy can be shared efficiently, and the manufacturing industry is helped to extend its industrial chain. These resources may contain illegal characters, for example characters in test data that must be recognized so that problems in the test data are avoided or problems in the test process are detected in time.
At present, illegal-character filtering is commonly distributed in a traditional packaged form, such as a jar package. As a result, every service that needs to filter illegal characters must load its own copy of the illegal-character lexicon. For example, if 10 services integrate the jar package and the illegal lexicon occupies 1 GB, 9 GB of memory is wasted. The prior-art filtering method therefore occupies excessive memory and reduces filtering efficiency.
Disclosure of Invention
The object of the invention is to provide a character recognition method, a character recognition device and a storage medium that keep big-data collection in the industrial Internet of Things standardized and legal, save memory space, and achieve high QPS. The invention avoids the situation in existing industrial scenarios where every microservice in a microservice architecture must load a massive lexicon, thereby saving a large amount of memory and improving service availability.
In order to achieve the above object, there is provided a character recognition method including:
acquiring target characters and establishing at least one target character library;
constructing a target data structure of each target character library based on each target character library and a preset data processing structure;
acquiring characters to be processed;
and identifying the character to be processed according to the calling relationship between the micro service and the target data structure, and acquiring an identification result.
In one implementation, the step of obtaining the target character and building at least one target character library includes:
obtaining an illegal character, wherein the illegal character is a preset character;
determining the illegal character as a target character;
forming the target characters into a target character library;
and loading the data corresponding to the target character library into a memory for data processing.
In one implementation, the step of constructing a target data structure of each target character library based on each target character library and a preset data processing structure includes:
determining the preset data processing structure as a Be_Tree data structure;
and constructing each target character library into a tree-shaped data structure according to the Be_Tree data structure.
In one implementation manner, the step of identifying the character to be processed according to the call relationship between the microservice and the target data structure and obtaining the identification result includes:
calling the tree-shaped data structure loaded into the memory for data processing;
filtering the character to be processed to obtain a character filtering result;
judging whether the character filtering result contains the same character as the character to be processed;
if so, confirming that the character to be processed contains an illegal character.
In one implementation manner, the step of filtering the character to be processed to obtain a character filtering result includes:
when the characters to be processed are a plurality of characters, acquiring a first character in the characters to be processed;
filtering based on the first character, and obtaining a character filtering result;
if the character filtering result contains the character which is the same as the first character, acquiring a second character in the characters to be processed;
based on the position of the first character in the tree data structure, filtering the second character, and acquiring a filtering result;
wherein the first character is in an order prior to the second character.
In one implementation, the step of constructing each target character library into a tree data structure includes:
constructing each target character library into a tree data structure by using a HashMap.
In one implementation, the method further comprises:
acquiring a row position of a first character in the tree data structure;
judging whether the character is the last character of the row position of the tree data structure;
if not, setting the position of the character in the tree structure as a first flag bit;
otherwise, setting the position of the character as a second flag bit in the tree structure.
In one implementation, the method further comprises:
if the flag bit is the second flag bit, ending the line search;
otherwise, continuing the line search based on the second character.
In addition, the invention discloses a character recognition device comprising a processor and a memory connected to the processor through a communication bus, wherein:
the memory is used for storing a character recognition program;
the processor is configured to execute the character recognition program to implement any of the character recognition steps.
The invention further discloses a storage device, such as a computer storage device, storing one or more programs that are executable by one or more processors to cause the one or more processors to perform any of the above character recognition steps.
The character recognition method provided by the embodiment of the invention has the following beneficial effects:
(1) Preloading the lexicon and using the Be-Tree algorithm solves the problem of responding quickly when input information must be searched against a massive lexicon in an industrial scenario.
(2) Illegal-character filtering is built as an independent service, which other services that need filtering call through a Feign interface. This avoids the situation in industrial scenarios where every microservice in a microservice architecture must load a massive lexicon, thereby saving a large amount of memory and improving service availability.
(3) A custom annotation makes the filtering easy to integrate into any service that needs illegal-word filtering.
(4) The invention keeps big-data collection in the industrial Internet of Things standardized and legal, saves memory space, and achieves high QPS.
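The patent does not show the custom annotation mentioned in benefit (3). The Java sketch below illustrates one way such an annotation could work, under stated assumptions: the names `IllegalWordCheck`, `DeviceReport` and `AnnotationChecker` are hypothetical, the annotated fields are read with plain reflection in the same process, and a simple substring test stands in for the call to the filtering service.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical marker: String fields carrying it are checked for illegal words.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IllegalWordCheck {}

// Hypothetical request object submitted to a business microservice.
class DeviceReport {
    String deviceId;   // not annotated, so never checked
    @IllegalWordCheck
    String remark;     // free text entered by a user

    DeviceReport(String deviceId, String remark) {
        this.deviceId = deviceId;
        this.remark = remark;
    }
}

final class AnnotationChecker {
    private AnnotationChecker() {}

    /** Returns the names of annotated String fields whose value contains a lexicon word. */
    static List<String> findViolations(Object target, Set<String> lexicon) {
        List<String> violations = new ArrayList<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(IllegalWordCheck.class) || field.getType() != String.class) {
                continue; // only annotated String fields are inspected
            }
            try {
                field.setAccessible(true);
                String value = (String) field.get(target);
                // Substring test stands in for the call to the filtering service.
                if (value != null && lexicon.stream().anyMatch(value::contains)) {
                    violations.add(field.getName());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return violations;
    }
}
```

Only the annotated `remark` field is inspected, so an illegal word in `deviceId` is ignored; in a Spring application the same check would more likely run in an aspect or validator, which is outside this sketch.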
Drawings
Fig. 1 is a flow chart of a character recognition method according to an embodiment of the present invention.
Fig. 2 shows a specific embodiment of the character recognition method according to an embodiment of the present invention.
Fig. 3 shows another embodiment of the character recognition method according to the present invention.
Fig. 4 is a diagram illustrating another embodiment of a character recognition method according to the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Please refer to fig. 1-4. It should be noted that the drawings provided in the present embodiment are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
The invention provides a character recognition method as shown in fig. 1, which comprises the following steps:
S101, acquiring target characters and establishing at least one target character library.
In an industrial scenario, the lexicon of illegal characters is massive. The illegal characters can therefore be collected from past experience; specifically, they can be classified and grouped according to actual requirements, and the classified characters are used as target characters to obtain the target character libraries.
In an implementation manner of the present invention, the step of obtaining the target character and building at least one target character library includes:
S1011, obtaining an illegal character, wherein the illegal character is a preset character.
It will be appreciated that the user may collect illegal characters in advance, for example in an illegal character database.
S1012, determining the illegal character as a target character.
The target characters are the basis for character recognition or filtering. Because the target characters are illegal characters, any character obtained later that is identical to a target character is regarded as the same illegal character; the preset illegal characters are therefore used as the target characters.
S1013, forming the target characters into a target character library.
The target characters are then grouped into a target character library as a whole. For example, one target character library may correspond to the illegal data of one testing process, or to the testing tool of one product, so that testing in the industrial Internet of Things can proceed smoothly without replacing illegal characters midway. When the illegal characters are updated, the new illegal characters can be added manually to the target character library to form an updated library.
S1014, loading the data corresponding to the target character library into a memory for data processing.
In the embodiment of the invention, to further improve the running efficiency and speed of the target character library in the service, the library is loaded into the memory, which improves reading efficiency and shortens response time. The database no longer has to be queried every time to determine whether an input item is an illegal character, which increases the speed and efficiency of illegal-character searches.
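As a minimal sketch of S1011 to S1014, the Java code below loads preset illegal characters into in-memory target character libraries, one per category. The line format `category<TAB>word` and the names `LexiconLoader`, `load` and `loadFromString` are assumptions for illustration; the patent does not specify how the illegal-character database is stored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LexiconLoader {
    private LexiconLoader() {}

    /**
     * Reads lines of the assumed form "category<TAB>word" and groups the words into
     * one in-memory target character library per category (S1013, S1014).
     */
    static Map<String, Set<String>> load(Reader source) {
        Map<String, Set<String>> libraries = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and comments
                }
                String[] parts = line.split("\t", 2);
                if (parts.length != 2 || parts[1].isBlank()) {
                    continue; // skip malformed rows
                }
                libraries.computeIfAbsent(parts[0].trim(), k -> new HashSet<>()).add(parts[1].trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Freeze each library so request threads can share it without locking.
        libraries.replaceAll((category, words) -> Collections.unmodifiableSet(words));
        return Collections.unmodifiableMap(libraries);
    }

    /** Convenience overload for lexicon text already held in a String. */
    static Map<String, Set<String>> loadFromString(String text) {
        return load(new StringReader(text));
    }
}
```

Freezing the sets after loading reflects the "load once, read many times" pattern of S1014; an update of the library (S1013) would then be done by building a new map and swapping the reference.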
S102, constructing a target data structure of each target character library based on each target character library and a preset data processing structure.
In an implementation of the present invention, the step of constructing the target data structure of each target character library based on each target character library and a preset data processing structure includes: determining the preset data processing structure as a Be_Tree data structure; and constructing each target character library into a tree-shaped data structure according to the Be_Tree data structure.
In the embodiment of the invention, the data is processed through a preset data processing structure, so that the subsequent data identification or filtering process is facilitated.
It should be noted that a data structure refers to a collection of data elements that have one or more specific relationships with each other. Typically, a carefully selected data structure can lead to greater operational or storage efficiency.
The relationships that the preset data processing structure establishes among the characters of the target character library therefore make the characters easy to find.
S103, acquiring characters to be processed.
The characters to be processed are characters generated while the industrial Internet system is running; once the processing result for these characters is available, it can be confirmed whether the process has generated illegal characters.
S104, identifying the characters to be processed according to the calling relationship between the microservice and the target data structure, and acquiring an identification result.
In one implementation manner, the step of identifying the character to be processed according to the call relationship between the microservice and the target data structure and obtaining the identification result includes:
calling the tree-shaped data structure loaded into the memory for data processing; filtering the characters to be processed to obtain a character filtering result; judging whether the character filtering result contains a character identical to the character to be processed; and if so, confirming that the characters to be processed contain an illegal character.
In the embodiment of the invention, in an industrial scenario, the illegal-character filtering function is built as an independent service under the microservice architecture, which saves memory space. Other microservices call the target character library through this service when needed, so the library does not permanently occupy memory in every service.
This solves the prior-art problem that every service needing illegal-character filtering must load its own lexicon of illegal characters: for example, if 10 services integrate the jar package and the illegal lexicon occupies 1 GB, 9 GB of memory is wasted and operating efficiency is low.
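The 9 GB figure follows from a simple memory model. Assume N services each load a private copy of a lexicon of size S, whereas the standalone filtering service holds a single copy; the symbols are introduced here only for illustration:

```latex
\begin{aligned}
M_{\text{embedded}} &= N \cdot S = 10 \times 1\,\text{GB} = 10\,\text{GB},\\
M_{\text{shared}} &= S = 1\,\text{GB},\\
M_{\text{embedded}} - M_{\text{shared}} &= (N-1)\,S = 9 \times 1\,\text{GB} = 9\,\text{GB}.
\end{aligned}
```

If the standalone service is clustered into k replicas for high QPS, the shared cost becomes kS and the saving becomes (N − k)S, so the benefit holds as long as k is smaller than N.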
By applying the embodiment of the invention, the illegal-character filtering service can be deployed as a cluster, which copes with high QPS and improves system availability. The in-memory lexicon of illegal characters uses a Be_Tree data structure: if the lexicon were stored as one whole string in Java, checking whether an input item is illegal would be very inefficient. Building the lexicon into a tree-shaped data structure greatly narrows the range that must be matched when deciding whether a word is illegal.
In the implementation shown in Fig. 3, the step of filtering the characters to be processed to obtain a character filtering result includes:
when the characters to be processed are a plurality of characters, acquiring a first character in the characters to be processed;
filtering based on the first character, and obtaining a character filtering result; if the character filtering result contains the character which is the same as the first character, acquiring a second character in the characters to be processed; based on the position of the first character in the tree data structure, filtering the second character, and acquiring a filtering result; wherein the first character is in an order prior to the second character.
As shown in Fig. 3, to determine whether "teacher" (老师, whose first character 老 is rendered here as "old") is an illegal word, the first character identifies the tree of Fig. 3 as the one to be searched, and further retrieval finds the "old" character in the first row. Because the characters of a word are linked in the tree, only the first character ("old") has to be looked up at first; the second character is then checked only among the entries that follow "old". This reduces the amount of data searched and improves retrieval efficiency.
It should be noted that, in one implementation, when a word has 3, 4, 5 or more characters, the "first character" need not be the first character of the sequence but may be another character; for example, with 3 characters, the second character may be treated as the first character and the third character as the second character. In another implementation, the first and second characters mentioned in the embodiment of the present invention may simply be numbered in order according to the number of characters: with 3 characters, a first, second and third character; with 4 characters, a first, second, third and fourth character. These cases can be implemented in the same way as the embodiment shown in Fig. 3, and the details are not repeated here.
In one implementation, the step of constructing each target character library into a tree data structure includes:
Each target character library is constructed into a tree data structure by using a HashMap. The row position of the first character in the tree data structure is acquired, and it is judged whether the character is the last character of that row position of the tree data structure; if not, its flag bit is set to the first flag bit; otherwise, it is set to the second flag bit.
In the embodiment of the present invention, a HashMap is used to construct the tree structure, for example:
{old={isEnd=0, teacher={isEnd=1}}}
This records whether a character is the last character of a word: if the character ends a sensitive word, the flag bit isEnd is set to 1; otherwise, isEnd is set to 0.
To retrieve "teacher", hashMap.get("old") is called first; if a value is found in the hashMap, an illegal word beginning with "old" exists.
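A minimal Java sketch of the HashMap construction described above follows. It assumes each word is split into single UTF-16 `char`s (adequate for common CJK characters, not for supplementary characters) and that `isEnd` is stored as the string "0" or "1" inside each child map; `WordTreeBuilder` is a hypothetical name. This nested-map layout is what is commonly called a trie (prefix tree); a typed node class would be more idiomatic Java, but untyped maps mirror the example structure.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class WordTreeBuilder {
    static final String IS_END = "isEnd";

    private WordTreeBuilder() {}

    /**
     * Builds nested HashMaps: each key is one character whose value is the child map,
     * and every child map carries isEnd = "1" if a lexicon word ends on that character, else "0".
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> build(Collection<String> words) {
        Map<String, Object> root = new HashMap<>();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // an empty word would mark the root itself as a match
            }
            Map<String, Object> node = root;
            for (int i = 0; i < word.length(); i++) {
                String ch = String.valueOf(word.charAt(i));
                Map<String, Object> child = (Map<String, Object>) node.get(ch);
                if (child == null) {
                    child = new HashMap<>();
                    child.put(IS_END, "0"); // first flag bit: no word ends here yet
                    node.put(ch, child);
                }
                node = child;
            }
            node.put(IS_END, "1"); // second flag bit: a word ends on this character
        }
        return root;
    }
}
```

For the words 老师 and 老板, the root holds a single key 老 whose map has isEnd=0 and two children, each with isEnd=1; the printed key order depends on HashMap iteration order.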
In one implementation, the method further comprises:
if the flag bit is the second flag bit, ending the line search;
otherwise, continuing the line search based on the second character.
Next, get("teacher") is called on the map returned for "old"; because its isEnd is 1, the search is finished.
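The search rule can be sketched in Java as follows. This sketch uses a typed node class instead of raw maps, and the names `IllegalWordFilter`, `findAll` and `containsIllegal` are hypothetical. Following the rule above, a search starting at a given position stops as soon as it reaches a node whose `isEnd` flag is set, so it reports the shortest matching word; continuing past that node would give longest-match behaviour instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IllegalWordFilter {
    /** One tree node: children keyed by character, plus the end flag (isEnd). */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd; // false = first flag bit (0), true = second flag bit (1)
    }

    private final Node root = new Node();

    IllegalWordFilter(Iterable<String> words) {
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            if (node != root) {
                node.isEnd = true; // mark the last character of the word
            }
        }
    }

    /** Scans text left to right and returns each illegal word found (shortest match per start). */
    List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            Node node = root;
            int matchEnd = -1;
            for (int i = start; i < text.length(); i++) {
                // The next character is only looked up among the children of the previous one.
                node = node.children.get(text.charAt(i));
                if (node == null) {
                    break; // no lexicon word continues with this character
                }
                if (node.isEnd) {
                    matchEnd = i + 1; // second flag bit: end the search of this row
                    break;
                }
            }
            if (matchEnd > 0) {
                hits.add(text.substring(start, matchEnd));
                start = matchEnd; // continue after the matched word
            } else {
                start++;
            }
        }
        return hits;
    }

    boolean containsIllegal(String text) {
        return !findAll(text).isEmpty();
    }
}
```

For example, with the lexicon 老师 and abc, scanning 我的老师说abc finds 老师 and then abc, while 老人 yields no match because 人 is not a child of 老.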
In summary, as shown in Fig. 4: in step 1, the illegal-word server loads the illegal-word lexicon into memory and encapsulates it into a B_Tree data structure; this is the key step of the embodiment of the invention and forms the basis for character recognition. In step 2, the user side submits information to a microservice. In step 3, the microservice calls the lexicon service to verify the submitted information, that is, it submits the characters to be processed, and the illegal-word server (the side hosting the illegal-word lexicon service) searches for illegal words in the B_Tree. In step 5, the verification result is returned to the microservice, and in step 6 the microservice decides whether to accept or reject the user request and returns the submission result to the user side, which ends the whole process.
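The request flow of Fig. 4 can be sketched in Java with an interface standing in for the remote call. In the patent the call goes through a Feign interface; here the assumed interface `IllegalWordClient` is implemented by an in-process fake so the sketch runs without Spring Cloud, and `SubmissionService` plays the business microservice. All names are hypothetical, and the fake uses a substring check where the real service would use the tree lookup.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Assumed contract of the standalone illegal-word service (called via Feign in the patent). */
interface IllegalWordClient {
    /** Returns the illegal words found in the submitted text, sorted. */
    List<String> check(String text);
}

/** In-process fake so the sketch runs without a network or Spring Cloud. */
final class InMemoryIllegalWordClient implements IllegalWordClient {
    private final Set<String> lexicon; // step 1: lexicon already loaded on the server side

    InMemoryIllegalWordClient(Set<String> lexicon) {
        this.lexicon = Set.copyOf(lexicon);
    }

    @Override
    public List<String> check(String text) {
        // Steps 3 and 5: search the submitted characters and return the verification result.
        // A substring test stands in for the tree lookup the real service would use.
        return lexicon.stream().filter(text::contains).sorted().collect(Collectors.toList());
    }
}

/** Hypothetical business microservice that receives user submissions (step 2). */
final class SubmissionService {
    private final IllegalWordClient client;

    SubmissionService(IllegalWordClient client) {
        this.client = client;
    }

    /** Step 6: accept or reject the request based on the verification result. */
    String submit(String text) {
        List<String> hits = client.check(text);
        return hits.isEmpty() ? "ACCEPTED" : "REJECTED: " + hits;
    }
}
```

Because the business service depends only on the interface, swapping the fake for a real Feign client would not change `SubmissionService`, which is the decoupling that lets one lexicon copy serve many microservices.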
In addition, the invention discloses a character recognition device comprising a processor and a memory connected to the processor through a communication bus, wherein:
the memory is used for storing a character recognition program;
the processor is configured to execute the character recognition program to implement any of the character recognition steps.
The invention further discloses a storage device, such as a computer storage device, storing one or more programs that are executable by one or more processors to cause the one or more processors to perform any of the above character recognition steps.
The foregoing embodiments merely illustrate the principles and utility of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes that can be made by those skilled in the art without departing from the spirit and technical concept of the present invention shall be covered by the claims of the present invention.

Claims (10)

1. A method of character recognition, the method comprising:
acquiring target characters and establishing at least one target character library;
constructing a target data structure of each target character library based on each target character library and a preset data processing structure;
acquiring characters to be processed;
and identifying the character to be processed according to the calling relationship between the micro service and the target data structure, and acquiring an identification result.
2. The method of claim 1, wherein the step of obtaining the target characters and building at least one target character library comprises:
obtaining an illegal character, wherein the illegal character is a preset character;
determining the illegal character as a target character;
forming the target characters into a target character library;
and loading the data corresponding to the target character library into a memory for data processing.
3. The character recognition method of claim 2, wherein the step of constructing the target data structure of each target character library based on each target character library and the preset data processing structure comprises:
determining the preset data processing structure as a Be_Tree data structure;
and constructing each target character library into a tree-shaped data structure according to the Be_Tree data structure.
4. The character recognition method according to claim 3, wherein the step of recognizing the character to be processed according to the calling relationship between the micro service and the target data structure and obtaining the recognition result comprises:
calling the tree-shaped data structure loaded into the memory for data processing;
filtering the character to be processed to obtain a character filtering result;
judging whether the character filtering result contains the same character as the character to be processed;
if so, confirming that the character to be processed contains an illegal character.
5. The character recognition method according to claim 4, wherein the step of filtering the character to be processed to obtain a character filtering result comprises:
when the characters to be processed are a plurality of characters, acquiring a first character in the characters to be processed;
filtering based on the first character, and obtaining a character filtering result;
if the character filtering result contains the character which is the same as the first character, acquiring a second character in the characters to be processed;
based on the position of the first character in the tree data structure, filtering the second character, and acquiring a filtering result;
wherein the first character is in an order prior to the second character.
6. The character recognition method of claim 5, wherein the step of constructing each target character library into a tree-like data structure comprises:
constructing each target character library into a tree data structure by using a HashMap.
7. The character recognition method of claim 6, further comprising:
acquiring a row position of a first character in the tree data structure;
judging whether the character is the last character of the row position of the tree data structure;
if not, setting the position of the character in the tree structure as a first flag bit;
otherwise, setting the position of the character as a second flag bit in the tree structure.
8. The character recognition method of claim 6, further comprising:
if the flag bit is the second flag bit, ending the line search;
otherwise, the search for the line continues to be performed based on the second character.
9. A character recognition apparatus, comprising a processor and a memory connected to the processor via a communication bus, wherein:
the memory is used for storing a character recognition program;
the processor is configured to execute the character recognition program to implement the character recognition steps of any one of claims 1 to 8.
10. A storage device, being a computer storage device, having one or more programs stored thereon, the one or more programs being executable by one or more processors to cause the one or more processors to perform the character recognition steps of any of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010864604.6A CN112115933A (en) 2020-08-25 2020-08-25 Character recognition method, device and storage medium

Publications (1)

Publication Number Publication Date
CN112115933A true CN112115933A (en) 2020-12-22

Family

ID=73804380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010864604.6A Pending CN112115933A (en) 2020-08-25 2020-08-25 Character recognition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN112115933A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108055158A (en) * 2017-12-19 2018-05-18 深圳供电局有限公司 (Shenzhen Power Supply Bureau Co., Ltd.) Power grid image identification system and method
CN109543764A (en) * 2018-11-28 2019-03-29 安徽省公共气象服务中心 (Anhui Public Meteorological Service Center) Warning information legitimacy detection method and detection system based on intelligent semantic perception
CN111274805A (en) * 2020-01-19 2020-06-12 上海众言网络科技有限公司 (Shanghai Zhongyan Network Technology Co., Ltd.) Method and device for processing suspected words

Non-Patent Citations (2)

Title
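MXDCCC: "这样判断字符串里是否有非法字符.?" ("Is this how to check whether a string contains illegal characters?"), 《HTTPS://BBS.CSDN.NET/TOPICS/90512909》 *
何水霞 (He Shuixia): "基于B-Tree索引和BerkeleyDB的中文词库的设计和实现" ("Design and implementation of a Chinese lexicon based on B-Tree indexing and BerkeleyDB"), 《中国优秀硕士学位论文全文数据库(电子期刊)信息科技辑》 (China Master's Theses Full-text Database (electronic journal), Information Science and Technology series) *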

Similar Documents

Publication Publication Date Title
US10452691B2 (en) Method and apparatus for generating search results using inverted index
CN109189991B (en) Duplicate video identification method, device, terminal and computer readable storage medium
KR102485179B1 (en) Method, device, electronic device and computer storage medium for determining description information
CN102156751B (en) Method and device for extracting video fingerprint
CN110134738B (en) Distributed storage system resource estimation method and device
CN110532347B (en) Log data processing method, device, equipment and storage medium
CN110674360B (en) Tracing method and system for data
CN109408507B (en) Multi-attribute data processing method, device, equipment and readable storage medium
CN111752955A (en) Data processing method, device, equipment and computer readable storage medium
CN111597243A (en) Data warehouse-based abstract data loading method and system
CN111596945B (en) Differential upgrading method for dynamic multi-partition firmware of embedded system
CN112070550A (en) Keyword determination method, device and equipment based on search platform and storage medium
CN109656947B (en) Data query method and device, computer equipment and storage medium
CN112115933A (en) Character recognition method, device and storage medium
CN113626558B (en) Intelligent recommendation-based field standardization method and system
CN111159213A (en) Data query method, device, system and storage medium
CN115203339A (en) Multi-data source integration method and device, computer equipment and storage medium
US20230138113A1 (en) System for retrieval of large datasets in cloud environments
CN114936269A (en) Document searching platform, searching method, device, electronic equipment and storage medium
CN111143203B (en) Machine learning method, privacy code determination method, device and electronic equipment
CN110851709B (en) Information pushing method and device, computer equipment and storage medium
CN112115125A (en) Database access object name resolution method and device and electronic equipment
CN104252486B (en) Method and device for data processing
CN114356292A (en) Interactive information processing method and device and computer equipment
CN116896585A (en) NFT processing method, NFT specification platform and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201222