CN114627459A - OCR recognition method, recognition device and recognition system - Google Patents

OCR recognition method, recognition device and recognition system

Info

Publication number
CN114627459A
Authority
CN
China
Prior art keywords
picture
recognition
server
identification
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011468116.XA
Other languages
Chinese (zh)
Inventor
范剑刚
曾旭
许俊
柳厦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cainiao Smart Logistics Holding Ltd
Original Assignee
Cainiao Smart Logistics Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Cainiao Smart Logistics Holding Ltd filed Critical Cainiao Smart Logistics Holding Ltd
Priority to CN202011468116.XA
Publication of CN114627459A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/06: Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an OCR recognition method, a recognition device, and a recognition system. The OCR recognition method selectively recognizes a picture at a recognition terminal or a recognition server, and comprises the following steps: extracting image parameters from the picture; judging, by using the image parameters, whether the picture meets a preset rule; when the picture meets the preset rule, sending the picture to the recognition server for recognition; and when the picture does not meet the preset rule, recognizing the picture locally and obtaining picture information. In the OCR recognition method, device, and system, the preset rule determines whether the picture is sent to the recognition server or recognized locally at the terminal, which reduces the time delay of picture recognition while ensuring recognition efficiency and accuracy.

Description

OCR recognition method, recognition device and recognition system
Technical Field
The present application relates to the field of logistics, and in particular, to an OCR recognition method, a recognition apparatus, and a recognition system.
Background
OCR (Optical Character Recognition) refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates those shapes into computer text by a character recognition method.
In some recognition scenarios, for example OCR recognition in a logistics scenario, when the information on an express parcel waybill is recognized by a recognition terminal with weak computing capability, recognition may fail or the recognition rate may be low; when a server with strong computing capability is used instead, recognition takes a long time and occupies server resources. No satisfactory solution currently exists.
In addition, training the machine learning model used for recognition often requires a large amount of manpower for manual data labeling, which leaves little incentive to continuously improve the model in subsequent optimization. This is another existing problem.
Disclosure of Invention
In view of the foregoing problems, an embodiment of the present application provides an OCR recognition method, a recognition apparatus, and a recognition system to solve the problems in the prior art.
In order to solve the above problems, an embodiment of the present application discloses an OCR recognition method for selectively recognizing a picture at a recognition terminal or a recognition server, the recognition method comprising:
extracting image parameters from the picture;
judging, by using the image parameters, whether the picture meets a preset rule;
when the picture meets the preset rule, sending the picture to the recognition server for recognition; and
when the picture does not meet the preset rule, recognizing the picture locally and obtaining picture information.
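The routing logic of the steps above can be sketched as follows. This is an illustrative sketch only: the `ImageParams` fields, threshold values, and function names are assumptions, since the claims describe the rule structure but do not fix concrete parameters or thresholds.

```python
from dataclasses import dataclass

@dataclass
class ImageParams:
    """Image parameters extracted from the picture (hypothetical fields)."""
    barcode_ratio: float   # barcode area divided by picture area
    tilt_angle: float      # tilt in degrees relative to the shot's reference line

# Hypothetical thresholds for the preset rule.
MIN_BARCODE_RATIO = 0.05
MAX_TILT_ANGLE = 30.0

def meets_preset_rule(params: ImageParams) -> bool:
    """True when the picture is likely hard to recognize locally:
    a small barcode or a large tilt angle triggers the rule."""
    return (params.barcode_ratio < MIN_BARCODE_RATIO
            or params.tilt_angle > MAX_TILT_ANGLE)

def dispatch(params: ImageParams) -> str:
    # Send to the recognition server when the rule is met,
    # otherwise recognize locally at the terminal.
    return "server" if meets_preset_rule(params) else "terminal"
```

For example, a picture whose barcode occupies only 1% of the frame would be routed to the server, while a well-framed, level picture would be recognized at the terminal.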
In order to solve the above problems, an embodiment of the present application further discloses an electronic device, comprising:
a memory for storing a computer readable program;
a processor, wherein when the processor reads the computer readable program in the memory, the electronic device performs the following operations:
extracting image parameters from the picture;
judging, by using the image parameters, whether the picture meets a preset rule;
when the picture meets the preset rule, sending the picture to the recognition server for recognition; and
when the picture does not meet the preset rule, recognizing the picture locally and obtaining picture information.
In order to solve the above problems, an embodiment of the present application discloses an OCR recognition apparatus, comprising:
an image parameter extraction module for extracting image parameters from the picture;
a judging module for judging, by using the image parameters, whether the picture meets a preset rule;
a sending module for sending the picture to a recognition server for recognition when the picture meets the preset rule; and
a recognition module for recognizing the picture locally and obtaining picture information when the picture does not meet the preset rule.
An embodiment of the present application further discloses a terminal device, including:
one or more processors; and
one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the terminal device to perform the above-described methods.
One embodiment of the present application also discloses one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause a terminal device to perform the above-described method.
As can be seen from the above, the embodiments of the present application include the following advantages:
in the scheme provided by the embodiments of the present application, the preset rule determines whether the picture is sent to the recognition server or recognized locally at the recognition terminal, which reduces the time delay of picture recognition while ensuring recognition efficiency and accuracy.
In the solution provided in an optional embodiment of the present application, the recognition server may include a cloud server and an edge server. Through the preset rule, the recognition terminal can send the picture to either the cloud server or the edge server, or keep it at the recognition terminal for recognition. This approach makes full use of the fact that the recognition terminal incurs no network delay, the edge server offers low delay and high computing power, and the cloud server offers the highest computing power.
In addition, in the scheme provided by an optional embodiment of the present application, the cloud server, the edge server, and the recognition terminal can automatically accumulate recognized picture information, such as logistics waybill data. Using the accumulated waybill pictures, the recognized picture information (e.g., a mobile phone number), and high-confidence samples of that information, the core recognition component, namely the inference engine, can be trained and optimized directly on the cloud model training server. The trained and optimized inference engine can then be distributed to different platforms according to the different models of the cloud server, the edge server, and the recognition terminal, forming a closed-loop, self-learning waybill recognition framework. This greatly reduces the cost of manual labeling and promotes lean, intelligent production.
The recognition capability of the deep learning model is improved by adjusting the model and continuously adding new data sets. For example, when the model recognizes a picture correctly, the picture and the recognized mobile phone number can be obtained through manual or machine labeling; when the model recognizes incorrectly, the picture, the incorrectly recognized number, and the number corrected by the user can be obtained. In summary, the scheme of the present application can continuously collect correctly recognized data, incorrectly recognized data, and user-corrected data, so that the model can be continuously corrected with these data and the accuracy of the output model improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below illustrate only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram of an OCR recognition system according to an embodiment of the present application.
FIG. 2 is a block diagram of an OCR recognition system according to a second embodiment of the present application.
Fig. 3 is a flowchart of an OCR recognition method according to a first embodiment of the present application.
Fig. 4 is a flowchart of an OCR recognition method according to a second embodiment of the present application.
Fig. 5 schematically shows a block diagram of a terminal device for performing a method according to the present application; and
fig. 6 schematically shows a storage unit for holding or carrying program code implementing a method according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments derived by a person of ordinary skill in the art from the embodiments given herein without creative effort fall within the scope of the present disclosure.
One of the core ideas of the present application is to provide an OCR recognition method, device, and system in which a picture obtained by the recognition terminal is either sent to a recognition server for recognition or kept for local recognition, according to a specific preset rule. This improves the efficiency of OCR recognition of pictures.
First embodiment
A first embodiment of the present application provides a recognition system, which includes a recognition server and at least one recognition terminal.
Fig. 1 is a schematic diagram of the recognition system. As shown in Fig. 1, the recognition server may be a cloud server or an edge server, and the recognition terminal 20 is the terminal device that performs recognition. The cloud server may be a server dedicated to OCR recognition or a server integrating other functions. An edge server is a server arranged, following the principles of edge computing, on a node at the edge of the network outside the central server. Edge computing is a technology for optimizing an application or a cloud computing system: data processing, application execution, and even some functional services are moved from the central server to nodes at the network edge, where computation is performed on edge servers.
The recognition terminal 20 is, for example, a device installed at the user's end, such as a mobile phone, a PDA, a tablet computer, or a barcode scanning gun, and is used directly by the user to recognize a picture.
The picture may be, for example, the waybill of a package in a logistics scenario. The information recorded on the package waybill may include the recipient's name, the logistics order number, the recipient's address, the recipient's phone number, and so on.
In a logistics scenario, high-speed recognition is required: the volume of pictures recognized at a time is large, and errors cannot be tolerated. If a picture cannot be recognized quickly or accurately at the local terminal, it needs to be uploaded to a cloud recognition server or an edge server for recognition.
The recognition terminal 20 may include, for example, an image parameter extraction module 21, a judgment module 22, a transmission module 23, and a recognition module 24.
The image parameter extraction module 21 extracts image parameters from the picture; the judging module 22 judges, by using the image parameters, whether the picture meets a preset rule; the sending module 23 sends the picture to the recognition server for recognition when the picture meets the preset rule; and the recognition module 24 recognizes the picture and obtains picture information when the picture does not meet the preset rule.
In some embodiments, the recognition terminal 20 may further include an image acquisition module 25. The image acquisition module 25 may be a camera built into the terminal, or a module that obtains the image information acquired by the terminal's camera. For example, after the camera captures an image and converts it into a digital signal, the image acquisition module 25 obtains the digital signal and transmits it to the image parameter extraction module 21.
The image parameter extraction module 21 extracts image parameters from the picture. The picture taken by the recognition terminal is, for example, a data set containing the RGB pixel values of each pixel. From it, the image parameter extraction module can identify and obtain information such as barcode information and the tilt angle of the picture.
The image parameter extraction module identifies the barcode region of the picture; for example, the barcode region can be located from its pixels, since, compared with other regions, a barcode consists of ordered alternating black and white stripes. The module can obtain the position of the barcode region, the number the barcode represents, and other information. Once the number is determined, the content it represents, such as an order number or a logistics waybill number, can be determined according to preset rules. The aforementioned barcode information may include the size of the barcode, i.e., the ratio of the barcode to the picture size, compared against a threshold.
In some embodiments, a barcode with a smaller ratio indicates that the photographed mobile phone number and other text are also small and harder to recognize locally, so the picture can be uploaded to the cloud server or the edge server for recognition.
The image parameter extraction module also identifies the tilt angle of the picture; for example, the tilt angle relative to the horizontal and vertical coordinate axes can be determined from the angle between the reference line of the shot and the edge of the picture.
The inclination angle of the picture may include an inclination angle of the picture with respect to three coordinate axes, i.e., x, y, and z, or an inclination angle with respect to each of the xy plane, the yz plane, and the xz plane.
The tilt angle of the picture can thus contain three values, one for each of the three axes or three planes. When any of these tilt angles is large, the picture may be cropped during rotation on the device side and become unrecognizable, and it then needs to be uploaded to the cloud for recognition.
The judging module determines whether the picture stays at the local recognition terminal for recognition or is uploaded to the cloud or edge recognition server for processing. The judging module judges, according to the preset rule and the image parameters, whether the picture meets the preset rule: when it does, the picture is sent to the recognition server for recognition; when it does not, the picture is recognized locally and picture information is obtained.
Generally speaking, a machine learning model with stronger recognition capability can be deployed in the cloud. At the recognition terminal, the computing capability is weaker than that of the cloud because of hardware constraints such as processing speed and capacity. Edge servers are computing nodes around the central node; their computing capability is generally weaker than that of the cloud recognition server but stronger than that of the terminal.
In one embodiment, the models at the recognition server and the recognition terminal are both obtained by training the same model. The OCR recognition system of the present application may further include a model training server 30.
After the recognition server 10 or the recognition terminal 20 performs recognition, the recognized data is uploaded as training data for the cloud machine learning model. The model training server 30 uses this training data to train the machine learning model. As long as the recognized data is judged correct, it may be uploaded to a database for training-data accumulation, whether recognition took place at the recognition server 10 or at the recognition terminal 20. That is, both the recognition server 10 and the recognition terminal 20 can provide sample data as historical data for training the machine learning model. Each sample may include a data set (e.g., the digitized picture) and a label (e.g., the picture information recognized from part or all of the picture, including the name, phone number, and address on the logistics waybill). If the recognized data is judged incorrect, both the correct data, obtained manually or extracted from the database, and the erroneous data can be uploaded for training the machine learning model. For example, when the model recognizes correctly, the picture and the recognized mobile phone number can be obtained through manual or machine labeling; when the model recognizes incorrectly, the picture, the incorrectly recognized mobile phone number, and the number corrected by the user can be obtained.
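A minimal sketch of the sample records described above, under stated assumptions: the field names and the dictionary shape of the label are hypothetical, since the patent specifies only that each sample pairs a data set with a label and that a user-corrected value accompanies an erroneous recognition.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WaybillSample:
    """One training sample accumulated by a server or terminal (hypothetical schema)."""
    image_id: str                            # reference to the digitized picture
    label: dict                              # recognized fields, e.g. name, phone, address
    correct: bool                            # whether the recognition was judged correct
    corrected_label: Optional[dict] = None   # user-corrected fields, if any

def to_training_pair(sample: WaybillSample) -> tuple:
    """Use the corrected label when recognition was wrong, so both correct
    and erroneous recognitions contribute usable training data."""
    target = sample.label if sample.correct else sample.corrected_label
    return (sample.image_id, target)
```

In this sketch, a correctly recognized waybill contributes its recognized label directly, while a misrecognized one contributes the user's correction, matching the closed-loop data collection described above.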
In some embodiments, the recognition server 10 and the model training server 30 may be deployed together or separately. When deployed together, one part of the cloud servers is used to train the model, and the other part is used for OCR recognition.
In other embodiments, there may be three model training servers 30, one each for the cloud server 11, the edge server 12, and the recognition terminal 20. These three model training servers 30 can derive different recognition models for the cloud server 11, the edge server 12, and the recognition terminal 20, respectively.
In the solution provided in this optional embodiment, the recognition server 10 may include a cloud server and an edge server. Through the preset rule, the recognition terminal can send the picture to either the cloud server or the edge server, or keep it at the recognition terminal for recognition. This approach makes full use of the fact that the recognition terminal incurs no network delay, the edge server offers low delay and high computing power, and the cloud server offers the highest computing power.
In one embodiment, the recognition terminal 20 may be integrated into the barcode scanning gun or separate from it. For example, the recognition terminal 20 may be a terminal dedicated to recognizing pictures, deployed separately from the barcode scanning gun or mobile phone that collects the pictures; the recognition terminal 20 may, for example, be placed on a shelf.
The recognition terminal may include an Internet of Things (IoT) device, which is connected through an IoT protocol to a picture acquisition device such as a barcode scanning gun.
In addition, in the scheme provided by this embodiment, the cloud server, the edge server, and the recognition terminal can automatically accumulate recognized picture information, such as logistics waybill data. Using the accumulated waybill pictures, the recognized picture information (e.g., a mobile phone number), and high-confidence samples of that information, the core recognition component, namely the inference engine, can be trained and optimized directly on the cloud model training server. The trained and optimized inference engine can then be distributed to different platforms according to the different models of the cloud server, the edge server, and the recognition terminal, forming a closed-loop, self-learning waybill recognition framework. This greatly reduces the cost of manual labeling and promotes lean, intelligent production.
The recognition capability of the deep learning model is improved by adjusting the model and continuously adding new data sets. For example, when the model recognizes a picture correctly, the picture and the recognized mobile phone number can be obtained through manual or machine labeling; when the model recognizes incorrectly, the picture, the incorrectly recognized number, and the number corrected by the user can be obtained. In summary, the scheme of the present application can continuously collect correctly recognized data, incorrectly recognized data, and user-corrected data, so that the model can be continuously corrected with these data and the accuracy of the output model improved.
Second embodiment
Fig. 2 is a schematic diagram of a second embodiment of the present application. In this embodiment, the recognition server 10 includes a cloud server 11 and an edge server 12.
The recognition terminal 20 includes an image parameter extraction module 21, a judging module 22, a sending module 23, and a recognition module 24, as in the first embodiment. The image parameter extraction module 21 extracts image parameters from the picture; the judging module 22 judges, by using the image parameters, whether the picture meets a preset rule; the sending module 23 sends the picture to the recognition server for recognition when the picture meets the preset rule; and the recognition module 24 recognizes the picture and obtains picture information when the picture does not meet the preset rule.
The difference between this embodiment and the previous one is that, here, the recognition server 10 includes a cloud server 11 and an edge server 12, both of which can perform cloud computing. The cloud server may be a server dedicated to OCR recognition or a server integrating other functions. An edge server is a server arranged, following the principles of edge computing, on a node at the edge of the network outside the central server. Edge computing is a technology for optimizing an application or a cloud computing system: data processing, application execution, and even some functional services are moved from the central server to nodes at the network edge, where computation is performed on edge servers.
After the judging module 22 of the recognition terminal 20 determines where the picture should be sent, it can select among the cloud server 11, the edge server 12, and the recognition terminal 20.
In addition, one or more of the cloud server 11, the edge server 12, and the recognition terminal 20 may upload correctly or incorrectly recognized sample data to a cloud database to train the OCR recognition model in the cloud server. After the cloud OCR recognition model is trained with a large amount of data, three models can be derived, corresponding to the cloud server, the recognition terminal, and the edge server. The structures of the three models may be the same or similar, or may differ. Alternatively, two models may be derived separately, one for the recognition server and one for the recognition terminal.
The computing capabilities of the cloud server 11, the edge server 12, and the recognition terminal 20 may differ. Correspondingly, the time delay required to upload a picture from the recognition terminal 20 to the edge server 12 or the cloud server 11 and to return the result also differs.
In some embodiments, the preset rule may be one of the following conditions, under which the recognition terminal transfers the picture to the cloud server 11 or the edge server 12 for recognition:
1. Size of the barcode: the ratio of the barcode to the picture size is compared against a threshold. A small ratio indicates that the photographed mobile phone number and other text are also small and hard to recognize locally, so the picture can be uploaded to the cloud server 11 or the edge server 12 for recognition;
2. Tilt angle of the picture: when the tilt angle is large, the picture may be cropped during rotation on the device side and become unrecognizable, so it needs to be uploaded to the cloud server 11 or the edge server 12 for recognition.
In other embodiments, the preset rule is based on, for example, the estimated delay required to recognize the current picture at the cloud server 11, the edge server 12, or the recognition terminal 20. For example, when the current picture has a small barcode ratio, a large tilt angle, or low resolution, processing at the recognition terminal 20 will take longer; when the estimated recognition delay exceeds the corresponding delay threshold, the picture can be sent from the recognition terminal 20 to the cloud server 11 or the edge server 12 for processing. The recognition delay can be estimated from the image parameters by a dedicated program; for example, the barcode ratio, the picture tilt angle, and the picture pixels can be weighted, and when the weighted score of the picture exceeds a specific value, the estimated recognition delay is deemed to exceed the corresponding delay threshold. Details are not repeated here.
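The weighting just described could be sketched as follows. The weights, normalizations, and threshold here are assumptions: the patent states only that the barcode ratio, tilt angle, and pixels are weighted into a score that is compared against a threshold.

```python
# Hypothetical weights for the three difficulty factors.
W_BARCODE, W_TILT, W_PIXELS = 0.5, 0.3, 0.2
SCORE_THRESHOLD = 0.6  # above this, the estimated delay exceeds the threshold

def difficulty_score(barcode_ratio, tilt_angle, pixel_count,
                     min_ratio=0.1, max_tilt=45.0, ref_pixels=1_000_000):
    """Weighted difficulty score in [0, 1]: small barcodes, large tilt,
    and low pixel counts all push the score up (hypothetical normalization)."""
    f_barcode = max(0.0, 1.0 - barcode_ratio / min_ratio)
    f_tilt = min(1.0, tilt_angle / max_tilt)
    f_pixels = max(0.0, 1.0 - pixel_count / ref_pixels)
    return W_BARCODE * f_barcode + W_TILT * f_tilt + W_PIXELS * f_pixels

def exceeds_delay_threshold(barcode_ratio, tilt_angle, pixel_count):
    # When the weighted score is above the threshold, the estimated
    # recognition delay is deemed too long for local processing.
    return difficulty_score(barcode_ratio, tilt_angle, pixel_count) > SCORE_THRESHOLD
```

The same weighted score could serve the accuracy-based rule described next, with the threshold interpreted as a bound below which the estimated accuracy falls.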
In other embodiments, the preset rule is based on, for example, the estimated accuracy of recognizing the current picture at the cloud server 11, the edge server 12, or the recognition terminal 20. For example, when the current picture has a small barcode ratio, a large tilt angle, or low resolution, the recognition accuracy at the recognition terminal 20 may be reduced; when the estimated recognition accuracy falls below the preset accuracy threshold, the picture can be sent from the recognition terminal 20 to the cloud server 11 or the edge server 12 for processing. The recognition accuracy can be estimated from the image parameters by a dedicated judging program, by the aforementioned recognition terminal 20, or by a dedicated judging server; for example, the barcode ratio, the picture tilt angle, and the picture pixels can be weighted, and when the weighted score of the picture exceeds the corresponding threshold, the estimated recognition accuracy is deemed below the corresponding accuracy threshold. Details are not repeated here.
In other embodiments, the preset rule is based on, for example, an estimated difficulty value for recognizing the current picture, determined from the image parameters. Besides the aforementioned barcode ratio and picture tilt angle, the image parameters may include at least one of the position information of the picture, the text information in the picture, and the pixel values of the picture.
For example, according to historical data, when the position information shows that the picture was collected at a specific warehouse whose recognition error rate is above average, pictures from that warehouse are deemed hard to recognize and can be sent directly to the cloud server 11 or the edge server 12 for processing. The likely recognition difficulty of a picture can thus be determined from its position information. Alternatively, when the text information in the picture is determined to be in an uncommon language (e.g., a language other than Chinese or English) or to contain specific content (e.g., special symbols), the picture is deemed hard to recognize and can be sent directly to the cloud server 11 or the edge server 12 for processing. Likewise, when the pixel value of the picture is lower than a preset pixel value, the picture is deemed hard to recognize and can be sent directly to the cloud server 11 or the edge server 12 for processing.
Therefore, the likely recognition difficulty of a picture can be determined from at least one of its position information, its text information, and its pixel values; when the difficulty value is greater than the corresponding threshold, the picture may be sent from the recognition terminal 20 to the cloud server 11 or the edge server 12 for processing.
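The difficulty estimate above can be sketched as follows. The warehouse list, language set, and pixel threshold are invented for illustration and do not come from this application:

```python
# Sketch of the difficulty value built from picture metadata. The table of
# high-error warehouses, the language set, and MIN_PIXELS are illustrative.

HIGH_ERROR_WAREHOUSES = {"WH-007"}   # warehouses with above-average error rate
COMMON_LANGUAGES = {"zh", "en"}      # languages the terminal model handles well
MIN_PIXELS = 500_000                 # below this, local recognition is unreliable

def difficulty_value(warehouse: str, language: str, pixel_count: int) -> int:
    """Count how many difficulty indicators the picture triggers."""
    score = 0
    if warehouse in HIGH_ERROR_WAREHOUSES:
        score += 1
    if language not in COMMON_LANGUAGES:
        score += 1
    if pixel_count < MIN_PIXELS:
        score += 1
    return score

def should_offload(warehouse: str, language: str, pixel_count: int) -> bool:
    # Any single indicator is enough to route the picture to cloud/edge.
    return difficulty_value(warehouse, language, pixel_count) >= 1
```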
The preset rule can thus be set from one or more of the barcode size, the picture tilt angle, the recognition latency, and the recognition accuracy, and all or part of these parameters determine whether the picture is sent to the cloud, to the edge, or kept at the local recognition terminal for processing.
In the solution provided in this optional embodiment, the recognition server 10 may include a cloud server and an edge server. Through the preset rule, the recognition terminal can send the picture to the cloud server or the edge server, or keep it at the recognition terminal, for recognition. This approach fully exploits the zero network latency of the recognition terminal, the low latency and high computing power of the edge server, and the high computing power of the cloud server.
In particular, in a logistics scenario, the computing power of the edge server 12 is gradually being expanded, and its network latency is small. The preset rules may therefore take the computing power of the edge server into account.
In addition, in the scheme provided by the embodiment of the application, the cloud server, the edge server, and the recognition terminal can accumulate recognized picture information, such as waybill data. From the accumulated waybill pictures, the recognized picture information (e.g., a mobile phone number), and high-confidence samples of that information, the inference engine, the core component for waybill recognition, can be trained and optimized directly on the cloud model training server. The trained and optimized inference engine can then be issued to different platforms according to the respective models of the cloud server, the edge server, and the recognition terminal, forming a closed-loop, self-learning waybill recognition framework. This greatly reduces the cost of manual labeling and promotes lean, intelligent production.
The recognition capability of the deep learning model is improved by adjusting the model and continuously adding new data sets. For example, whether the model recognized correctly can be established by manual or machine labeling of the picture and the recognized mobile phone number. In this way, the scheme of the application can continuously collect correctly recognized data, incorrectly recognized data, and the user's corrected data, so that the model can be continuously corrected from these data and the accuracy of the output model is improved.
Third embodiment
The third embodiment of the present application provides an OCR recognition method. Fig. 3 is a flowchart illustrating steps of an OCR recognition method according to a third embodiment of the present application. As shown in fig. 3, the OCR recognition method according to the embodiment of the present application is used for selectively recognizing a picture in a recognition terminal or a recognition server, and specifically includes the following steps:
S101, extracting image parameters from a picture;
In step S101, the recognition terminal 20, such as a mobile phone, a PDA, a tablet computer, a handheld scanner gun, or another device, acquires a picture and processes it to obtain the image parameters.
The picture can be, for example, the waybill of a package in a logistics scenario. The information set forth on the package's waybill may include one or more of the recipient's name, a logistics waybill number, an order number, the recipient's address, the recipient's phone number, and so on.
The recognition terminal 20 may extract image parameters from the picture. The picture taken by the recognition terminal 20 is, for example, a data set containing the RGB values of each pixel. The image parameter extraction module can identify and acquire information such as the barcode information and the tilt angle of the picture.
The recognition terminal 20 identifies the barcode region of the picture, which may be determined, for example, from the pixels: compared with other regions, the barcode region consists of ordered, alternating black and white stripes. The image parameter extraction module can obtain the position of the barcode region, the number the barcode represents, and other information. Once the number represented by the barcode is determined, the content it encodes, such as an order number or a logistics waybill number, can be determined according to preset rules. The aforementioned barcode information may include the size of the barcode, i.e., the ratio of the barcode to the picture size.
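A minimal sketch of computing this barcode-size parameter, assuming barcode detection has already produced a bounding box (the detection step itself, locating the alternating stripes, is not shown):

```python
# Compute the ratio of the detected barcode region to the whole picture.
# The bounding box is assumed to come from a prior barcode-detection step.

def barcode_ratio(bbox: tuple, picture_size: tuple) -> float:
    """bbox = (x, y, width, height) of the barcode; picture_size = (W, H)."""
    _, _, bw, bh = bbox
    pw, ph = picture_size
    return (bw * bh) / (pw * ph)

# A 200x80 barcode in a 1000x800 photo occupies 2% of the picture:
ratio = barcode_ratio((100, 50, 200, 80), (1000, 800))
print(f"{ratio:.2%}")  # 2.00%
```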
In some embodiments, a small barcode ratio indicates that the photographed text, such as the mobile phone number, is also small and harder to recognize locally, so the picture can be uploaded to the cloud server or the edge server for recognition.
The recognition terminal 20 identifies the tilt angle of the picture; for example, the tilt of the picture relative to the horizontal and vertical coordinate axes may be determined from the angle between the reference line of the shot and the edge of the picture.
The inclination angle of the picture may include an inclination angle of the picture with respect to three coordinate axes, i.e., x, y, and z, or an inclination angle with respect to each of the xy plane, the yz plane, and the xz plane.
The picture tilt angle may thus contain three values, one for each of the three axes or three planes. When the tilt angle is large, the picture may be cropped during rotation at the terminal side and become unrecognizable, and should therefore be uploaded to the cloud for recognition.
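The three-axis tilt check above can be sketched as follows; the 30-degree limit is an illustrative placeholder, not a value from this application:

```python
# Check the picture's tilt relative to the x, y, and z axes; any axis
# exceeding the limit triggers cloud upload. The 30-degree default is
# a hypothetical placeholder.

def needs_cloud_for_tilt(tilt_xyz: tuple, max_deg: float = 30.0) -> bool:
    """tilt_xyz = (tilt_x, tilt_y, tilt_z) in degrees."""
    return any(abs(a) > max_deg for a in tilt_xyz)
```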
S102, judging whether the picture meets a preset rule or not by using the image parameters;
In this step, the recognition terminal 20 judges, from the preset rule and the image parameters, whether the picture meets the preset rule. When the picture meets the preset rule, it is sent to the recognition server for recognition; when it does not, the terminal recognizes the picture itself and obtains the picture information.
Generally speaking, a machine learning model with stronger recognition capability can be deployed in the cloud. At the recognition terminal, the computing capability is weaker than that of the cloud because of the hardware's running speed, capacity, and so on. The edge server, acting as a computing node near the central node, takes on part of the cloud's computation.
In some embodiments, the preset rule may be any one of the following conditions, under which recognition is transferred from the terminal to the cloud server 11 or the edge server 12:
1. Barcode size: when the ratio of the barcode to the picture is below a threshold, the photographed text, such as the mobile phone number, is likewise small and difficult to recognize locally, so the picture can be uploaded to the cloud server 11 or the edge server 12 for recognition;
2. Picture tilt angle: when the tilt angle of the picture is large, the picture may be cropped during rotation at the terminal side and become unrecognizable, so it should be uploaded to the cloud server 11 or the edge server 12 for recognition.
In other embodiments, the preset rule is, for example, a pre-estimated time required to recognize the current picture at the cloud server 11, the edge server 12, or the recognition terminal 20. For example, when the barcode ratio of the current picture is too small or the tilt angle is large, processing at the recognition terminal 20 will take longer; when the estimated duration exceeds a specified duration, the picture can be sent from the recognition terminal 20 to the cloud server 11 or the edge server 12 for processing.
In other embodiments, the preset rule is, for example, a pre-estimated accuracy rate of recognizing the current picture at the cloud server 11, the edge server 12, or the recognition terminal 20. For example, when the barcode ratio of the current picture is too small or the tilt angle is large, the recognition accuracy achievable at the recognition terminal 20 may be reduced; when the estimated accuracy is smaller than a preset standard value, the picture may be sent from the recognition terminal 20 to the cloud server 11 or the edge server 12 for processing.
S103, when the picture is judged to accord with a preset rule, the picture is sent to an identification server for identification;
the preset rule can be set as one or more of the size of the barcode, the inclination angle of the image, the identification delay and the identification accuracy, and the rule which needs to be sent to the cloud end, the edge or a local identification terminal for processing is determined by all or part of the parameters.
And when the picture is judged to accord with the preset rule, sending the picture to an identification server for identification.
S104, when the picture is judged not to meet the preset rule, recognizing the picture and obtaining the picture information.
In this step, when the preset rule is not met, the recognition terminal 20 is considered able to recognize the picture, which can therefore remain at the local side for recognition.
As can be seen from the above description, the OCR recognition method provided in the third embodiment of the present application has at least the following technical effects:
in the scheme provided by this embodiment of the application, the preset rule is used to either send the picture to the recognition server or keep it at the local recognition terminal for recognition, which reduces the latency of picture recognition while ensuring recognition efficiency and accuracy.
In an optional embodiment of the present application, in step S104, that is, when it is determined that the picture does not conform to the preset rule, the step of identifying the picture and obtaining picture information may further include:
uploading the identified picture information to a machine learning model training server;
the machine learning model training server is used for training a machine learning model, outputting a first machine learning model after training to the recognition server, and outputting a second machine learning model after training to the recognition terminal.
In an optional embodiment of the present application, in step S104, that is, when it is determined that the picture does not conform to the preset rule, the step of identifying the picture and obtaining picture information may further include:
respectively training a first machine learning model, a second machine learning model and a third machine learning model by using historical data;
and correspondingly sending the trained first machine learning model, the trained second machine learning model and the trained third machine learning model to a cloud server, an edge server and a recognition terminal.
In the scheme provided by this optional embodiment of the application, the cloud server, the edge server, and the recognition terminal can accumulate recognized picture information, such as waybill data. From the accumulated waybill pictures, the recognized picture information (e.g., a mobile phone number), and high-confidence samples of that information, the inference engine, the core component for waybill recognition, can be trained and optimized directly on the cloud model training server. The trained and optimized inference engine can then be issued to different platforms according to the respective models of the cloud server, the edge server, and the recognition terminal, forming a closed-loop, self-learning waybill recognition framework. This greatly reduces the cost of manual labeling and promotes lean, intelligent production.
In an alternative embodiment of the present application, the image parameters include the size of the barcode in the picture and the tilt angle of the picture. The preset rule may include: the ratio of the barcode size to the picture size is smaller than a first threshold, or the tilt angle of the picture is larger than a second threshold. The first threshold and the second threshold may be thresholds set by the developer.
In an optional embodiment of the present application, the step S102 of determining whether the picture conforms to a preset rule by using the image parameter may include:
determining the ratio of the size of the bar code of the picture to the size of the picture;
determining at least one inclination angle of the picture;
when the ratio of the barcode size to the picture size is smaller than the first threshold, or the tilt angle of the picture is larger than the second threshold, sending the picture to the recognition server for recognition;
and when the ratio of the barcode size to the picture size is not smaller than the first threshold and the tilt angle of the picture is not larger than the second threshold, recognizing the picture at the recognition terminal.
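The S102 decision above can be sketched as follows; the concrete threshold values are placeholders, since the application leaves them to the developer:

```python
# Route a picture according to the two thresholds of this embodiment.
# FIRST_THRESHOLD and SECOND_THRESHOLD are developer-chosen; the numbers
# below are illustrative placeholders only.

FIRST_THRESHOLD = 0.01   # minimum barcode-to-picture size ratio
SECOND_THRESHOLD = 30.0  # maximum tolerable tilt angle in degrees

def route(barcode_ratio: float, tilt_deg: float) -> str:
    # Small barcode or steep tilt: send to the recognition server.
    if barcode_ratio < FIRST_THRESHOLD or abs(tilt_deg) > SECOND_THRESHOLD:
        return "recognition_server"
    # Otherwise recognize locally at the terminal.
    return "recognition_terminal"
```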
The first threshold and the second threshold may be both specified by a developer to meet the usage requirement, and are not described herein again.
In the scheme provided by the optional embodiment of the application, the pictures to be identified are distributed by using the preset rule, so that the identification efficiency and accuracy are improved.
In an optional embodiment of the present application, the picture is a logistics waybill, and the waybill includes a barcode region and a text information region.
In an optional embodiment of the present application, in step S103, after the step of sending the picture to an identification server for identification when it is determined that the picture meets a preset rule, the method may further include:
judging whether an edge server exists or not;
and when determining that the edge server exists, sending the picture to the edge server for identification.
And when determining that the edge server does not exist, sending the picture to the cloud server for identification.
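The edge-first fallback in the steps above can be sketched as follows; the probe callable stands in for whatever deployment check the terminal actually uses:

```python
# Prefer a deployed edge server for a picture bound to the recognition
# server; fall back to the cloud server when no edge server exists.

def pick_server(probe_edge) -> str:
    """probe_edge() returns True when an edge server is deployed/reachable."""
    return "edge_server" if probe_edge() else "cloud_server"

print(pick_server(lambda: True))   # edge_server
print(pick_server(lambda: False))  # cloud_server
```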
In the scheme provided by this optional embodiment, the recognition server may include a cloud server and an edge server, and the recognition terminal can decide whether to process the picture by edge computing by determining whether an edge server is deployed. That is, when the recognition terminal considers that the picture should be sent to the recognition server, it may first determine whether an edge server is deployed. When an optional edge server is found to exist in addition to the cloud server, the picture can be sent to the nearby edge server for processing according to the specific rule. This relieves the computing pressure on the cloud server and lets data be processed and analyzed in real time or faster, closer to the source rather than in an external data center or cloud, thereby shortening the delay. In some embodiments, if an edge server exists, the picture is preferentially sent to the edge server for processing. This approach fully exploits the zero latency of the recognition terminal, the low latency and high computing power of the edge server, and the high computing power of the cloud server.
In an optional embodiment of the present application, the picture information includes name, phone number, and address information. Before step S101 of extracting image parameters from a picture, the method may further include:
and S100, collecting the picture.
In this step, the recognition terminal, such as a mobile phone, a tablet computer, a PDA, or a handheld scanner gun, may take a picture of the recognition object and store it in its own storage space, or proceed directly to extract the image parameters. Alternatively, the recognition terminal may lack the capability to take pictures and instead receive a picture transmitted from another device.
In summary, the scheme proposed by the present application has at least the following advantages:
in the solution provided by the optional embodiments of the present application, the recognition server may include a cloud server and an edge server. Through the preset rule, the recognition terminal can send the picture to the cloud server or the edge server, or keep it at the recognition terminal, for recognition. This approach fully exploits the zero latency of the recognition terminal, the low latency and high computing power of the edge server, and the high computing power of the cloud server.
In addition, in the scheme provided by the optional embodiments of the application, the cloud server, the edge server, and the recognition terminal can accumulate recognized picture information, such as waybill data. From the accumulated waybill pictures, the recognized picture information (e.g., a mobile phone number), and high-confidence samples of that information, the inference engine, the core component for waybill recognition, can be trained and optimized directly on the cloud model training server. The trained and optimized inference engine can then be issued to different platforms according to the respective models of the cloud server, the edge server, and the recognition terminal, forming a closed-loop, self-learning waybill recognition framework. This greatly reduces the cost of manual labeling and promotes lean, intelligent production.
The recognition capability of the deep learning model is improved by adjusting the model and continuously adding new data sets. For example, whether the model recognized correctly can be established by manual or machine labeling of the picture and the recognized mobile phone number. In this way, the scheme of the application can continuously collect correctly recognized data, incorrectly recognized data, and the user's corrected data, so that the model can be continuously corrected from these data and the accuracy of the output model is improved.
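The closed-loop data accumulation described above can be sketched as follows; all data structures are illustrative assumptions:

```python
# Pool correctly recognized, mis-recognized, and user-corrected samples so
# the model can be retrained. The tuple layout is an illustrative assumption.

def collect_feedback(results):
    """results: iterable of (picture_id, predicted, ground_truth_or_None)."""
    correct, wrong, corrected = [], [], []
    for pic, predicted, truth in results:
        if truth is None:            # no label yet: needs manual/machine marking
            continue
        if predicted == truth:
            correct.append((pic, truth))
        else:
            wrong.append((pic, predicted))
            corrected.append((pic, truth))   # the user-corrected value
    # correct + corrected form the next high-confidence training set.
    return correct + corrected, wrong

train_set, errors = collect_feedback([
    ("p1", "13800000001", "13800000001"),  # correct recognition
    ("p2", "13900000002", "13000000002"),  # wrong, later corrected by user
    ("p3", "13700000003", None),           # still unlabeled
])
```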
Fourth embodiment
The fourth embodiment of the present application provides an OCR recognition method. Fig. 4 is a flowchart illustrating the steps of the OCR recognition method according to the fourth embodiment of the present application. The scheme provided by the fourth embodiment is applied to a recognition server corresponding to a recognition terminal. As shown in fig. 4, the OCR recognition method according to this embodiment is used for recognizing a picture at the recognition server and specifically includes the following steps:
S201, receiving a picture that is sent by the recognition terminal and meets the preset rule;
S202, recognizing the picture by using a machine learning model to obtain picture information;
the image parameters corresponding to the pictures accord with preset rules, and the preset rules are used for determining whether the pictures are sent to the recognition server or not.
In an optional embodiment, the picture information includes: at least one of name, telephone, address information, and barcode number.
In an alternative embodiment, as shown in fig. 4, the method may further include:
S203, uploading the recognized picture information to a machine learning model training server;
the machine learning model training server is used for training a machine learning model, outputting a first machine learning model after training to the recognition server, and outputting a second machine learning model after training to the recognition terminal.
In the solution provided in this optional embodiment of the present application, the recognition server may include a cloud server and an edge server. Through the preset rule, the recognition terminal can send the picture to the cloud server or the edge server, or keep it at the recognition terminal, for recognition. This approach fully exploits the zero latency of the recognition terminal, the low latency and high computing power of the edge server, and the high computing power of the cloud server.
In the scheme provided by this optional embodiment of the application, the cloud server, the edge server, and the recognition terminal can accumulate recognized picture information, such as waybill data. From the accumulated waybill pictures, the recognized picture information (e.g., a mobile phone number), and high-confidence samples of that information, the inference engine, the core component for waybill recognition, can be trained and optimized directly on the cloud model training server. The trained and optimized inference engine can then be issued to different platforms according to the respective models of the cloud server, the edge server, and the recognition terminal, forming a closed-loop, self-learning waybill recognition framework. This greatly reduces the cost of manual labeling and promotes lean, intelligent production.
Fifth embodiment
The fifth embodiment of the present application provides an OCR recognition apparatus, which, as shown in fig. 1, may include an image parameter extraction module 21, a judging module 22, a sending module 23, and a recognition module 24.
The image parameter extraction module 21 is configured to extract image parameters from the picture;
the judging module 22 is configured to judge whether the picture meets a preset rule by using the image parameter;
the sending module 23 is configured to send the picture to an identification server for identification when the picture is determined to meet a preset rule;
the identification module 24 is configured to identify the picture and obtain picture information when it is determined that the picture does not comply with the preset rule.
In an alternative embodiment the OCR recognition means further comprises: and the picture acquisition module is used for acquiring pictures.
Since the device embodiment is a virtual device embodiment corresponding to the method embodiment, it is not specifically described here. The relevant content may be as described with reference to the method embodiments.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown in fig. 5, the terminal device may include an input device 90, a processor 91, an output device 92, a memory 93, and at least one communication bus 94. The communication bus 94 is used to enable communication connections between the elements. The memory 93 may comprise a high speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, in which various programs may be stored in the memory 93 for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the processor 91 may be implemented by, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the processor 91 is coupled to the input device 90 and the output device 92 through a wired or wireless connection.
Alternatively, the input device 90 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software-programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; optionally, the transceiver may be a radio frequency transceiver chip with a communication function, a baseband processing chip, a transceiver antenna, and the like. An audio input device such as a microphone may receive voice data. The output device 92 may include a display, a sound, or other output device.
In this embodiment, the processor of the terminal device includes a module for executing the functions of the modules of the data processing apparatus in each device, and specific functions and technical effects may refer to the foregoing embodiments, which are not described herein again.
Fig. 6 is a schematic diagram of a hardware structure of a terminal device according to another embodiment of the present application. FIG. 6 is a specific embodiment of the implementation of FIG. 5. As shown in fig. 6, the terminal device of the present embodiment includes a processor 101 and a memory 102.
The processor 101 executes the computer program codes stored in the memory 102 to implement the OCR recognition method of fig. 3 to 4 in the above embodiment.
The memory 102 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The memory 102 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, the processor 101 is provided in the processing assembly 100. The terminal device may further include: a communication component 103, a power component 104, a multimedia component 105, an audio component 106, an input/output interface 107 and/or a sensor component 108. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 100 generally controls the overall operation of the terminal device. The processing component 100 may include one or more processors 101 to execute instructions to perform all or part of the steps of the methods of fig. 3-4 described above. Further, the processing component 100 can include one or more modules that facilitate interaction between the processing component 100 and other components. For example, the processing component 100 may include a multimedia module to facilitate interaction between the multimedia component 105 and the processing component 100.
The power supply component 104 provides power to the various components of the terminal device. The power components 104 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia component 105 includes a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The audio component 106 is configured to output and/or input audio signals. For example, the audio component 106 may include a Microphone (MIC) configured to receive external audio signals when the terminal device is in an operational mode, such as a voice recognition mode. The received audio signal may further be stored in the memory 102 or transmitted via the communication component 103. In some embodiments, the audio component 106 also includes a speaker for outputting audio signals.
The input/output interface 107 provides an interface between the processing component 100 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 108 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 108 can detect the open/closed status of the terminal device, the relative positioning of the components, the presence or absence of user contact with the terminal device. The sensor assembly 108 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 108 may also include a camera or the like.
The communication component 103 is configured to facilitate wired or wireless communication between the terminal device and other devices. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card so that the terminal device can log onto a GPRS network to establish communication with the identification server via the internet.
From the above, the communication component 103, the audio component 106, the input/output interface 107 and the sensor component 108 involved in the embodiment of fig. 6 can be implemented as input devices in the embodiment of fig. 5.
An embodiment of the present application provides a terminal device, including: one or more processors; and one or more machine-readable media having instructions stored thereon which, when executed by the one or more processors, cause the terminal device to perform the OCR recognition method described in one or more of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The OCR recognition method, recognition device and recognition system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementation of the present application; the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, a person skilled in the art may, based on the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (29)

1. An OCR recognition method for selectively recognizing a picture in a recognition terminal or a recognition server, the recognition method comprising:
extracting image parameters from the picture;
judging whether the picture meets a preset rule or not by using the image parameters;
when the picture is judged to accord with a preset rule, the picture is sent to an identification server for identification;
and when the picture is judged not to accord with the preset rule, identifying the picture and obtaining picture information.
2. An OCR recognition method according to claim 1, wherein after the step of recognizing the picture and obtaining picture information when it is determined that the picture does not comply with the preset rule, the method further comprises:
uploading the identified picture information as training data to a machine learning model training server;
the machine learning model training server is used for training a machine learning model, outputting a first machine learning model after training to the recognition server, and outputting a second machine learning model after training to the recognition terminal.
3. An OCR recognition method according to claim 1, wherein said image parameters include: the size of a bar code contained in the picture and the inclination angle of the picture.
4. An OCR recognition method according to claim 3, wherein the preset rule includes: the ratio of the size of the bar code to the size of the picture is smaller than a first threshold value, or the inclination angle of the picture is larger than a second threshold value.
5. An OCR recognition method according to claim 1, wherein the step of determining whether the picture conforms to a preset rule by using the image parameter comprises:
determining the ratio of the size of a bar code contained in the picture to the size of the picture;
determining at least one inclination angle of the picture;
when the ratio of the size of the bar code to the size of the picture is smaller than a first threshold, or the inclination angle of the picture is larger than a second threshold, sending the picture to the recognition server for recognition;
and when the ratio of the size of the bar code to the size of the picture is not less than the first threshold and the inclination angle of the picture is not greater than the second threshold, recognizing the picture at the recognition terminal.
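The routing rule of claims 4-5 can be sketched as a small decision function. This is a minimal illustration, not the patented implementation: the field names, and the concrete threshold values `RATIO_THRESHOLD` and `TILT_THRESHOLD`, are assumptions — the claims only require "a first threshold" and "a second threshold" without fixing their values.

```python
from dataclasses import dataclass

@dataclass
class ImageParams:
    barcode_area: float   # area of the detected bar code region, in pixels
    picture_area: float   # total picture area, in pixels
    tilt_degrees: float   # estimated inclination angle of the picture

# Illustrative thresholds only; the claims leave the values unspecified.
RATIO_THRESHOLD = 0.01   # first threshold: bar-code-to-picture size ratio
TILT_THRESHOLD = 30.0    # second threshold: inclination angle in degrees

def route(params: ImageParams) -> str:
    """Return 'server' when the preset rule is met, else 'terminal'."""
    ratio = params.barcode_area / params.picture_area
    # A small bar code or a strongly tilted picture is hard to recognize
    # locally, so it is offloaded to the recognition server.
    if ratio < RATIO_THRESHOLD or params.tilt_degrees > TILT_THRESHOLD:
        return "server"
    return "terminal"
```

A picture with a large, upright bar code would thus stay on the terminal, while a small or skewed one would be offloaded.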
6. An OCR recognition method as claimed in claim 1, wherein the picture is a logistics surface sheet comprising a barcode region and a text information region.
7. An OCR recognition method according to claim 1, wherein after the step of sending the picture to a recognition server for recognition when it is determined that the picture meets the preset rule, the method further comprises:
judging whether an edge server exists or not;
and when determining that the edge server exists, sending the picture to the edge server for identification.
8. An OCR recognition method according to claim 7, wherein after the step of sending the picture to a recognition server for recognition when it is determined that the picture meets the preset rule, the method further comprises:
and when determining that the edge server does not exist, sending the picture to the cloud server for identification.
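The edge/cloud fallback of claims 7-8 amounts to a preference order: use an edge server when one exists, otherwise the cloud server. A minimal sketch, in which `probe_edge` is a hypothetical discovery hook (not named in the claims) that returns an edge server address or None:

```python
from typing import Callable, Optional

def select_recognition_server(probe_edge: Callable[[], Optional[str]],
                              cloud_url: str) -> str:
    """Claims 7-8 as a sketch: prefer a nearby edge server; fall back to
    the cloud server only when no edge server is found."""
    edge_url = probe_edge()
    return edge_url if edge_url is not None else cloud_url
```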
9. An OCR recognition method according to claim 1, wherein said picture information includes: name, phone, address information.
10. An OCR recognition method according to claim 1, wherein the preset rule includes at least one of:
determining that an estimated recognition delay is greater than a corresponding delay threshold;
determining that an estimated recognition accuracy is smaller than a corresponding accuracy threshold;
determining that an estimated recognition difficulty value is greater than a corresponding difficulty threshold;
wherein the estimated recognition delay, the estimated recognition accuracy and the estimated recognition difficulty value are each determined from the image parameters.
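Claim 10 describes an any-of rule over three image-derived estimates. A minimal sketch of that predicate; the default threshold values here are illustrative assumptions, as the claim only requires "corresponding" thresholds:

```python
def meets_preset_rule(est_delay_ms: float,
                      est_accuracy: float,
                      est_difficulty: float,
                      delay_threshold_ms: float = 200.0,
                      accuracy_threshold: float = 0.9,
                      difficulty_threshold: float = 0.7) -> bool:
    """The rule is met (offload to the server) when at least one estimate
    crosses its threshold. All three estimates are assumed to have been
    derived from the image parameters beforehand."""
    return (est_delay_ms > delay_threshold_ms
            or est_accuracy < accuracy_threshold
            or est_difficulty > difficulty_threshold)
```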
11. An OCR recognition method according to claim 10, wherein said image parameters include: at least one of position information of the picture, text information in the picture, and pixel values of the picture.
12. An OCR recognition method for recognizing a picture at a recognition server, the recognition method comprising:
receiving a picture sent by an identification terminal;
recognizing the picture by using a machine learning model to obtain picture information;
wherein image parameters corresponding to the picture conform to a preset rule, and the preset rule is used to determine whether the picture is sent to the recognition server.
13. An OCR recognition method according to claim 12, wherein said picture information includes: at least one of name, telephone, address information, and barcode number.
14. An OCR recognition method according to claim 12, further comprising:
uploading the identified picture information to a machine learning model training server;
the machine learning model training server is used for training a machine learning model, outputting a first machine learning model after training to the recognition server, and outputting a second machine learning model after training to the recognition terminal.
15. An OCR recognition method according to claim 12, further comprising:
respectively training a first machine learning model, a second machine learning model and a third machine learning model by using historical data;
and correspondingly sending the trained first machine learning model, the trained second machine learning model and the trained third machine learning model to a cloud server, an edge server and a recognition terminal.
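Claim 15's train-then-distribute step can be sketched as one pass over the three tiers. The hooks `train_fn` and `deploy_fn` are hypothetical stand-ins for the actual model training and transfer, which the claim does not specify:

```python
def train_and_distribute(history, train_fn, deploy_fn):
    """Claim 15 as a sketch: train one model per tier from historical
    data, then push each model to its destination (cloud server, edge
    server, recognition terminal)."""
    deployments = {}
    for tier in ("cloud_server", "edge_server", "recognition_terminal"):
        # A real system would likely train models of different sizes per
        # tier (e.g. a compact model for the terminal); here train_fn is
        # given the tier name so it can decide.
        model = train_fn(history, tier)
        deploy_fn(tier, model)
        deployments[tier] = model
    return deployments
```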
16. An OCR recognition apparatus for recognizing a picture, the OCR recognition apparatus comprising:
the image parameter extraction module is used for extracting image parameters from the picture;
the judging module is used for judging whether the picture meets a preset rule or not by using the image parameters;
the sending module is used for sending the picture to an identification server for identification when the picture is judged to accord with a preset rule;
and the identification module is used for identifying the picture and acquiring picture information when the picture is judged not to accord with the preset rule.
17. The OCR recognition apparatus of claim 16, further comprising:
and the picture acquisition module is used for acquiring pictures.
18. An OCR recognition system for recognizing a picture, comprising a recognition server and at least one recognition terminal;
the at least one recognition terminal is configured to extract image parameters from a collected picture, determine whether the image parameters conform to a preset rule, and push the picture to the recognition server for recognition when the image parameters conform to the preset rule; and recognize the picture when the image parameters conform to a second preset rule.
19. An OCR recognition system as recited in claim 18 wherein the at least one recognition terminal is further configured to:
uploading the identified picture information as training data to a machine learning model training server;
the machine learning model training server is used for training a machine learning model, outputting a first machine learning model after training to the recognition server, and outputting a second machine learning model after training to the recognition terminal.
20. An OCR recognition system as recited in claim 18 wherein the recognition server is operative to:
sending the identified picture information to a machine learning model training server;
the machine learning model training server is used for training a machine learning model, outputting a first machine learning model after training to the recognition server, and outputting a second machine learning model after training to the recognition terminal.
21. An OCR recognition system as recited in claim 18, wherein the image parameters include: the size of a bar code in the picture and the inclination angle of the picture.
22. An OCR recognition system according to claim 18, wherein said preset rules include: the ratio of the size of the bar code to the size of the picture is smaller than a first threshold value, or the inclination angle of the picture is larger than a second threshold value.
23. An OCR recognition system as recited in claim 18 wherein the at least one recognition terminal is further configured to:
determining the ratio of the size of the bar code of the picture to the size of the picture;
determining at least one inclination angle of the picture;
when the ratio of the size of the bar code to the size of the picture is smaller than a first threshold, or the inclination angle of the picture is larger than a second threshold, sending the picture to the recognition server for recognition;
and when the ratio of the size of the bar code to the size of the picture is not less than the first threshold and the inclination angle of the picture is not greater than the second threshold, recognizing the picture at the recognition terminal.
24. An OCR recognition system according to claim 18 wherein the picture is a logistics sheet, the logistics sheet includes a barcode region and a text information region, and the picture information includes: at least one of name, telephone, address information, and barcode number.
25. An OCR recognition system according to claim 18, wherein the recognition server comprises a cloud server and an edge server, and the recognition terminal is further configured to: judging whether an edge server exists or not;
when determining that an edge server exists, sending the picture to the edge server for identification;
and when determining that the edge server does not exist, sending the picture to the cloud server for identification.
26. An OCR recognition system according to claim 18 wherein said recognition terminal is integrated in or provided separately from a barcode scanning gun.
27. An OCR recognition system according to claim 26, wherein said recognition terminal comprises an internet of things device connected to said barcode scanning gun via the internet of things.
28. A terminal device, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the terminal device to perform the method of one or more of claims 1-15.
29. One or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause a terminal device to perform the method of one or more of claims 1-15.
CN202011468116.XA 2020-12-14 2020-12-14 OCR recognition method, recognition device and recognition system Pending CN114627459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011468116.XA CN114627459A (en) 2020-12-14 2020-12-14 OCR recognition method, recognition device and recognition system

Publications (1)

Publication Number Publication Date
CN114627459A true CN114627459A (en) 2022-06-14

Family

ID=81896585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011468116.XA Pending CN114627459A (en) 2020-12-14 2020-12-14 OCR recognition method, recognition device and recognition system

Country Status (1)

Country Link
CN (1) CN114627459A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116225270A (en) * 2023-02-27 2023-06-06 荣耀终端有限公司 Bar code image acquisition method and related device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016169096A1 (en) * 2015-04-20 2016-10-27 中兴通讯股份有限公司 Terminal control method and device
CN108846419A (en) * 2018-05-25 2018-11-20 平安科技(深圳)有限公司 Single page high load image-recognizing method, device, computer equipment and storage medium
CN110276257A * 2019-05-20 2019-09-24 阿里巴巴集团控股有限公司 Face identification method, device, system, server and readable storage medium
CN110837816A (en) * 2019-11-18 2020-02-25 中国银行股份有限公司 Optical character recognition system, edge node and system
WO2020098250A1 (en) * 2018-11-12 2020-05-22 平安科技(深圳)有限公司 Character recognition method, server, and computer readable storage medium
CN111291748A (en) * 2020-01-15 2020-06-16 广州玖峰信息科技有限公司 Cascade distributed artificial intelligence case number identification system
CN111627431A (en) * 2020-05-13 2020-09-04 广州国音智能科技有限公司 Voice recognition method, device, terminal and storage medium
JP2020160609A (en) * 2019-03-25 2020-10-01 東芝テック株式会社 Program and character recognition method
US20200342249A1 (en) * 2019-04-25 2020-10-29 International Business Machines Corporation Optical character recognition support system
CN113642352A (en) * 2020-04-27 2021-11-12 菜鸟智能物流控股有限公司 Method and device for acquiring text information of express bill and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination