CN111314697B - Code rate setting method, equipment and storage medium for optical character recognition - Google Patents

Info

Publication number
CN111314697B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010116219.3A
Other languages
Chinese (zh)
Other versions
CN111314697A (en)
Inventor
张昊
傅枧根
钟培雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • H04N19/426 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements using memory downsizing methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention discloses a code rate setting method, a device, and a storage medium for optical character recognition. The method first down-samples a picture, then encodes the down-sampled picture multiple times to obtain an optimal QP/Rate value (the lowest quantization coefficient value or code rate value at which the down-sampled picture is still correctly recognized), then obtains a code rate increment M or a quantization coefficient increment N from a confidence neural network, and finally uses these to quickly find the optimal coding value for the original picture. The optimal coding value is the lowest code rate value or lowest quantization coefficient value that does not affect the optical character recognition accuracy of the original picture. Compared with the prior art, the invention reduces network transmission bandwidth and, for collections of hundreds of millions of pictures, reduces storage occupation and cost. The invention covers the application scenarios of the mainstream hybrid coding architecture, can use any image coding standard or the intra-frame coding mode of any video coding standard, and is widely applicable.

Description

Code rate setting method, equipment and storage medium for optical character recognition
Technical Field
The invention relates to the technical field of video coding and deep learning, in particular to a code rate setting method, equipment and a storage medium for optical character recognition.
Background
With the continuous development of artificial intelligence technology, it has become common to collect data on a mobile terminal, perform simple processing there, and then transmit the data for intelligent analysis. Face recognition and optical character recognition (OCR) are widely used examples. Transmitting a large number of images consumes a large amount of bandwidth. To save data network bandwidth, the code rate (Rate) of the image data needs to be set so that the code rate is as low as possible (and the consumed bandwidth is therefore minimal) while the effect on image quality stays small enough that OCR still performs well. In addition, even when OCR is performed directly on a cloud or local server without network transmission, hundreds of millions of pictures occupy a large amount of storage space. To reduce the storage space and the cost, the picture size also needs to be controlled through fast encoding, using a code rate that is as small as possible (that is, a picture size that is as small as possible) without affecting the OCR result.
Common conventional image coding methods include JPEG and JPEG2000. In recent years, the intra-frame coding methods of video coding standards have also been used for image coding and achieve better coding efficiency than conventional methods such as JPEG. Standards such as H.264, HEVC, VVC, AVS2, AVS3, and AV1 adopt a hybrid coding architecture aimed mainly at video coding, but their intra-frame coding is gradually being applied to image coding as well. How to reduce the picture code rate as much as possible while preserving optical character recognition accuracy across these coding standards remains an open problem.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the prior art, and provides a code rate setting method, device, and storage medium for optical character recognition.
According to an embodiment of the present invention, there is provided a code rate setting method for optical character recognition, including the steps of:
setting a Rate interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;
calculating an optimal Rate value of the down-sampled picture within the Rate interval, wherein the optimal Rate value is the minimum value in the Rate interval such that the down-sampled picture, encoded at that value and then decoded, is correctly recognized;
inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a code rate increment M;
setting the optimal Rate value plus n code rate increments M as the optimal coding value of the original picture, wherein n satisfies the following condition: the original picture encoded at the optimal Rate value plus n code rate increments M is correctly recognized after decoding, and the original picture encoded at the optimal Rate value plus (n+1) code rate increments M is not correctly recognized after decoding.
The code rate setting method for optical character recognition provided by the embodiment of the invention has at least the following beneficial effects:
(1) The method first down-samples the picture, then encodes the down-sampled picture multiple times to obtain the optimal Rate value (the lowest code rate value at which the down-sampled picture is correctly recognized), then obtains a code rate increment M from a confidence neural network, and finally uses the optimal Rate value and the code rate increment M to quickly find the optimal coding value for the original picture, which is the lowest code rate value that does not affect the optical character recognition accuracy of the original picture. Compared with the prior art, the method reduces encoding time.
(2) The method reduces network transmission bandwidth and, for collections of hundreds of millions of pictures, reduces storage occupation and cost.
(3) The method covers the application scenarios of the mainstream hybrid coding architecture, can use any image coding standard or the intra-frame coding mode of any video coding standard, and is widely applicable.
According to the code rate setting method for optical character recognition of the embodiment of the invention, the optimal Rate value of the down-sampled picture is obtained by the bisection method.
According to the code rate setting method for optical character recognition of the embodiment of the invention, setting the Rate interval of the original picture comprises:
setting the Rate interval of the original picture according to the coding standard to be used, or setting the Rate interval of the original picture according to the size of the original picture or the bandwidth.
According to the code rate setting method for optical character recognition provided by the embodiment of the invention, the original picture is down-sampled by a factor of 0.25.
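The text fixes the down-sampling factor at 0.25 but does not name a resampling filter. The following is a minimal sketch under two assumptions that are not in the original: the factor applies to each dimension, and each 4 x 4 block is replaced by its average (box filtering). The function name `downsample_quarter` and the nested-list grayscale image format are hypothetical and chosen only to keep the example self-contained.

```python
from typing import List


def downsample_quarter(pixels: List[List[int]]) -> List[List[int]]:
    """Shrink a grayscale image to 0.25 of its width and height by box averaging.

    Assumption: the image is a non-empty list of equal-length rows; any
    rows or columns beyond a multiple of 4 are dropped.
    """
    factor = 4  # 1 / 0.25
    out_h = len(pixels) // factor
    out_w = len(pixels[0]) // factor
    out = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            # Collect the 4 x 4 block that maps to output pixel (by, bx).
            block = [
                pixels[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


# An 8 x 8 test image made of four flat 4 x 4 tiles.
image = [[(x // 4) * 10 + (y // 4) for x in range(8)] for y in range(8)]
small = downsample_quarter(image)
print(small)
```

In practice an image library's resize routine would normally take the place of this loop; the point here is only that the down-sampled picture has one sixteenth of the pixels, which is what makes the repeated trial encodes cheap.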
According to an embodiment of the present invention, there is provided a code rate setting method for optical character recognition, including the steps of:
setting a QP interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;
calculating an optimal QP value of the down-sampled picture within the QP interval, wherein the optimal QP value is the minimum value in the QP interval such that the down-sampled picture, encoded at that value and then decoded, is correctly recognized;
inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a quantization coefficient increment N;
setting the optimal QP value plus n quantization coefficient increments N as the optimal coding value of the original picture, wherein n satisfies the following condition: the original picture encoded at the optimal QP value plus n quantization coefficient increments N is correctly recognized after decoding, and the original picture encoded at the optimal QP value plus (n+1) quantization coefficient increments N is not correctly recognized after decoding.
The code rate setting method for optical character recognition provided by the embodiment of the invention has at least the following beneficial effects:
(1) The method first down-samples the picture, then encodes the down-sampled picture multiple times to obtain the optimal QP value (the lowest quantization coefficient value at which the down-sampled picture is correctly recognized), then obtains a quantization coefficient increment N from a confidence neural network, and finally uses the optimal QP value and the quantization coefficient increment N to quickly find the optimal coding value for the original picture, which is the lowest quantization coefficient value that does not affect the optical character recognition accuracy of the original picture. Compared with the prior art, the method reduces encoding time.
(2) The method reduces network transmission bandwidth and, for collections of hundreds of millions of pictures, reduces storage occupation and cost.
(3) The method covers the application scenarios of the mainstream hybrid coding architecture, can use any image coding standard or the intra-frame coding mode of any video coding standard, and is widely applicable.
According to the code rate setting method for optical character recognition of the embodiment of the invention, the optimal QP value of the down-sampled picture is obtained by the bisection method.
According to the code rate setting method for optical character recognition of the embodiment of the invention, setting the QP interval of the original picture comprises:
setting the QP interval of the original picture according to the coding standard to be used, or setting the QP interval of the original picture according to the size of the original picture or the bandwidth.
According to the code rate setting method for optical character recognition provided by the embodiment of the invention, the original picture is down-sampled by a factor of 0.25.
According to an embodiment of the present invention, there is provided a code rate setting apparatus for optical character recognition, including: at least one control processor and a memory for communicative connection with the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform a code rate setting method for optical character recognition as described above.
According to an embodiment of the present invention, there is provided a computer-readable storage medium storing computer-executable instructions for causing a computer to execute a code rate setting method for optical character recognition as described above.
Drawings
The invention is further described below with reference to the accompanying drawings and embodiments:
FIG. 1 is a schematic flowchart of a code rate setting method for optical character recognition according to a first embodiment of the present invention;
FIG. 2 is a detailed flowchart of step S102 in FIG. 1;
FIG. 3 is a schematic flowchart of a code rate setting method for optical character recognition according to a second embodiment of the present invention;
FIG. 4 is a detailed flowchart of step S202 in FIG. 3;
FIG. 5 is a schematic structural diagram of a code rate setting device for optical character recognition according to a fifth embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making any creative effort, shall fall within the protection scope of the disclosure. It should be noted that the features of the embodiments and examples of the present disclosure may be combined with each other without conflict. In addition, the purpose of the drawings is to graphically supplement the description in the written portion of the specification so that a person can intuitively and visually understand each technical feature and the whole technical solution of the present disclosure, but it should not be construed as limiting the scope of the present disclosure.
Referring to FIG. 1 and FIG. 2, a first embodiment of the present invention provides a code rate setting method for optical character recognition, including the following steps:
S101, setting a Rate interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;
As an optional implementation, the Rate interval of the original picture may be set according to the coding standard to be used, or according to the size of the original picture or the bandwidth, and may be adjusted to the actual situation. For example, when the picture is encoded with the H.264 coding standard software JM, the Rate interval may be set to [100, 5000].
As an optional implementation, the original picture is down-sampled by a factor of 0.25. Compared with 0.1x and 0.5x down-sampling, 0.25x down-sampling is preferred because it reduces the picture size substantially while avoiding blurring of the picture.
S102, calculating an optimal Rate value of the down-sampled picture by the bisection method, wherein the optimal Rate value is the minimum value in the Rate interval such that the down-sampled picture, encoded at that value and then decoded, is correctly recognized;
It should be noted that the optimal Rate value of the down-sampled picture may also be obtained by successive coding. In this embodiment the bisection method is preferred because it obtains the optimal Rate value relatively quickly; the efficiency gain is especially large when the Rate interval is large.
The specific steps for calculating the optimal Rate value of the down-sampled picture by the bisection method are as follows:
S1021, encoding the down-sampled picture at the middle value of the Rate interval;
S1022, decoding the encoded down-sampled picture, and then performing optical character recognition (recognition based on an optical character recognition model can be performed at the mobile terminal);
S1023, if a correct recognition result is obtained, updating the Rate interval by taking its middle value as the new right end point; if a correct recognition result is not obtained, updating the Rate interval by taking its middle value as the new left end point;
S1024, if the difference between the Rate values at the left and right end points of the updated Rate interval is greater than 1, jumping to step S1021; if the difference is less than or equal to 1, going to step S1025;
S1025, if a correct recognition result is obtained at the right end point of the Rate interval, adopting the Rate value of the right end point as the optimal Rate value; otherwise, adopting the Rate value of the left end point as the optimal Rate value;
S103, inputting the down-sampled picture into a confidence neural network, and performing confidence prediction to obtain a code rate increment M;
In this step, the down-sampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the down-sampled picture can be correctly recognized, and confidence prediction is then performed to obtain the code rate increment M. Here, the confidence prediction is a function whose input is the confidence value and whose output is the code rate increment M. For example, when the confidence value is 90 the function gives M = 5, and when the confidence value is 80 it gives M = 4. The function can be set according to actual conditions.
S104, setting the optimal Rate value plus n code rate increments M as the optimal coding value of the original picture, wherein n satisfies the following condition: the original picture encoded at the optimal Rate value plus n code rate increments M is correctly recognized after decoding, and the original picture encoded at the optimal Rate value plus (n+1) code rate increments M is not correctly recognized after decoding.
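Step S103 gives only two sample points for the confidence-to-increment function (confidence 90 gives M = 5, confidence 80 gives M = 4) and leaves the function itself open. The sketch below is one hypothetical choice, a straight line through those two points; the name `rate_increment` and the lower bound of 1 are assumptions made for illustration and do not come from the text.

```python
def rate_increment(confidence: float) -> int:
    """Map a confidence value (on a 0 to 100 scale) to a code rate increment M.

    Hypothetical linear mapping through the two sample points in the text:
    confidence 90 -> M = 5 and confidence 80 -> M = 4.
    """
    m = round((confidence - 40) / 10)
    # Assumption: keep the increment at least 1 so a later search that
    # steps by M always makes progress.
    return max(1, m)


print(rate_increment(90), rate_increment(80))
```

Any other function that passes through the same two points would fit the description equally well.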
In the method provided by this embodiment, the original picture is first down-sampled; a Rate value search is then performed on the down-sampled picture using the bisection method to quickly obtain the optimal Rate value; the search then continues on the original picture, starting from the obtained optimal Rate value, to obtain the optimal coding value, which is the lowest code rate value at which the original picture still meets the optical character recognition accuracy requirement. The method reduces network transmission bandwidth and, for hundreds of millions of pictures, reduces cost by reducing storage occupation. The method can also be applied to any current image coding standard or to the intra-frame coding mode of any video coding standard, and is therefore widely applicable.
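The bisection in steps S1021 to S1025 can be sketched as follows. This is a minimal illustration, not the patented implementation: the encode, decode, and OCR steps are collapsed into a hypothetical callable `recognizes(rate)`, and the sketch assumes integer Rate values, that recognition fails at the lower end of the interval and succeeds at the upper end, and that recognition never gets worse as the Rate increases.

```python
from typing import Callable


def min_recognizable_rate(low: int, high: int,
                          recognizes: Callable[[int], bool]) -> int:
    """Bisection over an integer Rate interval, following S1021 to S1025."""
    left, right = low, high
    while right - left > 1:            # S1024: stop once the end points differ by at most 1
        mid = (left + right) // 2      # S1021: encode at the middle value
        if recognizes(mid):            # S1022: decode and run OCR
            right = mid                # S1023: recognized, so search lower rates
        else:
            left = mid                 # S1023: not recognized, so search higher rates
    # S1025: use the right end point if it is recognized, otherwise the left one.
    return right if recognizes(right) else left


# Hypothetical stand-in for "encode at this rate, decode, run OCR, compare
# with the expected text": pretend OCR succeeds from Rate 1234 upward.
def fake_recognizes(rate: int) -> bool:
    return rate >= 1234


best = min_recognizable_rate(100, 5000, fake_recognizes)
print(best)
```

Passing the recognizer in as a callable keeps the search independent of the codec, which matches the statement that any image coding standard or intra-frame coding mode can be used.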
Referring to FIG. 3 and FIG. 4, a second embodiment of the present invention provides a code rate setting method for optical character recognition, including the following steps:
S201, setting a QP (quantization coefficient) interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;
As an optional implementation, the QP interval of the original picture may be set according to the coding standard to be used, or according to the size of the original picture or the bandwidth, and may be adjusted to the actual situation. For example, when the picture is encoded with the H.264 coding standard software JM, the QP interval may be set to [10, 40].
As an optional implementation, the original picture is down-sampled by a factor of 0.25. Compared with 0.1x and 0.5x down-sampling, 0.25x down-sampling is preferred because it reduces the picture size substantially while avoiding blurring of the picture.
S202, calculating an optimal QP value of the down-sampled picture by the bisection method, wherein the optimal QP value is the minimum value in the QP interval such that the down-sampled picture, encoded at that value and then decoded, is correctly recognized;
It should be noted that the optimal QP value of the down-sampled picture may also be obtained by successive coding. In this embodiment the bisection method is preferred because it obtains the optimal QP value relatively quickly; the efficiency gain is especially large when the QP interval is large.
The specific steps for calculating the optimal QP value of the down-sampled picture by the bisection method are as follows:
S2021, encoding the down-sampled picture at the middle value of the QP interval;
S2022, decoding the encoded down-sampled picture, and then performing optical character recognition (recognition based on an optical character recognition model can be performed at the mobile terminal);
S2023, if a correct recognition result is obtained, updating the QP interval by taking its middle value as the new right end point; if a correct recognition result is not obtained, updating the QP interval by taking its middle value as the new left end point;
S2024, if the difference between the QP values at the left and right end points of the updated QP interval is greater than 1, jumping to step S2021; if the difference is less than or equal to 1, going to step S2025;
S2025, if a correct recognition result is obtained at the right end point of the QP interval, adopting the QP value of the right end point as the optimal QP value; otherwise, adopting the QP value of the left end point as the optimal QP value;
S203, inputting the down-sampled picture into a confidence neural network, and performing confidence prediction to obtain a quantization coefficient increment N;
In this step, the down-sampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the down-sampled picture can be correctly recognized, and confidence prediction is then performed to obtain the quantization coefficient increment N. The confidence prediction here is a function that can be set according to actual conditions; the setting principle is the same as in the first embodiment and is not described again here.
S204, setting the optimal QP value plus n quantization coefficient increments N as the optimal coding value of the original picture, wherein n satisfies the following condition: the original picture encoded at the optimal QP value plus n quantization coefficient increments N is correctly recognized after decoding, and the original picture encoded at the optimal QP value plus (n+1) quantization coefficient increments N is not correctly recognized after decoding.
In the method provided by this embodiment, the original picture is first down-sampled; a QP value search is then performed on the down-sampled picture using the bisection method to quickly obtain the optimal QP value; the search then continues on the original picture, starting from the obtained optimal QP value, to obtain the optimal coding value, which is the lowest quantization coefficient value at which the original picture still meets the optical character recognition accuracy requirement. The method reduces network transmission bandwidth and, for hundreds of millions of pictures, reduces cost by reducing storage occupation. The method can also be applied to any current image coding standard or to the intra-frame coding mode of any video coding standard, and is therefore widely applicable.
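The stepping rule in S204 (keep the last value start + n·N that is still recognized, where start + (n+1)·N is not) can be sketched as below. The sketch assumes that the original picture is recognized at the starting QP and that recognition eventually fails as QP grows, since a larger QP means coarser quantization. The names `largest_recognizable_qp`, `recognizes`, and the upper bound `qp_max` are hypothetical; the bound stands in for the upper end of the configured QP interval.

```python
from typing import Callable


def largest_recognizable_qp(start_qp: int, step: int, qp_max: int,
                            recognizes: Callable[[int], bool]) -> int:
    """Return start_qp + n * step for the first n at which the next step fails.

    `recognizes(qp)` is a hypothetical stand-in for encoding the original
    picture at `qp`, decoding it, and checking the OCR result.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    qp = start_qp
    # Advance while the next candidate stays inside the interval and is
    # still recognized; stop at the last value known to be recognized.
    while qp + step <= qp_max and recognizes(qp + step):
        qp += step
    return qp


# Pretend OCR on the original picture succeeds up to QP 33.
result = largest_recognizable_qp(22, 4, 40, lambda qp: qp <= 33)
print(result)
```

With a starting QP of 22 and N = 4, the candidates 26 and 30 are recognized and 34 is not, so the search stops at n = 2.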
It should be noted that, since the QP value and the Rate value are mutually convertible in the art, the second embodiment is based on the same inventive concept as the first embodiment.
The third embodiment of the present invention provides a code rate setting method for optical character recognition, which uses h.264 coding standard software JM to exemplify the coding of a picture, and includes the following specific steps:
(1) the interval of the picture coding Rate is automatically set according to the requirement, for example: the Rate is required to be greater than 100 and less than 5000, and the interval can be set according to the conditions such as the size of the picture or the bandwidth;
(2) carrying out 1/4 times down-sampling on the original picture;
(3) coding based on the middle value of the downsampled picture code rate interval;
(4) performing optical character recognition after decoding the downsampled picture (recognition can be performed on the basis of an optical character recognition model at a mobile terminal);
(5) if the correct recognition result can be obtained under the condition of the middle value of the Rate, the right end point is used as an updated right end point, and the middle value is recalculated to be the middle value of a new interval; otherwise, the left endpoint is used as an updated left endpoint;
(6) repeating the steps (3), (4) and (5) until the separation is not carried out (the difference of the Rate values corresponding to the left end point and the right end point is less than or equal to 1), and finishing the searching process; if the right end point Rate can obtain the correct recognition result, adopting the right end point Rate for coding; otherwise, adopting the Rate of the left end point to carry out coding;
(7) after downsampling the picture, inputting the downsampled picture into a confidence neural network for deep learning to obtain a confidence value which can be correctly identified by the picture, and obtaining a code rate increment M according to the value of the confidence;
(8) taking the Rate value of the downsampled picture finally obtained in the step (6) as the initial Rate value of the original picture;
(9) adding the code Rate increment M to the current Rate value of the original picture to obtain a new Rate value;
(10) coding the original picture according to the new Rate value, and performing optical character recognition after decoding (the mobile terminal can carry out recognition based on an optical character recognition model);
(11) if the original picture can obtain a correct recognition result at the current Rate value, repeating steps (9) and (10); otherwise, the Rate value is left unchanged and the update of the Rate value ends;
(12) subtracting a fixed value from the Rate value at this point, and using the result as the Rate value for coding the original picture.
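Steps (3) to (6) above amount to a bisection over the Rate interval. A minimal Python sketch follows, assuming a hypothetical callback `is_recognized(rate)` that encodes the downsampled picture at the given Rate, decodes it, runs optical character recognition and reports whether the result is correct. The function name `bisect_rate` and the use of integer Rate values are illustrative assumptions and are not taken from the embodiment.

```python
from typing import Callable


def bisect_rate(left: int, right: int,
                is_recognized: Callable[[int], bool]) -> int:
    """Bisection over [left, right] following steps (3) to (6).

    `is_recognized` is a hypothetical stand-in for the
    encode -> decode -> OCR check described in steps (3) and (4).
    """
    if left > right:
        raise ValueError("left end point must not exceed right end point")

    # Step (6): stop once the end points differ by at most 1.
    while right - left > 1:
        mid = (left + right) // 2       # step (3): middle value of the interval
        if is_recognized(mid):          # steps (4)-(5): recognition succeeded
            right = mid                 # middle value becomes the right end point
        else:
            left = mid                  # middle value becomes the left end point

    # Final choice in step (6): prefer the right end point if it is recognized.
    return right if is_recognized(right) else left
```

With a stand-in predicate that succeeds from a Rate of 37 upward, `bisect_rate(10, 100, lambda r: r >= 37)` returns 37. Each probe costs one encode, decode and recognition pass, so the number of probes grows with the logarithm of the interval width.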
The method provided by this embodiment not only reduces network transmission bandwidth but also lowers cost by reducing the storage space occupied by hundreds of millions of pictures. It should be noted that this embodiment uses the H.264 coding standard as an example, but the present invention is applicable to any coding standard and method.
A fourth embodiment of the present invention provides a code rate setting method for optical character recognition, which uses JM, software implementing the H.264 coding standard, to encode a picture as an example, and includes the following specific steps:
(1) A QP interval for picture coding is set automatically according to requirements; for example, the QP may be required to be larger than 10 and smaller than 40, and the interval may be set according to conditions such as the picture size and the bandwidth.
(2) The original picture is downsampled by a factor of 1/4.
(3) Encoding is performed based on the intermediate value of the QP interval of the downsampled picture.
(4) The encoded downsampled picture is decoded and optical character recognition is then performed (recognition can be performed on the basis of an optical character recognition model at a mobile terminal).
(5) If a correct recognition result can be obtained at the QP intermediate value, the QP intermediate value is used as the updated right end point, and the intermediate value of the new interval is recalculated; otherwise, the QP intermediate value is used as the updated left end point.
(6) Steps (3), (4) and (5) are repeated until the interval can no longer be divided (the difference between the QP values of the left and right end points is less than or equal to 1), at which point the search ends; if the QP/Rate of the right end point yields a correct recognition result, the QP/Rate of the right end point is adopted for coding; otherwise, the QP/Rate of the left end point is adopted.
(7) The downsampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the picture can be correctly recognized, and a quantization coefficient increment N is obtained according to the confidence value.
(8) The QP value of the downsampled picture finally obtained in step (6) is taken as the initial QP value of the original picture.
(9) The original picture QP value continues to be incremented by the quantization coefficient increment amount N as a new QP value.
(10) The original picture is coded according to the new QP value, and optical character recognition is performed after decoding (the mobile terminal can carry out recognition based on an optical character recognition model).
(11) If the original picture can obtain a correct recognition result under the condition of the current QP value, repeating the steps (9) and (10); otherwise, the QP value is not changed, and the updating of the QP value is finished.
(12) The original picture is encoded using the QP value at this time minus a fixed value as the QP value.
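Steps (8) to (12) above can be sketched as a simple stepping loop. The Python below is only an illustration under stated assumptions: `is_recognized(qp)` is a hypothetical callback standing in for encoding the original picture at the given QP, decoding it and checking the optical character recognition result; `fixed_offset` is the unspecified "fixed value" of step (12); and the `qp_max` guard (defaulting to 51, the upper QP limit for 8-bit H.264) is an added safeguard against an endless loop that the embodiment does not mention.

```python
from typing import Callable


def refine_qp(initial_qp: int, increment: int, fixed_offset: int,
              is_recognized: Callable[[int], bool],
              qp_max: int = 51) -> int:
    """Step the QP upward by `increment` until recognition fails.

    initial_qp   -- QP found for the downsampled picture, step (8)
    increment    -- quantization coefficient increment N, step (7)
    fixed_offset -- the fixed value subtracted in step (12)
    qp_max       -- assumed upper bound, not part of the described steps
    """
    if increment <= 0:
        raise ValueError("increment must be positive")

    qp = initial_qp
    while qp + increment <= qp_max:
        qp += increment              # step (9): add the increment N
        if not is_recognized(qp):    # steps (10)-(11): stop at the first failure
            break

    # Step (12): back off by the fixed value before the final encode.
    return qp - fixed_offset
```

If `fixed_offset` is chosen equal to `increment`, the returned value is the last QP that was still recognized correctly, which corresponds to the initial value plus n increments in the condition on n stated in claim 5. For example, `refine_qp(22, 3, 3, lambda q: q <= 30)` probes 25, 28 and 31, fails at 31, and returns 28.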
The method provided by this embodiment not only reduces network transmission bandwidth but also lowers cost by reducing the storage space occupied by hundreds of millions of pictures. It should be noted that this embodiment uses the H.264 coding standard as an example, but the present invention is applicable to any coding standard and method.
Referring to fig. 5, a fifth embodiment of the present invention further provides a code rate setting device for optical character recognition, where the code rate setting device for optical character recognition may be any type of intelligent terminal, such as a mobile phone, a tablet computer, a personal computer, and the like.
Specifically, the code rate setting device for optical character recognition includes: one or more control processors and memory, one control processor being exemplified in fig. 5. The control processor and the memory may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.
The memory, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the code rate setting device for optical character recognition in the embodiments of the present invention. The control processor implements the code rate setting method for optical character recognition by running the non-transitory software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store the generated data. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located from the control processor, and the remote memory may be connected to the code rate setting device for optical character recognition over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory and, when executed by the one or more control processors, perform a code rate setting method for optical character recognition in the above-described method embodiments, for example, perform the above-described method steps S101 to S104 in fig. 1 or the method steps S201 to S204 in fig. 3.
Embodiments of the present invention also provide a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, which are executed by one or more control processors, for example, by one of the control processors in fig. 5, and may cause the one or more control processors to perform the code rate setting method for optical character recognition in the above method embodiment, for example, perform the above-described method steps S101 to S104 in fig. 1, or perform the method steps S201 to S204 in fig. 3.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art can clearly understand that the embodiments can be implemented by software plus a general hardware platform. Those skilled in the art will appreciate that all or part of the processes of the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (10)

1. A code rate setting method for optical character recognition, characterized by comprising the following steps:
setting a Rate interval of an original picture, and downsampling the original picture to obtain a downsampled picture, wherein Rate denotes the code rate of the original picture;
calculating an optimal Rate value of the downsampled picture within the Rate interval, wherein the optimal Rate value is the minimum value in the Rate interval that satisfies the following condition: the downsampled picture, when encoded based on the optimal Rate value and then decoded, can be correctly recognized;
inputting the downsampled picture into a confidence neural network and performing confidence prediction to obtain a code rate increment M;
setting the optimal Rate value plus n code rate increments M as the optimal coding value of the original picture, wherein the value of n satisfies the following condition: the original picture, when encoded based on the optimal Rate value plus n code rate increments M and then decoded, can be correctly recognized, and the original picture, when encoded based on the optimal Rate value plus n+1 code rate increments M and then decoded, cannot be correctly recognized.
2. The code rate setting method for OCR according to claim 1, wherein the optimal Rate value of the downsampled picture is obtained by bisection, and calculating the optimal Rate value of the downsampled picture by bisection comprises the following steps:
S1021, encoding the downsampled picture based on the intermediate value of the Rate interval;
S1022, decoding the encoded downsampled picture and then performing optical character recognition;
S1023, if a correct recognition result can be obtained, updating the Rate interval by taking the intermediate value of the Rate interval as the updated right end point; if a correct recognition result cannot be obtained, updating the Rate interval by taking the intermediate value of the Rate interval as the updated left end point;
S1024, if the difference between the Rate values corresponding to the left and right end points of the updated Rate interval is larger than 1, jumping to step S1021; if the difference is less than or equal to 1, going to step S1025;
S1025, if the right end point of the updated Rate interval yields a correct recognition result, adopting the Rate value of the right end point as the optimal Rate value; if the right end point of the updated Rate interval does not yield a correct recognition result, adopting the Rate value of the left end point as the optimal Rate value.
3. The method for setting the code rate for OCR according to claim 2, wherein setting the Rate interval of the original picture comprises:
setting the Rate interval of the original picture according to the coding standard to be selected, or setting the Rate interval of the original picture according to the size of the original picture or the bandwidth.
4. The method for setting the code rate for OCR according to any one of claims 1 to 3, wherein the original picture is downsampled by a factor of 0.25.
5. A method for setting quantization coefficients for optical character recognition, comprising the steps of:
setting a QP interval of an original picture, and downsampling the original picture to obtain a downsampled picture, wherein QP denotes a quantization coefficient of the original picture;
calculating an optimal QP value of the downsampled picture within the QP interval, wherein the optimal QP value is the minimum value in the QP interval that satisfies the following condition: the downsampled picture, when encoded based on the optimal QP value and then decoded, can be correctly recognized;
inputting the downsampled picture into a confidence neural network and performing confidence prediction to obtain a quantization coefficient increment N;
setting the optimal QP value plus n quantization coefficient increments N as the optimal coding value of the original picture, wherein the value of n satisfies the following condition: the original picture, when encoded based on the optimal QP value plus n quantization coefficient increments N and then decoded, can be correctly recognized, and the original picture, when encoded based on the optimal QP value plus n+1 quantization coefficient increments N and then decoded, cannot be correctly recognized.
6. The method for setting quantization coefficients for OCR according to claim 5, wherein the optimal QP value of the downsampled picture is obtained by bisection, comprising the following steps:
S2021, encoding the downsampled picture based on the middle value of the QP interval;
S2022, decoding the encoded downsampled picture and then performing optical character recognition;
S2023, if a correct recognition result can be obtained, updating the QP interval by taking the middle value of the QP interval as the updated right end point; if a correct recognition result cannot be obtained, updating the QP interval by taking the middle value of the QP interval as the updated left end point;
S2024, if the difference between the QP values corresponding to the left and right end points of the updated QP interval is larger than 1, jumping to step S2021; if the difference is less than or equal to 1, going to step S2025;
S2025, if the right end point of the updated QP interval yields a correct recognition result, adopting the QP value of the right end point as the optimal QP value; if the right end point of the updated QP interval does not yield a correct recognition result, adopting the QP value of the left end point as the optimal QP value.
7. The method as claimed in claim 5, wherein setting the QP interval of the original picture comprises:
setting the QP interval of the original picture according to the coding standard to be selected, or setting the QP interval of the original picture according to the size of the original picture or the bandwidth.
8. The method as claimed in any one of claims 5 to 7, wherein the original picture is downsampled by a factor of 0.25.
9. A code rate setting device for optical character recognition, comprising: at least one control processor and a memory for communicative connection with the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform a method for setting a code rate for OCR according to any one of claims 1 to 4 and a method for setting a quantization coefficient for OCR according to any one of claims 5 to 8.
10. A computer-readable storage medium storing computer-executable instructions for causing a computer to execute a code rate setting method for optical character recognition according to any one of claims 1 to 4 and a quantization coefficient setting method for optical character recognition according to any one of claims 5 to 8.
CN202010116219.3A 2020-02-25 2020-02-25 Code rate setting method, equipment and storage medium for optical character recognition Active CN111314697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010116219.3A CN111314697B (en) 2020-02-25 2020-02-25 Code rate setting method, equipment and storage medium for optical character recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010116219.3A CN111314697B (en) 2020-02-25 2020-02-25 Code rate setting method, equipment and storage medium for optical character recognition

Publications (2)

Publication Number Publication Date
CN111314697A CN111314697A (en) 2020-06-19
CN111314697B true CN111314697B (en) 2021-10-15

Family

ID=71147740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116219.3A Active CN111314697B (en) 2020-02-25 2020-02-25 Code rate setting method, equipment and storage medium for optical character recognition

Country Status (1)

Country Link
CN (1) CN111314697B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109302608A (en) * 2017-07-25 2019-02-01 华为技术有限公司 Image processing method, equipment and system
CN109495741A (en) * 2018-11-29 2019-03-19 四川大学 Method for compressing image based on adaptive down-sampling and deep learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080067922A (en) * 2007-01-17 2008-07-22 삼성전자주식회사 Method and apparatus for decoding video with image scale-down function
CN101778275B (en) * 2009-01-09 2012-05-02 深圳市融创天下科技股份有限公司 Image processing method of self-adaptive time domain and spatial domain resolution ratio frame
EP3092806A4 (en) * 2014-01-07 2017-08-23 Nokia Technologies Oy Method and apparatus for video coding and decoding
ES2907602T3 (en) * 2014-12-31 2022-04-25 Nokia Technologies Oy Cross-layer prediction for scalable video encoding and decoding
CN109120926B (en) * 2017-06-23 2019-08-13 腾讯科技(深圳)有限公司 Predicting mode selecting method, device and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Low Bit-Rate Image Compression via Adaptive;Xiaolin Wu, Senior Member;《IEEE TRANSACTIONS ON IMAGE PROCESSING》;20090210;全文 *
基于HEVC 帧内预测的复杂度控制;李林格,张恋,王洁,周巧,张昊;《电视技术》;20161117;全文 *

Also Published As

Publication number Publication date
CN111314697A (en) 2020-06-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant