WO2021164479A1 - Video text tracking method and electronic device - Google Patents
Video text tracking method and electronic device
- Publication number
- WO2021164479A1 (PCT/CN2021/071796)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- point set
- points
- electronic device
- tracking
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/16—Image preprocessing
- G06V30/1607—Correcting image deformation, e.g. trapezoidal deformation caused by perspective
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- This application relates to optical character recognition (OCR), a sub-field of artificial intelligence (AI), and in particular to a video text tracking method and an electronic device.
- OCR: Optical Character Recognition
- AI: Artificial Intelligence
- AR translation does not require taking a photo before recognizing the image content. Instead, it translates the text seen by the camera in real time: the user only needs to point the camera at the content to be translated, and an accurate translation is displayed in real time at the position of the original text.
- The whole process of live-scene AR translation is dynamic. Compared with earlier photo translation, the experience is greatly improved, and it is especially suitable for scenarios such as tourism, overseas shopping, and reading foreign-language literature.
- The AR translation process involves OCR text detection and recognition, text tracking, machine translation, AR rendering, and backfilling of the translated text. Because OCR is slow (100 milliseconds to seconds per video frame), the position of each text line cannot be obtained by running OCR frame by frame while the lens of the mobile phone or camera moves in a real shooting scene; such a solution cannot meet real-time requirements. Therefore, tracking the text recognized by an earlier OCR pass and predicting the position of each text line is necessary for displaying the translation in real time. In addition, live-scene AR translation can also be applied to scenarios such as automatic translation and backfilling of video subtitles, quickly completing the subtitle translation of each video frame and greatly saving manual effort.
- The position of each line of straight text is generally determined by an oblique rectangle.
- The most commonly used technical solution is as follows. First, OCR is performed on the first video frame after the lens has stabilized, detecting and recognizing the position and content of each text line in that frame. A key point detection technique such as corner detection then determines a certain number of tracking points in each text line area, and a tracking method such as optical flow obtains the corresponding positions of these tracking points in the next video frame. From these correspondences, the projective transformation matrix (homography matrix) of each text line area between the two video frames can be calculated. Applying this matrix to the four vertices of the oblique rectangle of the text line area gives the position of the text line in the next frame, after which the translated text is backfilled. This tracking process is repeated until a text line moves out of the field of view or is occluded by another object, so that the proportion of tracking points (relative to the OCR first frame) that can be found in two adjacent frames falls below a threshold; tracking is then considered to have failed.
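The step of applying the homography to the four vertices of the oblique rectangle can be sketched as follows. This is a minimal illustration only: the matrix `H_TRANSLATE` and the rectangle coordinates are made-up values, and estimating the matrix from tracked points (which a real system would do with a computer vision library) is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Matrix3 = List[List[float]]


def apply_homography(h: Matrix3, points: List[Point]) -> List[Point]:
    """Map 2-D points through a 3x3 projective transformation (homography)."""
    mapped = []
    for x, y in points:
        # Homogeneous coordinates: (x, y, 1) -> (u, v, w), then divide by w.
        u = h[0][0] * x + h[0][1] * y + h[0][2]
        v = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        mapped.append((u / w, v / w))
    return mapped


# Hypothetical homography: a pure translation by (5, -2) pixels.
H_TRANSLATE = [[1.0, 0.0, 5.0],
               [0.0, 1.0, -2.0],
               [0.0, 0.0, 1.0]]

# Four vertices of a text-line rectangle in the current frame (made-up values).
rect = [(10.0, 20.0), (110.0, 20.0), (110.0, 40.0), (10.0, 40.0)]
next_rect = apply_homography(H_TRANSLATE, rect)
```

Because a single matrix moves all four vertices together, the tracked shape always remains a quadrilateral, which is the limitation for curved text discussed next.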
- The prior art has drawbacks when tracking curved text: the oblique rectangle used to frame the text line contains a large amount of blank space outside the text area. If the IOU commonly used in object detection (the intersection over union of the actual text area and the predicted text area) is used as the metric, the intersection between the actual text area and the predicted area may not be small, but because the metric is normalized by the larger predicted area, its value is bound to be unsatisfactory. Such curved text often appears on shop signs, as shown in Figure 2, and in artistic narration or subtitles in videos.
- IOU: intersection over union of the actual text area and the predicted text area
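The effect on the IOU can be illustrated with a small sketch. The shapes below are invented for illustration: a thin parabolic band stands in for a curved text line, and its bounding rectangle stands in for the predicted area. Regions are represented as sets of pixel cells, which is an assumption made for simplicity.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def iou(actual: Set[Cell], predicted: Set[Cell]) -> float:
    """Intersection over union of two regions given as sets of pixel cells."""
    union = actual | predicted
    if not union:
        return 0.0
    return len(actual & predicted) / len(union)


# Hypothetical curved text line: a 2-pixel-thick parabolic band, 20 columns wide.
curved_text = {(x, y)
               for x in range(20)
               for y in range(30)
               if 0 <= y - ((x - 10) ** 2) // 5 < 2}

# Rectangle that frames the whole curved band (the "predicted" area).
xs = [x for x, _ in curved_text]
ys = [y for _, y in curved_text]
rectangle = {(x, y)
             for x in range(min(xs), max(xs) + 1)
             for y in range(min(ys), max(ys) + 1)}

score = iou(curved_text, rectangle)
```

In this example the rectangle covers every text cell, so the intersection equals the whole text area, yet `score` stays far below 0.5 because the union is dominated by blank space.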
- This application provides a video text tracking method and an electronic device. Unlike the prior art, which tracks the text line as a whole, the method splits the text line area into sub-regions, tracks each sub-region, and then processes and connects the sub-regions into a new text line. It is therefore compatible with both straight text (the center points of the characters lie on a straight line) and curved text, tracks deforming text lines well, and can accurately predict the position of the text line.
- The present application provides a method for tracking video text.
- The method includes: an electronic device performs OCR detection on a first video frame to obtain frame points anchoring the position of each text line, including at least a first initial frame point set. The first initial frame point set contains the frame points identified by OCR that anchor the position of a first text line; the first text line is any text line in the first video frame, and the number of frame points in the first initial frame point set is not less than 4;
- the electronic device determines a first extended frame point set according to the first initial frame point set; the first extended frame point set frames the first text line in N continuous sub-regions of uniform width, where N is a positive integer not less than 2;
- the electronic device determines, according to the positions of the tracking points in a second tracking point set relative to the tracking points in a first tracking point set, the position in a second video frame of each frame point in the first extended frame point set, obtaining a second calculation frame point set, wherein the first tracking point set includes the tracking points in the sub-regions determined by the first extended frame point set in the first video frame.
- The electronic device splits each text line into N continuous sub-regions of uniform width according to the frame points of that text line recognized by OCR, and then tracks each sub-region to determine the position of each text line in the second video frame. Because tracking is performed at a finer scale, and each continuous sub-region can be straight or curved, the method is compatible with both straight text (the center points of the characters lie on a straight line) and curved text, tracks deforming text lines well, and can accurately predict the position of the text line.
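The text above does not spell out how a frame point is moved from the motion of the tracking points. One simple possibility, shown here purely as an assumption, is to shift each frame point by the mean displacement of the tracking points that lie horizontally close to it, falling back to all tracking points when none are close. The function names and the `radius` parameter are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mean_displacement(old: List[Point], new: List[Point]) -> Point:
    """Average motion vector between matched tracking points."""
    n = len(old)
    dx = sum(b[0] - a[0] for a, b in zip(old, new)) / n
    dy = sum(b[1] - a[1] for a, b in zip(old, new)) / n
    return dx, dy


def move_frame_points(frame_points: List[Point],
                      old_track: List[Point],
                      new_track: List[Point],
                      radius: float) -> List[Point]:
    """Shift each frame point by the mean motion of nearby tracking points."""
    moved = []
    for fx, fy in frame_points:
        # Assumed rule: "nearby" means within `radius` along the abscissa.
        near = [i for i, (tx, _) in enumerate(old_track) if abs(tx - fx) <= radius]
        if not near:
            near = list(range(len(old_track)))  # fall back to all tracking points
        dx, dy = mean_displacement([old_track[i] for i in near],
                                   [new_track[i] for i in near])
        moved.append((fx + dx, fy + dy))
    return moved


# Made-up example: the left end of the line moves right, the right end moves up.
frame_pts = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
track_first = [(1.0, 5.0), (19.0, 5.0)]
track_second = [(2.0, 5.0), (19.0, 8.0)]
calc_pts = move_frame_points(frame_pts, track_first, track_second, radius=5.0)
```

Because each frame point follows only its local tracking points, the two ends of the line can move differently, which a single whole-line transformation cannot express.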
- The electronic device determines the second area according to the second calculation frame point set as follows: the electronic device adjusts the position of each frame point in the second calculation frame point set to obtain a second adjustment frame point set, so that the sub-regions determined by the second adjustment frame point set completely surround every tracking point in the second tracking point set; the electronic device then determines the second area according to the second adjustment frame point set.
- The electronic device may adjust the frame point positions as a whole according to the highest and lowest tracking points in the second tracking point set:
- the ordinate of each frame point in the second calculation lower frame point set is adjusted to be less than the minimum ordinate of the tracking points in the second tracking point set, and greater than that minimum minus a preset parameter times the font height.
- The electronic device may adjust the position of each frame point according to the highest and lowest tracking points within a preset distance of that frame point.
- The electronic device may not use the second calculation frame point set directly. Instead, it may first adjust the position of each frame point in that set to obtain a second adjustment frame point set that completely surrounds the second tracking point set, and then use the second adjustment frame point set to determine the second area, that is, the position of the first text line in the second video frame. This further improves the accuracy of the predicted position of the first text line in the second video frame. Completely enclosing the second tracking points also allows more tracking points to be used when predicting positions in subsequent video frames, improving the continuity of video text tracking.
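The whole-line adjustment can be sketched as a clamp on the ordinates. The rule for the lower frame points follows the text above; the mirrored rule for the upper frame points follows the description of the frame point adjustment unit later in this document. The function names, the small `eps` used to keep the inequalities strict, and the sample values are assumptions. Ordinates are taken to increase upward, matching the wording "greater than the maximum" for upper points; with image coordinates that grow downward the roles would swap.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def clamp_open(value: float, low: float, high: float, eps: float = 1e-6) -> float:
    """Clamp a value into the open interval (low, high)."""
    return min(max(value, low + eps), high - eps)


def adjust_globally(upper: List[Point], lower: List[Point],
                    tracking: List[Point], k: float,
                    font_height: float) -> Tuple[List[Point], List[Point]]:
    """Move frame points so that they enclose all tracking points vertically."""
    ys = [y for _, y in tracking]
    y_max, y_min = max(ys), min(ys)
    margin = k * font_height  # preset parameter times the font height
    new_upper = [(x, clamp_open(y, y_max, y_max + margin)) for x, y in upper]
    new_lower = [(x, clamp_open(y, y_min - margin, y_min)) for x, y in lower]
    return new_upper, new_lower


# Made-up values: two tracking points, two upper and two lower frame points.
tracking_pts = [(0, 10), (5, 14)]
new_upper, new_lower = adjust_globally(
    upper=[(0, 13), (5, 30)], lower=[(0, 11), (5, 9)],
    tracking=tracking_pts, k=0.5, font_height=8)
```

Frame points that already satisfy the bounds, such as the lower point at ordinate 9 here, are left unchanged.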
- The electronic device determines the second area according to the second adjustment frame point set as follows: the electronic device smooths the enclosing curve of the sub-regions determined by the second adjustment frame point set to obtain the second area.
- The electronic device may fit the frame points in the second adjustment upper frame point set and the second adjustment lower frame point set separately to obtain a smooth enclosing curve that forms the second area, where:
- the second adjustment frame point set can be divided into a second adjustment upper frame point set and a second adjustment lower frame point set;
- the second adjustment upper frame point set contains the frame points located in the upper half of the sub-regions, and the second adjustment lower frame point set contains the frame points located in the lower half of the sub-regions.
- When fitting the frame points in the second adjustment upper frame point set and the second adjustment lower frame point set separately, the electronic device may calculate the linear correlation coefficient of the frame points in each set and determine the fitting method according to its value:
- the electronic device may calculate the Pearson correlation coefficient of the frame points in the second adjustment upper frame point set and in the second adjustment lower frame point set. If the linear correlation is strong (for example, the correlation coefficient is greater than 0.8), linear fitting may be adopted; if the linear correlation is weak, a higher-order fitting such as quadratic fitting may be adopted.
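The choice between linear and quadratic fitting can be sketched as follows. Two details are assumptions rather than statements of the text: the absolute value of the coefficient is compared with the threshold (so that a straight line sloping downward also counts as strongly linear), and a perfectly flat edge, for which the coefficient is undefined, is treated as straight.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def pearson(xs: List[float], ys: List[float]) -> Optional[float]:
    """Pearson correlation coefficient, or None when it is undefined."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # all abscissas or all ordinates are identical
    return sxy / math.sqrt(sxx * syy)


def choose_fit_degree(points: List[Point], threshold: float = 0.8) -> int:
    """Return 1 for linear fitting or 2 for quadratic fitting."""
    r = pearson([p[0] for p in points], [p[1] for p in points])
    if r is None:
        return 1  # assumption: a flat edge is a straight line
    return 1 if abs(r) > threshold else 2


straight_edge = [(0, 0), (1, 2), (2, 4), (3, 6)]
curved_edge = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]
flat_edge = [(0, 5), (1, 5), (2, 5)]
degrees = [choose_fit_degree(e) for e in (straight_edge, curved_edge, flat_edge)]
```

The upper and lower frame point sets would each be passed through `choose_fit_degree` separately, since one edge of a text line can be straighter than the other.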
- By fitting, that is, smoothing, the enclosing curve of the sub-regions determined by the second adjustment frame point set, the electronic device keeps the sub-regions continuous and keeps the enclosing curve of the text line smooth, preventing jagged edges.
- The electronic device may save the intermediate quantities used to calculate the linear correlation coefficient (for example, those computed for the Pearson correlation coefficient), so that when other subsequent calculations need them, the saved values can be used directly, reducing the amount of calculation.
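One way to read this reuse is that the running sums needed for the Pearson coefficient are the same sums needed for a least-squares straight line, so they can be accumulated once and shared. The sketch below assumes that reading; the `Sums` container and function names are hypothetical, and degenerate inputs (all abscissas or all ordinates equal) are not handled.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Sums:
    """Intermediate quantities shared by the correlation and the line fit."""
    n: int
    sx: float
    sy: float
    sxx: float
    syy: float
    sxy: float


def accumulate(points: Iterable[Tuple[float, float]]) -> Sums:
    """Compute the running sums in a single pass over the frame points."""
    s = Sums(0, 0.0, 0.0, 0.0, 0.0, 0.0)
    for x, y in points:
        s.n += 1
        s.sx += x
        s.sy += y
        s.sxx += x * x
        s.syy += y * y
        s.sxy += x * y
    return s


def pearson_from(s: Sums) -> float:
    """Pearson coefficient from the saved sums (non-degenerate input assumed)."""
    num = s.n * s.sxy - s.sx * s.sy
    den = math.sqrt((s.n * s.sxx - s.sx ** 2) * (s.n * s.syy - s.sy ** 2))
    return num / den


def line_from(s: Sums) -> Tuple[float, float]:
    """Least-squares slope and intercept from the same saved sums."""
    slope = (s.n * s.sxy - s.sx * s.sy) / (s.n * s.sxx - s.sx ** 2)
    intercept = (s.sy - slope * s.sx) / s.n
    return slope, intercept


saved = accumulate([(0, 1), (1, 3), (2, 5), (3, 7)])
r = pearson_from(saved)
slope, intercept = line_from(saved)
```

Here the numerator `n * sxy - sx * sy` appears in both formulas, so a linear fit that follows a high correlation coefficient costs almost nothing extra.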
- The electronic device determines the first extended frame point set according to the first initial frame point set as follows: when the number of frame points in the first initial frame point set is equal to 4, the electronic device takes points with evenly spaced abscissas on the upper and lower sides of the rectangle determined by the four frame points as new frame points, forming the first extended frame point set;
- the electronic device fits the frame points in the first initial upper frame point set and the first initial lower frame point set separately to obtain an upper fitting curve and a lower fitting curve, wherein the first initial upper frame point set includes the frame points in the first initial frame point set located in the upper half of the first text line, and the first initial lower frame point set includes the frame points in the first initial frame point set located in the lower half of the first text line; the electronic device takes points with evenly spaced abscissas on the upper fitting curve and the lower fitting curve as new frame points, forming the first extended frame point set.
- The electronic device can thus apply different processing to select points with evenly spaced abscissas as new frame points forming the first extended frame point set, so that video text tracking supports both straight text and curved text well.
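The four-point case can be sketched with linear interpolation along the upper and lower sides of the rectangle. The vertex order (top-left, top-right, bottom-right, bottom-left), the function name, and the sample coordinates are assumptions for illustration; the fitted-curve case is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def expand_four_points(tl: Point, tr: Point, br: Point, bl: Point,
                       n: int) -> Tuple[List[Point], List[Point]]:
    """Split a text-line rectangle into n sub-regions of uniform width.

    Returns n + 1 upper and n + 1 lower frame points; sub-region i is the
    quadrilateral upper[i], upper[i + 1], lower[i + 1], lower[i].
    """
    if n < 2:
        raise ValueError("N must be a positive integer not less than 2")

    def lerp(a: Point, b: Point, t: float) -> Point:
        # Point at fraction t along the segment from a to b.
        return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

    upper = [lerp(tl, tr, i / n) for i in range(n + 1)]
    lower = [lerp(bl, br, i / n) for i in range(n + 1)]
    return upper, lower


# Made-up rectangle, 40 wide and 10 high, split into 4 sub-regions.
upper_pts, lower_pts = expand_four_points(
    tl=(0, 10), tr=(40, 10), br=(40, 0), bl=(0, 0), n=4)
```

Interpolating along a straight side at equal fractions yields equally spaced abscissas, so the resulting sub-regions all have the same width.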
- The method further includes: the electronic device determines, according to the positions of the tracking points in a third tracking point set relative to the tracking points in the second tracking point set, the position in a third video frame of each frame point in the third adjustment frame point set, obtaining a third calculation frame point set, wherein the third tracking point set includes the tracking points predicted in the third video frame that correspond to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame; the electronic device adjusts the position of each frame point in the third calculation frame point set to obtain a third adjustment frame point set, so that the sub-regions determined by the third adjustment frame point set completely surround every tracking point in the third tracking point set; the electronic device smooths the enclosing curve of the sub-regions determined by the third adjustment frame point set to obtain a third area, the third area being the position of the first text line in the third video frame as determined by the electronic device.
- In the same way, the position of the first text line in the fourth video frame, the fifth video frame, the sixth video frame, and other subsequent video frames can be tracked and determined.
- the video frame for OCR is the first video frame
- the next video frame that needs to be processed after the first video frame is the second video frame
- the next video frame that needs to be processed after the second video frame is the third video frame.
- The frame points and tracking points of the first video frame are determined by OCR, and the frame points and tracking points determined for the previous video frame are used when processing each subsequent video frame. When the proportion of tracking points whose corresponding positions are found in two adjacent processed frames falls below the tracking point proportion threshold, tracking is considered to have failed; another tracking process is then started and a new first video frame is determined. This guarantees the continuous and efficient operation of the video text tracking method.
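The failure test can be sketched as a scan over the number of tracking points still found in each processed frame. Measuring the proportion against the number of tracking points in the OCR first frame follows the prior-art description earlier in this document; the function name and the sample counts are made up.

```python
from typing import List, Optional


def first_failed_frame(found_per_frame: List[int], initial_count: int,
                       min_ratio: float) -> Optional[int]:
    """Return the number of the first frame at which tracking fails.

    found_per_frame[0] belongs to the second video frame, because the first
    video frame is the one processed by OCR. Returns None if tracking never
    fails within the given frames.
    """
    for frame_no, found in enumerate(found_per_frame, start=2):
        if found / initial_count < min_ratio:
            return frame_no  # restart here: this frame's successor needs OCR
    return None


# Made-up counts: 100 tracking points at OCR time, then gradual loss.
failed_at = first_failed_frame([95, 80, 40, 30], initial_count=100, min_ratio=0.5)
still_ok = first_failed_frame([90, 90], initial_count=100, min_ratio=0.5)
```

When a frame number is returned, the caller would start a new tracking process and pick a new first video frame for OCR.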
- The method further includes: starting from the first video frame, the electronic device maintains a buffer with a fixed length of a preset number of video frames; the buffer stores newly generated video frames while the OCR result for the first video frame has not yet been returned.
- The electronic device deletes an old video frame every time a new video frame is added to the buffer, such that the difference between the acquisition times of adjacent video frames stored in the buffer is less than a preset interval.
- The electronic device deletes an old video frame every time a new video frame is added to the buffer, so that the intervals between the remaining adjacent frames in the buffer stay as even as possible.
- The purpose of maintaining a fixed-length buffer is to guard against the case where the first video frame contains so much text that OCR recognition takes too long. Limiting the size of the buffer shortens the time needed for the buffer to "catch up" with the latest video frame, which shortens the time the user waits for results and improves the experience.
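The text does not say which old frame is deleted. A minimal sketch of one possible policy, stated here as an assumption, is to drop the interior frame whose two neighbors are closest in time, since removing it opens the smallest gap. The class name is hypothetical, and the preset interval constraint mentioned above is not enforced.

```python
from typing import Any, List, Tuple


class FrameBuffer:
    """Fixed-length buffer that thins old frames to keep spacing roughly even."""

    def __init__(self, capacity: int) -> None:
        if capacity < 3:
            raise ValueError("capacity must be at least 3")
        self.capacity = capacity
        self.frames: List[Tuple[float, Any]] = []  # (acquisition time, frame)

    def add(self, timestamp: float, frame: Any) -> None:
        """Append a new frame, deleting one old frame if the buffer is full."""
        self.frames.append((timestamp, frame))
        if len(self.frames) > self.capacity:
            # Candidates exclude the oldest and the newest frame. Removing
            # frame i leaves a gap between its neighbors; pick the smallest.
            victim = min(
                range(1, len(self.frames) - 1),
                key=lambda i: self.frames[i + 1][0] - self.frames[i - 1][0],
            )
            del self.frames[victim]

    def timestamps(self) -> List[float]:
        return [t for t, _ in self.frames]


# Made-up usage: frames arrive at times 0..6 while OCR is still running.
buf = FrameBuffer(capacity=4)
for t in range(7):
    buf.add(t, frame=f"frame-{t}")
```

Keeping the oldest and newest frames fixed means the buffer always spans the whole waiting period, while the interior frames are thinned out as more frames arrive.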
- An embodiment of the present application provides an electronic device. The electronic device includes one or more processors and a memory; the memory is coupled with the one or more processors and is used to store computer program code; the computer program code includes computer instructions, and the one or more processors call the computer instructions to make the electronic device execute: performing OCR detection on the first video frame to obtain frame points anchoring the position of each text line, including at least a first initial frame point set, where the first initial frame point set contains the frame points identified by OCR that anchor the position of the first text line, the first text line is any text line in the first video frame, and the number of frame points in the first initial frame point set is not less than 4; determining a first extended frame point set according to the first initial frame point set, where the first extended frame point set frames the first text line in N continuous sub-regions of uniform width and N is a positive integer not less than 2; and determining, according to the positions of the tracking points in the second tracking point set relative to the tracking points in the first tracking point set, the position in the second video frame of each frame point in the first extended frame point set.
- Each text line is divided into N continuous sub-regions of uniform width, and each sub-region is then tracked to determine the position of each text line in the second video frame. Because tracking is performed at a finer scale, and each continuous sub-region can be straight or curved, the electronic device is compatible with both straight text (the center points of the characters lie on a straight line) and curved text, tracks deforming text lines well, and can accurately predict the position of the text line.
- The one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to execute: adjusting the position of each frame point in the second calculation frame point set to obtain a second adjustment frame point set, so that the sub-regions determined by the second adjustment frame point set completely surround every tracking point in the second tracking point set; and determining the second area according to the second adjustment frame point set.
- The frame point positions may be adjusted as a whole according to the highest and lowest tracking points in the second tracking point set:
- the ordinate of each frame point in the second calculation lower frame point set is adjusted to be less than the minimum ordinate of the tracking points in the second tracking point set, and greater than that minimum minus a preset parameter times the font height.
- The position of each frame point may be adjusted according to the highest and lowest tracking points within a preset distance of that frame point.
- The one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to execute: smoothing the enclosing curve of the sub-regions determined by the second adjustment frame point set to obtain the second area.
- The frame points in the second adjustment upper frame point set and the second adjustment lower frame point set may be fitted separately to obtain a smooth enclosing curve that forms the second area, where the second adjustment frame point set can be divided into the second adjustment upper frame point set and the second adjustment lower frame point set:
- the second adjustment upper frame point set contains the frame points located in the upper half of the sub-regions;
- the second adjustment lower frame point set contains the frame points located in the lower half of the sub-regions.
- The linear correlation coefficients of the frame points in the second adjustment upper frame point set and in the second adjustment lower frame point set may be calculated separately, and the fitting method is determined according to the value of the linear correlation coefficient:
- the Pearson correlation coefficient of the frame points in the second adjustment upper frame point set and in the second adjustment lower frame point set can be calculated. If the linear correlation is strong (for example, the correlation coefficient is greater than 0.8), linear fitting may be adopted; if the linear correlation is weak, a higher-order fitting such as quadratic fitting may be adopted.
- The intermediate quantities used to calculate the linear correlation coefficient (for example, those computed for the Pearson correlation coefficient) can be saved, so that when other subsequent calculations need them, the saved values can be used directly, reducing the amount of calculation.
- The one or more processors are specifically configured to invoke the computer instructions to make the electronic device execute: when the number of frame points in the first initial frame point set is equal to 4, taking points with evenly spaced abscissas on the upper and lower sides of the rectangle determined by the 4 frame points as new frame points to form the first extended frame point set;
- the frame points in the first initial upper frame point set and the first initial lower frame point set are fitted separately to obtain an upper fitting curve and a lower fitting curve, wherein the first initial upper frame point set includes the frame points in the first initial frame point set located in the upper half of the first text line, and the first initial lower frame point set includes the frame points in the first initial frame point set located in the lower half of the first text line; points with evenly spaced abscissas on the upper fitting curve and the lower fitting curve are taken as new frame points to form the first extended frame point set.
- The one or more processors are further configured to invoke the computer instructions to make the electronic device execute: determining, according to the positions of the tracking points in the third tracking point set relative to the tracking points in the second tracking point set, the position in the third video frame of each frame point in the third adjustment frame point set, obtaining the third calculation frame point set, where the third tracking point set includes the tracking points predicted in the third video frame that correspond to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame; adjusting the position of each frame point in the third calculation frame point set to obtain the third adjustment frame point set, so that the sub-regions determined by the third adjustment frame point set completely surround every tracking point in the third tracking point set; and smoothing the enclosing curve of the sub-regions determined by the third adjustment frame point set to obtain a third area, the third area being the determined position of the first text line in the third video frame.
- In the same way, the electronic device can track and determine the position of the first text line in subsequent video frames such as the fourth video frame, the fifth video frame, and the sixth video frame.
- The one or more processors are further configured to call the computer instructions to make the electronic device execute: starting from the first video frame, maintaining a buffer with a fixed length of a preset number of video frames, the buffer being used to store newly generated video frames while the OCR result for the first video frame has not yet been returned.
- When the number of video frames stored in the buffer is equal to the preset number of frames, an old video frame is deleted every time a new video frame is added to the buffer, such that the difference between the acquisition times of adjacent video frames stored in the buffer is less than the preset interval.
- An electronic device, which includes:
- an OCR detection module, used to perform OCR detection on the first video frame to obtain frame points anchoring the position of each text line, including at least a first initial frame point set, where the first initial frame point set contains the frame points identified by OCR that anchor the position of the first text line, the first text line is any text line in the first video frame, and the number of frame points in the first initial frame point set is not less than 4;
- a frame point expansion module, used to determine a first extended frame point set according to the first initial frame point set, where the first extended frame point set frames the first text line in N continuous sub-regions of uniform width and N is a positive integer not less than 2;
- a frame point calculation module, used to determine, according to the positions of the tracking points in the second tracking point set relative to the tracking points in the first tracking point set, the position in the second video frame of each frame point in the first extended frame point set, obtaining a second calculation frame point set, wherein the first tracking point set includes the tracking points in the sub-regions determined by the first extended frame point set in the first video frame, the second tracking point set includes the tracking points predicted in the second video frame that correspond to the tracking points in the first tracking point set, and the second video frame is a video frame obtained after the first video frame;
- a region determination module, used to determine a second area according to the second calculation frame point set, where the second area is the determined position of the first text line in the second video frame.
- The region determination module specifically includes:
- a frame point adjustment unit, used to adjust the position of each frame point in the second calculation frame point set to obtain a second adjustment frame point set, so that the sub-regions determined by the second adjustment frame point set completely surround every tracking point in the second tracking point set;
- a region determining unit, used to determine the second area according to the second adjustment frame point set.
- The frame point adjustment unit may adjust the frame point positions as a whole according to the highest and lowest tracking points in the second tracking point set: the ordinate of each frame point in the second calculation upper frame point set is adjusted to be greater than the maximum ordinate of the tracking points in the second tracking point set, and smaller than that maximum plus a preset parameter times the font height; the ordinate of each frame point in the second calculation lower frame point set is adjusted to be smaller than the minimum ordinate of the tracking points in the second tracking point set, and greater than that minimum minus a preset parameter times the font height.
- The frame point adjustment unit may adjust the position of each frame point according to the highest and lowest tracking points within a preset distance of that frame point: the ordinate of each frame point in the second calculation upper frame point set is adjusted to be greater than the maximum ordinate of the tracking points within the preset distance of that frame point, and smaller than that maximum plus a preset parameter times the font height; the ordinate of each frame point in the second calculation lower frame point set is adjusted to be smaller than the minimum ordinate of the tracking points within the preset distance of that frame point, and greater than that minimum minus a preset parameter times the font height.
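The per-frame-point variant can be sketched as follows. Several details are assumptions: distance is measured as Euclidean distance, a frame point with no tracking point within the preset distance is left unchanged, and a small `eps` keeps the inequalities strict. Ordinates increase upward, as in the wording above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def adjust_locally(upper: List[Point], lower: List[Point],
                   tracking: List[Point], k: float, font_height: float,
                   max_dist: float,
                   eps: float = 1e-6) -> Tuple[List[Point], List[Point]]:
    """Adjust each frame point using only the tracking points near it."""
    margin = k * font_height  # preset parameter times the font height

    def nearby_ys(px: float, py: float) -> List[float]:
        return [ty for tx, ty in tracking
                if math.hypot(tx - px, ty - py) <= max_dist]

    new_upper = []
    for x, y in upper:
        ys = nearby_ys(x, y)
        if ys:
            top = max(ys)  # highest nearby tracking point
            y = min(max(y, top + eps), top + margin - eps)
        new_upper.append((x, y))

    new_lower = []
    for x, y in lower:
        ys = nearby_ys(x, y)
        if ys:
            bottom = min(ys)  # lowest nearby tracking point
            y = min(max(y, bottom - margin + eps), bottom - eps)
        new_lower.append((x, y))
    return new_upper, new_lower


# Made-up values: two tracking points near x = 0 and one far away.
local_upper, local_lower = adjust_locally(
    upper=[(0, 11)], lower=[(0, 11), (50, 0)],
    tracking=[(0, 10), (0, 12), (100, 50)],
    k=0.5, font_height=8, max_dist=5)
```

Compared with the whole-line adjustment, this lets the enclosing curve follow a curved text line closely instead of being pushed outward by a distant extreme tracking point.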
- The region determining unit is specifically configured to smooth the enclosing curve of the sub-regions determined by the second adjustment frame point set to obtain the second area.
- The region determining unit may fit the frame points in the second adjustment upper frame point set and the second adjustment lower frame point set separately to obtain a smooth enclosing curve that forms the second area;
- the second adjustment frame point set can be divided into a second adjustment upper frame point set and a second adjustment lower frame point set;
- the second adjustment upper frame point set contains the frame points located in the upper half of the sub-regions;
- the second adjustment lower frame point set contains the frame points located in the lower half of the sub-regions.
- when the region determining unit separately fits the frame points in the second adjusted upper frame point set and the second adjusted lower frame point set, it may calculate the linear correlation coefficient of the frame points in each of the two sets, and determine the fitting method to adopt according to the value of the linear correlation coefficient:
- the region determining unit may calculate the Pearson correlation coefficient of the frame points in the second adjusted upper frame point set and in the second adjusted lower frame point set. If the linear correlation is strong (for example, the correlation coefficient is greater than 0.8), the region determining unit may adopt linear fitting; if the linear correlation is weak, the region determining unit may adopt a higher-order fitting such as quadratic fitting.
- the electronic device may further include an intermediate quantity storage module, which is used to store the intermediate quantities computed when calculating the linear correlation coefficient (for example, the intermediate quantities when calculating the Pearson correlation coefficient). When a subsequent calculation needs these intermediate quantities, the saved values can be used directly, thereby reducing the amount of calculation.
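The choice of fitting method and the reuse of intermediate quantities can be sketched together. The sums accumulated for the Pearson coefficient are exactly the sums a least-squares line fit needs, so they are computed once and reused. The function names, the dictionary layout, and the use of the absolute value of the coefficient (so that descending lines also count as linear) are assumptions for illustration; the 0.8 threshold is the example value given in the text.

```python
import math

def accumulate(points):
    """Intermediate sums shared by the Pearson coefficient and the line fit."""
    return {
        "n": len(points),
        "sx": sum(x for x, _ in points),
        "sy": sum(y for _, y in points),
        "sxx": sum(x * x for x, _ in points),
        "syy": sum(y * y for _, y in points),
        "sxy": sum(x * y for x, y in points),
    }

def pearson(s):
    """Pearson correlation from the stored sums; None if undefined."""
    cov = s["n"] * s["sxy"] - s["sx"] * s["sy"]
    var_x = s["n"] * s["sxx"] - s["sx"] ** 2
    var_y = s["n"] * s["syy"] - s["sy"] ** 2
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

def choose_fit(points, threshold=0.8):
    """Return (degree, line), where line is (slope, intercept) for degree 1."""
    s = accumulate(points)
    var_x = s["n"] * s["sxx"] - s["sx"] ** 2
    if var_x == 0:
        raise ValueError("all abscissas are equal; cannot fit y = f(x)")
    r = pearson(s)
    # r is None for a perfectly horizontal row of points, which is still a line.
    if r is None or abs(r) > threshold:
        # Reuse the saved sums instead of iterating over the points again.
        slope = (s["n"] * s["sxy"] - s["sx"] * s["sy"]) / var_x
        intercept = (s["sy"] - slope * s["sx"]) / s["n"]
        return 1, (slope, intercept)
    return 2, None  # weak linear correlation: a quadratic fit would follow
```

When the degree is 2, the quadratic fit itself is left to any polynomial least-squares routine.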
- the frame point expansion module specifically includes:
- a straight text frame point expansion unit, used for, when the number of frame points in the first initial frame point set is equal to 4, taking points with evenly spaced abscissas on the upper and lower sides of the rectangle determined by the 4 frame points as new frame points to form the first extended frame point set;
- a curved text frame point expansion unit, used for, when the number of frame points in the first initial frame point set is greater than 4, separately fitting the frame points in the first initial upper frame point set and the first initial lower frame point set to obtain an upper fitting curve and a lower fitting curve, wherein the first initial upper frame point set includes the frame points located in the upper half of the first text line in the first initial frame point set, and the first initial lower frame point set includes the frame points located in the lower half of the first text line in the first initial frame point set; and taking points with evenly spaced abscissas on the upper fitting curve and the lower fitting curve, respectively, as new frame points to form the first extended frame point set.
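For the 4-point case, the expansion can be sketched as linear interpolation along the upper and lower sides of the box. This is a minimal sketch: the function name `expand_quad`, the corner order (top-left, top-right, bottom-right, bottom-left), and the parameter `n` for the number of sub-regions are assumptions for illustration.

```python
def expand_quad(tl, tr, br, bl, n):
    """Split a 4-point text box into n sub-regions of uniform width.

    Returns (upper, lower), each a list of n + 1 (x, y) frame points whose
    abscissas are evenly spaced along the upper and lower sides.
    """
    if n < 2:
        raise ValueError("n must be at least 2")

    def lerp(p, q, t):
        # Point at fraction t of the way from p to q.
        return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

    ts = [i / n for i in range(n + 1)]
    upper = [lerp(tl, tr, t) for t in ts]
    lower = [lerp(bl, br, t) for t in ts]
    return upper, lower
```

For curved text, the same evenly spaced abscissas would be evaluated on the fitted upper and lower curves instead of on straight sides.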
- the electronic device may further include:
- a tracking calculation frame point module, used to determine, according to the position of the third tracking point set relative to each tracking point in the second tracking point set, the position in the third video frame of each frame point in the third adjustment frame point set, to obtain a third calculated frame point set, where the third tracking point set includes the tracking points predicted in the third video frame that correspond to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame;
- a tracking adjustment frame point module, used to adjust the position of each frame point in the third calculated frame point set to obtain a third adjusted frame point set, so that the sub-regions determined by the third adjusted frame point set completely surround every tracking point in the third tracking point set;
- a tracking area determination module, used to smooth the enclosing curve of each sub-region determined by the third adjusted frame point set to obtain a third area.
- the third area is the determined position of the first text line in the third video frame.
- the tracking calculation frame point module, the tracking adjustment frame point module, and the tracking area determination module can likewise track and determine the position of the first text line in subsequent video frames such as the fourth, fifth, and sixth video frames.
- the electronic device may further include:
- a cache maintenance module, used to maintain, starting from the first video frame, a buffer with a fixed length of a preset number of video frames, where the buffer stores the newly generated video frames while the result of OCR recognition of the first video frame has not yet been returned.
- the cache maintenance module deletes an old video frame every time a new video frame is added to the buffer, where the difference in acquisition time between adjacent video frames stored in the buffer is less than the preset interval time.
- the cache maintenance module deletes an old video frame every time a new video frame is added to the buffer, so that the intervals between the remaining adjacent frames in the buffer stay as even as possible.
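One possible eviction policy for such a buffer is sketched below. The text only requires that one old frame be deleted per new frame and that the remaining intervals stay as even as possible; the specific heuristic here (drop the interior frame whose removal leaves the smallest merged gap, always keeping the oldest and newest frames) and the class name `FrameBuffer` are assumptions for illustration.

```python
class FrameBuffer:
    """Fixed-length buffer of (timestamp, frame) pairs, oldest first."""

    def __init__(self, capacity):
        if capacity < 3:
            raise ValueError("capacity must be at least 3")
        self.capacity = capacity
        self.frames = []

    def add(self, timestamp, frame):
        # Append the new frame, then evict one old frame if over capacity.
        self.frames.append((timestamp, frame))
        if len(self.frames) > self.capacity:
            self._evict()

    def _evict(self):
        ts = [t for t, _ in self.frames]
        # Consider interior frames only, so the first and newest frames stay.
        # Removing frame i merges its two neighbouring gaps into ts[i+1] - ts[i-1];
        # picking the smallest merged gap keeps the spacing close to even.
        idx = min(range(1, len(ts) - 1), key=lambda i: ts[i + 1] - ts[i - 1])
        del self.frames[idx]

    def timestamps(self):
        return [t for t, _ in self.frames]
```

With a capacity of 4 and frames arriving at times 0 to 6, the buffer ends up holding the frames captured at times 0, 2, 4, and 6.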
- the embodiments of the present application provide a chip that is applied to an electronic device.
- the chip includes one or more processors for invoking computer instructions to make the electronic device execute: performing OCR detection on the first video frame to obtain the frame points anchoring the position of each text line, which include at least a first initial frame point set; the first initial frame point set contains the frame points identified by OCR for anchoring the position of the first text line, the first text line is any text line in the first video frame, and the number of frame points in the first initial frame point set is not less than 4;
- determining a first extended frame point set according to the first initial frame point set, where the first extended frame point set frames the first text line in N continuous sub-regions of uniform width, and N is a positive integer not less than 2;
- determining, according to the position of the second tracking point set relative to each tracking point in the first tracking point set, the position in the second video frame of each frame point in the first extended frame point set, to obtain a second calculated frame point set, where the second video frame is a video frame obtained after the first video frame;
- determining the second area according to the second calculated frame point set, where the second area is the determined position of the first text line in the second video frame.
- the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to execute: adjusting the position of each frame point in the second calculated frame point set to obtain a second adjusted frame point set, so that the sub-regions determined by the second adjusted frame point set completely surround every tracking point in the second tracking point set; and determining the second area according to the second adjusted frame point set.
- the frame point position may be adjusted as a whole according to the highest tracking point and the lowest tracking point in the second tracking point set:
- the ordinate of each frame point in the second calculated lower frame point set is adjusted to be smaller than the minimum ordinate of the tracking points in the second tracking point set, and greater than the difference between that minimum and the preset parameter times the font height.
- the position of the frame point may be adjusted according to the highest tracking point and the lowest tracking point within a preset distance range from each frame point:
- the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to execute: smoothing the enclosing curve of each sub-region determined by the second adjusted frame point set to obtain the second area.
- the frame points in the second adjusted upper frame point set and the second adjusted lower frame point set may be separately fitted to obtain smooth enclosing curves that form the second area, where the second adjusted frame point set can be divided into the second adjusted upper frame point set and the second adjusted lower frame point set.
- the second adjusted upper frame point set consists of the frame points located in the upper half of the sub-regions.
- the second adjusted lower frame point set consists of the frame points located in the lower half of the sub-regions.
- when the frame points in the second adjusted upper frame point set and the second adjusted lower frame point set are separately fitted, the linear correlation coefficient of the frame points in each set may be calculated, and the fitting method to adopt is determined according to the value of the linear correlation coefficient:
- the Pearson correlation coefficient of the frame points in the second adjusted upper frame point set and in the second adjusted lower frame point set can be calculated. If the linear correlation is strong (for example, the correlation coefficient is greater than 0.8), linear fitting may be adopted; if the linear correlation is weak, a higher-order fitting such as quadratic fitting may be adopted.
- the intermediate quantities computed when calculating the linear correlation coefficient (for example, the intermediate quantities when calculating the Pearson correlation coefficient) can be saved, so that they can be used directly when other subsequent calculations need them, thereby reducing the amount of calculation.
- the one or more processors are specifically configured to invoke the computer instructions to make the electronic device execute: when the number of frame points in the first initial frame point set is equal to 4, taking points with evenly spaced abscissas on the upper and lower sides of the rectangle determined by the 4 frame points as new frame points to form the first extended frame point set;
- when the number of frame points in the first initial frame point set is greater than 4, separately fitting the frame points in the first initial upper frame point set and the first initial lower frame point set to obtain an upper fitting curve and a lower fitting curve, wherein the first initial upper frame point set includes the frame points located in the upper half of the first text line in the first initial frame point set, and the first initial lower frame point set includes the frame points located in the lower half of the first text line in the first initial frame point set; and taking points with evenly spaced abscissas on the upper fitting curve and the lower fitting curve, respectively, as new frame points to form the first extended frame point set.
- the one or more processors are further configured to invoke the computer instructions to make the electronic device execute: determining, according to the position of the third tracking point set relative to each tracking point in the second tracking point set, the position in the third video frame of each frame point in the third adjustment frame point set, to obtain a third calculated frame point set, where the third tracking point set includes the tracking points predicted in the third video frame that correspond to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame;
- adjusting the position of each frame point in the third calculated frame point set to obtain a third adjusted frame point set, so that the sub-regions determined by the third adjusted frame point set completely surround every tracking point in the third tracking point set;
- smoothing the enclosing curve of each sub-region determined by the third adjusted frame point set to obtain a third area, where the third area is the determined position of the first text line in the third video frame.
- the electronic device can track and determine the position of the first text line in subsequent video frames such as the fourth video frame, the fifth video frame, and the sixth video frame.
- the one or more processors are also used to invoke the computer instructions to make the electronic device execute: starting from the first video frame, maintaining a buffer with a fixed length of a preset number of video frames, where the buffer stores the newly generated video frames while the result of OCR recognition of the first video frame has not yet been returned.
- when the number of video frames stored in the buffer is equal to the preset number of frames, an old video frame is deleted every time a new video frame is added to the buffer, wherein the difference in acquisition time between adjacent video frames stored in the buffer is less than the preset interval time.
- the embodiments of the present application provide a computer program product containing instructions.
- OCR detection is performed on the first video frame to obtain the frame points anchoring the position of each text line, which include at least a first initial frame point set;
- the first initial frame point set includes the frame points identified by OCR for anchoring the position of the first text line;
- the first text line is any text line in the first video frame;
- the number of frame points in the first initial frame point set is not less than 4;
- a first extended frame point set is determined according to the first initial frame point set, and the first extended frame point set frames the first text line in N continuous sub-regions of uniform width;
- N is a positive integer not less than 2.
- when the computer program product runs on an electronic device, the electronic device is specifically caused to execute: adjusting the position of each frame point in the second calculated frame point set to obtain a second adjusted frame point set, so that the sub-regions determined by the second adjusted frame point set completely surround every tracking point in the second tracking point set; and determining the second area according to the second adjusted frame point set.
- the frame point position may be adjusted as a whole according to the highest tracking point and the lowest tracking point in the second tracking point set:
- the ordinate of each frame point in the second calculated lower frame point set is adjusted to be smaller than the minimum ordinate of the tracking points in the second tracking point set, and greater than the difference between that minimum and the preset parameter times the font height.
- the position of the frame point may be adjusted according to the highest tracking point and the lowest tracking point within a preset distance range from each frame point:
- when the computer program product runs on an electronic device, the electronic device is specifically caused to execute: smoothing the enclosing curve of each sub-region determined by the second adjusted frame point set to obtain the second area.
- the frame points in the second adjusted upper frame point set and the second adjusted lower frame point set may be separately fitted to obtain smooth enclosing curves that form the second area, where the second adjusted frame point set can be divided into the second adjusted upper frame point set and the second adjusted lower frame point set.
- the second adjusted upper frame point set consists of the frame points located in the upper half of the sub-regions.
- the second adjusted lower frame point set consists of the frame points located in the lower half of the sub-regions.
- when the frame points in the second adjusted upper frame point set and the second adjusted lower frame point set are separately fitted, the linear correlation coefficient of the frame points in each set may be calculated, and the fitting method to adopt is determined according to the value of the linear correlation coefficient:
- the Pearson correlation coefficient of the frame points in the second adjusted upper frame point set and in the second adjusted lower frame point set can be calculated. If the linear correlation is strong (for example, the correlation coefficient is greater than 0.8), linear fitting may be adopted; if the linear correlation is weak, a higher-order fitting such as quadratic fitting may be adopted.
- when the computer program product runs on an electronic device, the electronic device is caused to save the intermediate quantities computed when calculating the linear correlation coefficient (for example, the intermediate quantities when calculating the Pearson correlation coefficient), so that when other subsequent calculations need them, the saved values can be used directly, thereby reducing the amount of calculation.
- when the computer program product runs on an electronic device, the electronic device is specifically caused to execute: when the number of frame points in the first initial frame point set is equal to 4, taking points with evenly spaced abscissas on the upper and lower sides of the rectangle determined by the 4 frame points as new frame points to form the first extended frame point set;
- when the number of frame points in the first initial frame point set is greater than 4, separately fitting the frame points in the first initial upper frame point set and the first initial lower frame point set to obtain an upper fitting curve and a lower fitting curve, wherein the first initial upper frame point set includes the frame points located in the upper half of the first text line in the first initial frame point set, and the first initial lower frame point set includes the frame points located in the lower half of the first text line in the first initial frame point set; and taking points with evenly spaced abscissas on the upper fitting curve and the lower fitting curve, respectively, as new frame points to form the first extended frame point set.
- when the computer program product runs on the electronic device, the electronic device is also caused to execute: determining, according to the position of the third tracking point set relative to each tracking point in the second tracking point set, the position in the third video frame of each frame point in the third adjustment frame point set, to obtain a third calculated frame point set, wherein the third tracking point set includes the tracking points predicted in the third video frame that correspond to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame;
- adjusting the position of each frame point in the third calculated frame point set to obtain a third adjusted frame point set, so that the sub-regions determined by the third adjusted frame point set completely surround every tracking point in the third tracking point set; and smoothing the enclosing curve of each sub-region determined by the third adjusted frame point set to obtain a third area, where the third area is the determined position of the first text line in the third video frame.
- the electronic device can track and determine the position of the first text line in subsequent video frames such as the fourth video frame, the fifth video frame, and the sixth video frame.
- when the computer program product runs on the electronic device, the electronic device is also caused to execute: starting from the first video frame, maintaining a buffer with a fixed length of a preset number of video frames, where the buffer stores the newly generated video frames while the result of OCR recognition of the first video frame has not yet been returned.
- when the number of video frames stored in the buffer is equal to the preset number of frames, an old video frame is deleted every time a new video frame is added to the buffer, wherein the difference in acquisition time between adjacent video frames stored in the buffer is less than the preset interval time.
- an embodiment of the present application provides a computer-readable storage medium including instructions; when the instructions are executed on an electronic device, the electronic device is caused to execute: performing OCR detection on the first video frame to obtain the frame points anchoring the position of each text line, which include at least a first initial frame point set; the first initial frame point set contains the frame points recognized by OCR for anchoring the position of the first text line, and the first text line is any text line in the first video frame;
- the number of frame points in the first initial frame point set is not less than 4; a first extended frame point set is determined according to the first initial frame point set, and the first extended frame point set frames the first text line in N continuous sub-regions of uniform width, where N is a positive integer not less than 2.
- when the instructions are executed on the electronic device, the electronic device is specifically caused to execute: adjusting the position of each frame point in the second calculated frame point set to obtain a second adjusted frame point set, so that the sub-regions determined by the second adjusted frame point set completely surround every tracking point in the second tracking point set; and determining the second area according to the second adjusted frame point set.
- the frame point position may be adjusted as a whole according to the highest tracking point and the lowest tracking point in the second tracking point set:
- the ordinate of each frame point in the second calculated lower frame point set is adjusted to be smaller than the minimum ordinate of the tracking points in the second tracking point set, and greater than the difference between that minimum and the preset parameter times the font height.
- the position of the frame point may be adjusted according to the highest tracking point and the lowest tracking point within a preset distance range from each frame point:
- when the instructions are executed on the electronic device, the electronic device is specifically caused to execute: smoothing the enclosing curve of each sub-region determined by the second adjusted frame point set to obtain the second area.
- the frame points in the second adjusted upper frame point set and the second adjusted lower frame point set may be separately fitted to obtain smooth enclosing curves that form the second area, where the second adjusted frame point set can be divided into the second adjusted upper frame point set and the second adjusted lower frame point set.
- the second adjusted upper frame point set consists of the frame points located in the upper half of the sub-regions.
- the second adjusted lower frame point set consists of the frame points located in the lower half of the sub-regions.
- when the frame points in the second adjusted upper frame point set and the second adjusted lower frame point set are separately fitted, the linear correlation coefficient of the frame points in each set may be calculated, and the fitting method to adopt is determined according to the value of the linear correlation coefficient:
- the Pearson correlation coefficient of the frame points in the second adjusted upper frame point set and in the second adjusted lower frame point set can be calculated. If the linear correlation is strong (for example, the correlation coefficient is greater than 0.8), linear fitting may be adopted; if the linear correlation is weak, a higher-order fitting such as quadratic fitting may be adopted.
- when the instructions are executed on the electronic device, the electronic device is caused to save the intermediate quantities computed when calculating the linear correlation coefficient (for example, the intermediate quantities when calculating the Pearson correlation coefficient), so that when other subsequent calculations need them, the saved values can be used directly, thereby reducing the amount of calculation.
- when the instructions are executed on the electronic device, the electronic device is specifically caused to execute: when the number of frame points in the first initial frame point set is equal to 4, taking points with evenly spaced abscissas on the upper and lower sides of the rectangle determined by the 4 frame points as new frame points to form the first extended frame point set;
- when the number of frame points in the first initial frame point set is greater than 4, separately fitting the frame points in the first initial upper frame point set and the first initial lower frame point set to obtain an upper fitting curve and a lower fitting curve, wherein the first initial upper frame point set includes the frame points located in the upper half of the first text line in the first initial frame point set, and the first initial lower frame point set includes the frame points located in the lower half of the first text line in the first initial frame point set; and taking points with evenly spaced abscissas on the upper fitting curve and the lower fitting curve, respectively, as new frame points to form the first extended frame point set.
- when the instructions are executed on the electronic device, the electronic device is also caused to execute: determining, according to the position of the third tracking point set relative to each tracking point in the second tracking point set, the position in the third video frame of each frame point in the third adjustment frame point set, to obtain a third calculated frame point set, wherein the third tracking point set includes the tracking points predicted in the third video frame that correspond to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame;
- adjusting the position of each frame point in the third calculated frame point set to obtain a third adjusted frame point set, so that the sub-regions determined by the third adjusted frame point set completely surround every tracking point in the third tracking point set; and smoothing the enclosing curve of each sub-region determined by the third adjusted frame point set to obtain a third area, where the third area is the determined position of the first text line in the third video frame.
- the electronic device can track and determine the position of the first text line in subsequent video frames such as the fourth video frame, the fifth video frame, and the sixth video frame.
- when the instructions are executed on the electronic device, the electronic device is also caused to execute: starting from the first video frame, maintaining a buffer with a fixed length of a preset number of video frames, where the buffer stores the newly generated video frames while the result of OCR recognition of the first video frame has not yet been returned.
- when the number of video frames stored in the buffer is equal to the preset number of frames, an old video frame is deleted every time a new video frame is added to the buffer, wherein the difference in acquisition time between adjacent video frames stored in the buffer is less than the preset interval time.
- the electronic device provided in the second aspect, the electronic device provided in the third aspect, the chip provided in the fourth aspect, the computer program product provided in the fifth aspect, and the computer storage medium provided in the sixth aspect are all used to execute the method provided in the embodiments of the present application. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method, which are not repeated here.
- FIG. 1 is a schematic diagram of the inclination angle change of an oblique rectangle in the prior art;
- FIG. 2 is a schematic diagram of a scene in the prior art where an oblique rectangle is used to determine the position of a curved text;
- FIG. 3 is a schematic diagram of a scene in the prior art where an oblique rectangle is used to determine the position of a deformed text
- FIG. 4 is a schematic diagram of a scene in which the video text tracking method in an embodiment of the present application is used to determine the position of a curved text;
- FIG. 5 is a schematic diagram of a scene in which the video text tracking method in an embodiment of the present application is used to determine the position of a deformed text
- FIG. 6 is a schematic flowchart of a video text tracking method in an embodiment of the present application.
- FIG. 7 is a schematic diagram of an exemplary scenario of OCR detection and determination of frame points in an embodiment of the present application.
- FIG. 8 is a schematic diagram of an exemplary scenario for determining a uniform subregion of a framed text line in an embodiment of the present application
- FIG. 9 is a schematic diagram of an exemplary scene for determining the position of a frame point in a second video frame in an embodiment of the present application.
- FIG. 10 is a schematic diagram of an exemplary scenario for adjusting the position of a frame point in an embodiment of the present application.
- FIG. 11 is a schematic diagram of an exemplary scene of smoothing processing of a sub-region surrounding curve in an embodiment of the present application.
- FIG. 12 is a schematic diagram of a scene of buffered video frame scheduling in an embodiment of the present application.
- FIG. 13 is a schematic diagram of an exemplary structure of an electronic device in an embodiment of the present application.
- FIG. 14 is a block diagram of an exemplary software structure of an electronic device in an embodiment of the present application.
- the terms “first” and “second” are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined with “first” or “second” may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present application, unless otherwise specified, “multiple” means two or more.
- OCR generally refers to the process in which electronic devices check the characters printed on paper, determine their shape by detecting dark and light patterns, and then use character recognition methods to translate the shapes into computer text.
- the frame point refers to the vertex of the rectangular frame generated during the OCR recognition process and used to frame the position of the text line. As shown in Figure 7, the area determined by the frame point detected by the OCR can frame the position of the text line.
- Tracking points may also be referred to as corner points, feature points, etc. in the embodiments of the present application.
- Corner detection is a method used to obtain image features in computer vision systems, and is widely used in motion detection, image matching, video tracking, three-dimensional modeling, and target recognition.
- corner detection methods detect image points with specific features. These feature points have specific coordinates in the image and have certain mathematical features, such as local maximum or minimum gray levels, certain gradient features, etc.
- the position change mode between video frames can be determined, thereby determining the possible positions of other points in the video frame.
- the projection transformation matrix between the two video frames can be calculated by the position changes of the corresponding tracking points in the video frame A and the video frame B, and then the coordinates of a certain point in the video frame A can be substituted into the projection transformation matrix, namely The approximate coordinates of this point in the video frame B can be calculated.
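- As a minimal sketch of the substitution step just described, the following Python snippet maps one point from video frame A to video frame B with a 3x3 projection transformation matrix. The function name `project_point` and the example matrix (a pure translation) are assumptions made for illustration and do not come from the application.

```python
import numpy as np

def project_point(H, point):
    """Map an (x, y) point from frame A to frame B using a 3x3 matrix H."""
    x, y = point
    u, v, w = H @ np.array([x, y, 1.0])
    # Divide by the homogeneous coordinate to get back to pixel coordinates.
    return (u / w, v / w)

# Hypothetical matrix: frame B is frame A shifted by (5, -3).
H_example = np.array([[1.0, 0.0, 5.0],
                      [0.0, 1.0, -3.0],
                      [0.0, 0.0, 1.0]])

mapped = project_point(H_example, (10.0, 20.0))
```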
- The electronic device splits the text line area into sub-areas, tracks each sub-area, and then processes the sub-areas and connects them into a new text line. This approach not only handles straight text (the center points of the characters lie on a straight line) and curved text scenes, but also tracks well text lines that exhibit deformation, so the position of the text line can be tracked and predicted accurately.
- FIG. 4 is a schematic diagram of a scene in which the video text tracking method in an embodiment of the present application is used to determine the position of curved text.
- FIG. 5 is a schematic diagram of a scene in which the video text tracking method in an embodiment of the present application is used to determine the position of deformed text.
- FIG. 6 is a schematic flowchart of a video text tracking method in an embodiment of the present application:
- FIG. 7 is a schematic diagram of an exemplary scene of OCR detection and determination of frame points in an embodiment of the present application.
- The frame points anchoring the position of each text line can be obtained. Taking any one of the text lines as the first text line, the obtained frame points include the frame points used to anchor the position of the first text line, which are referred to as the first initial frame point set in the embodiments of the present application.
- The number of frame points in the first initial frame point set is a multiple of 2 and not less than 4.
- The first area can be enclosed by the lines connecting the frame points in the first initial frame point set, and the first text line is located in the first area.
- If the first text line is straight text, the number of frame points in the first initial frame point set is 4; if the first text line is curved text, the number of frame points in the first initial frame point set is greater than 4 and is a multiple of 2.
- the first video frame is a video frame.
- the first video frame may be a video frame during video shooting, or may be a video frame during video playback, which is not limited here.
- the first video frame may be the first video frame obtained after the lens is stabilized during video shooting.
- S602. Determine a first extended frame point set according to the first initial frame point set, where the first extended frame point set frames the first text line in N continuous sub-areas of uniform width, and N is a positive integer not less than 2;
- FIG. 8 is a schematic diagram of an exemplary scene for determining uniform sub-areas that frame a text line in an embodiment of the present application.
- the first initial frame point set determines the first area where the first text line is located, and the first initial frame point set can divide the first area into a plurality of continuous and irregular quadrilateral sub-areas.
- the process of determining the first set of extended frame points is the process of determining the N continuous and uniform subregions of the first text line.
- The process of determining the first extended frame point set is described in detail below. The number of frame points in the first initial frame point set is a multiple of 2 and not less than 4, and the process of determining the first extended frame point set differs according to whether this number is 4:
- (1) The number of frame points in the first initial frame point set is equal to 4.
- If the number of frame points in the first initial frame point set is equal to 4, the first text line is straight text, so an inclined rectangle with 4 frame points can be used to anchor the position of the first text line. In this case, it is only necessary to take points with evenly spaced abscissas on the upper and lower sides of the rectangle and use these points as new frame points to form the first extended frame point set.
- (2) The number of frame points in the first initial frame point set is greater than 4. In this case the first text line is curved text, so more than 4 frame points are used to anchor the position of the first text line.
- The frame points in the first initial frame point set can be divided into a first initial upper frame point set and a first initial lower frame point set, where the first initial upper frame point set includes the frame points in the first initial frame point set located in the upper half of the first text line, and the first initial lower frame point set includes the frame points in the first initial frame point set located in the lower half of the first text line.
- the N subregions formed by the frame points in the first extended frame point set are continuous and uniform in width, and frame the first text line.
- The abscissa interval between the new frame points can be determined according to the actual situation, as long as the finally determined sub-regions are continuous and uniform in width. For example, it can be determined according to the total length of the text line and/or the width of the font in the text line, which is not limited here.
- the abscissa interval between the new frame points may be approximately twice the width of the font in the text line.
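- The straight-text case (4 frame points) can be sketched as follows. This is an illustrative sketch only: the function names `choose_n` and `extend_frame_points`, the argument order, and the rounding rule are assumptions. Because interpolation along each side is linear, the new frame points have evenly spaced abscissas even when the rectangle is inclined.

```python
def choose_n(line_length, font_width):
    """Pick N so that each sub-area is about two font widths wide (N >= 2)."""
    return max(2, round(line_length / (2 * font_width)))

def extend_frame_points(top_left, top_right, bottom_right, bottom_left, n):
    """Split a 4-point text frame into n sub-areas of equal width.

    Returns (upper, lower): two lists of n + 1 frame points, ordered from
    left to right along the upper and lower sides of the rectangle.
    """
    if n < 2:
        raise ValueError("N must be at least 2")

    def lerp(p, q, t):
        # Linear interpolation between points p and q.
        return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

    upper = [lerp(top_left, top_right, i / n) for i in range(n + 1)]
    lower = [lerp(bottom_left, bottom_right, i / n) for i in range(n + 1)]
    return upper, lower

# Hypothetical text line: 80 units long, characters about 10 units wide.
n = choose_n(80, 10)
upper, lower = extend_frame_points((0, 10), (80, 10), (80, 0), (0, 0), n)
```

- Consecutive pairs of upper and lower points then delimit the N sub-areas; the curved-text case would interpolate along each segment of the upper and lower frame point sets in the same way.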
- S603. Determine the position of each frame point in the first extended frame point set in the second video frame according to the positions of the tracking points in the second tracking point set relative to the corresponding tracking points in the first tracking point set, to obtain a second calculation frame point set.
- The first tracking point set includes the tracking points in the sub-regions determined by the first extended frame point set in the first video frame;
- The second tracking point set includes the tracking points predicted in the second video frame that correspond to the tracking points in the first tracking point set;
- The second video frame is a video frame obtained after the first video frame;
- FIG. 9 is a schematic diagram of an exemplary scene in which the positions of the frame points in the second video frame are determined in an embodiment of the present application.
- After the electronic device determines the first extended frame point set, it can determine the first tracking point set, which includes the tracking points in the sub-regions determined by the first extended frame point set in the first video frame.
- The electronic device may use a tracking point (key point) detection technique, such as corner detection, to determine a certain number of tracking points in the sub-regions determined by the first extended frame point set, forming the first tracking point set.
- the electronic device also determines the tracking points in the sub-regions that frame other text lines in the first video frame.
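- One way to restrict detected points to the sub-regions is a point-in-quadrilateral test, sketched below. The candidate points are assumed to come from some corner detector that is not shown; the function names and the assumption that each sub-region is a convex quadrilateral with vertices listed in boundary order are illustrative, not taken from the application.

```python
def inside_convex_quad(point, quad):
    """Return True if point lies inside or on a convex quadrilateral.

    quad: four (x, y) vertices listed in order around the boundary.
    """
    crosses = []
    for i in range(4):
        ax, ay = quad[i]
        bx, by = quad[(i + 1) % 4]
        # Sign of the cross product tells on which side of edge a->b the point is.
        crosses.append((bx - ax) * (point[1] - ay) - (by - ay) * (point[0] - ax))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)

def first_tracking_point_set(candidates, sub_regions):
    """Keep only the candidate points that fall inside some sub-region."""
    return [p for p in candidates
            if any(inside_convex_quad(p, q) for q in sub_regions)]

# Two adjacent hypothetical sub-regions and four candidate corner points.
sub_regions = [[(0, 0), (10, 0), (10, 5), (0, 5)],
               [(10, 0), (20, 0), (20, 5), (10, 5)]]
candidates = [(3, 2), (15, 4), (25, 1), (5, 9)]
tracking_points = first_tracking_point_set(candidates, sub_regions)
```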
- The electronic device can determine a second tracking point set according to the first tracking point set. The second tracking point set includes the tracking points predicted in the second video frame that correspond to the tracking points in the first tracking point set, and the second video frame is a video frame obtained after the first video frame.
- the electronic device can predict the positions of some tracking points in the first tracking point set of the first video frame in the second video frame as the second tracking point set.
- The electronic device can then determine the position of each frame point in the first extended frame point set in the second video frame to obtain the second calculation frame point set.
- The electronic device may obtain the projection transformation matrix from the first video frame to the second video frame according to the positional relationship between the tracking points at corresponding positions in the first tracking point set and the second tracking point set;
- Then, according to the projection transformation matrix, the positions in the second video frame of the frame points in the first extended frame point set are calculated, and the calculated frame points are used as the second calculation frame point set.
- the number of tracking points in the second tracking point set is generally less than or equal to the number in the first tracking point set.
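- The application does not state how the projection transformation matrix is computed. As one possible sketch, the direct linear transform (DLT) below estimates it from at least four matched tracking points and then applies it to the frame points. The function names, the example coordinates, and the choice of DLT itself are assumptions made for illustration.

```python
import numpy as np

def estimate_projection_matrix(src_pts, dst_pts):
    """Estimate a 3x3 projection matrix from matched points with the DLT.

    Assumes at least four non-degenerate matches and a matrix whose
    bottom-right entry is non-zero (it is normalised to 1).
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if len(src) < 4 or len(src) != len(dst):
        raise ValueError("need at least four matched tracking points")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project_points(H, pts):
    """Apply H to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical data: tracking points in frame 1 and their positions in frame 2.
H_true = np.array([[1.1, 0.02, 4.0], [-0.01, 0.95, -2.0], [1e-4, 2e-4, 1.0]])
first_tracking = [(0, 0), (100, 0), (100, 40), (0, 40), (50, 20)]
second_tracking = project_points(H_true, first_tracking)

H_est = estimate_projection_matrix(first_tracking, second_tracking)
extended_frame_points = [(0, 0), (25, 0), (50, 0), (0, 40), (25, 40), (50, 40)]
second_calculation_frame_points = project_points(H_est, extended_frame_points)
```

- With real tracking data some matches are likely to be wrong, so an implementation would probably combine such an estimate with some form of outlier rejection.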
- FIG. 10 is a schematic diagram of an exemplary scene of adjusting the positions of the frame points in an embodiment of the present application.
- Some of the frame points in the second calculation frame point set, calculated from the projection transformation matrix and the first extended frame point set, may deviate considerably and may not be at the positions corresponding to those in the first video frame. The positions of the frame points therefore need to be adjusted so that the sub-regions determined by the frame points completely surround each tracking point in the second tracking point set, as shown in FIG. 10(b).
- One approach is to adjust the ordinates of the frame points as a whole according to the maximum and minimum ordinates of the tracking points in the second tracking point set.
- The second calculation frame point set can be divided into a second calculation upper frame point set and a second calculation lower frame point set, where the second calculation upper frame point set includes the frame points in the second calculation frame point set located in the upper half of the sub-regions, and the second calculation lower frame point set includes the frame points in the second calculation frame point set located in the lower half of the sub-regions.
- The ordinate of each frame point in the second calculation upper frame point set can be adjusted to be greater than the maximum ordinate of the tracking points in the second tracking point set, and less than the sum of that maximum and the preset parameter times the font height;
- The ordinate of each frame point in the second calculation lower frame point set can be adjusted to be less than the minimum ordinate of the tracking points in the second tracking point set, and greater than the difference between that minimum and the preset parameter times the font height.
- If the ordinate of a frame point is already within the range, the frame point does not need to be adjusted; if the ordinate of the frame point is not within the range, it can be moved into the range by the minimum moving distance.
- the preset parameter can be set to 0.5.
- Alternatively, the ordinate of each frame point in the second calculation upper frame point set can be adjusted to be greater than the maximum ordinate of the tracking points within the preset distance from that frame point, and less than the sum of that maximum and the preset parameter times the font height.
- The ordinate of each frame point in the second calculation lower frame point set can be adjusted to be less than the minimum ordinate of the tracking points within the preset distance from that frame point, and greater than the difference between that minimum and the preset parameter times the font height.
- If the ordinate of a frame point is already within this range, the frame point does not need to be adjusted; if the ordinate of the frame point is not within this range, it can be moved into the range by the minimum moving distance.
- the preset parameter can be set to 0.5, and the preset distance can be set to 1 font width.
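- The whole-line variant of this adjustment amounts to clamping each ordinate into an allowed interval, which moves a point by the minimum distance needed. The sketch below follows the wording of the text, in which the upper frame points have the larger ordinates; in image coordinates with the y axis pointing down the two roles swap. The function names and the use of closed intervals are assumptions.

```python
def clamp(value, low, high):
    """Move value into [low, high] by the smallest possible amount."""
    return max(low, min(value, high))

def adjust_frame_points(upper, lower, tracking_points, font_height, k=0.5):
    """Adjust frame point ordinates so the frame encloses all tracking points.

    upper, lower: lists of (x, y) frame points; k is the preset parameter.
    """
    ys = [y for _, y in tracking_points]
    y_max, y_min = max(ys), min(ys)
    margin = k * font_height
    new_upper = [(x, clamp(y, y_max, y_max + margin)) for x, y in upper]
    new_lower = [(x, clamp(y, y_min - margin, y_min)) for x, y in lower]
    return new_upper, new_lower

# Hypothetical values: tracking points span ordinates 10..20, font height 10.
tracking = [(2, 10), (8, 20), (15, 14)]
adj_upper, adj_lower = adjust_frame_points(
    upper=[(0, 18), (10, 23), (20, 30)],
    lower=[(0, 12), (10, 7), (20, 0)],
    tracking_points=tracking,
    font_height=10,
)
```

- The per-point variant would compute the maximum and minimum only over the tracking points within the preset distance of each frame point and clamp in the same way.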
- S605. Perform smoothing processing on the enclosing curve of each sub-region determined by the second adjustment frame point set to obtain a second area, where the second area is the position of the first text line in the second video frame determined by the electronic device.
- FIG. 11 is a schematic diagram of an exemplary scene of smoothing the enclosing curves of the sub-regions in an embodiment of the present application.
- the obtained second adjustment frame point set can be divided into a second adjustment upper frame point set and a second adjustment lower frame point set, where the second adjustment upper frame point set is the frame point located in the upper half of the sub-region, and the second adjustment The set of lower frame points is the frame points located in the lower half of the subarea.
- The electronic device can separately fit the frame points in the second adjustment upper frame point set and the second adjustment lower frame point set to obtain smooth enclosing curves that form the second area, which is the position of the first text line in the second video frame determined by the electronic device.
- When fitting the frame points in the second adjustment upper frame point set and the second adjustment lower frame point set, the electronic device may calculate the linear correlation coefficient of the frame points in each set and determine the fitting method according to the value of the linear correlation coefficient:
- For example, the electronic device can calculate the Pearson correlation coefficients of the frame points in the second adjustment upper frame point set and the second adjustment lower frame point set. If the linear correlation is weak, a higher-order fit, such as a quadratic fit, can be adopted.
- the purpose of fitting is to perform smoothing processing, so that each sub-region can continue to maintain a continuous nature, and the surrounding curve of the text line can be kept smooth to prevent the appearance of jaggedness.
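- The choice of fitting order can be sketched with NumPy as follows. The threshold of 0.9, the function names, and the fallback to a linear fit when the correlation is undefined are assumptions made for this sketch; the application only states that a weak linear correlation leads to a higher-order fit such as a quadratic one.

```python
import numpy as np

def choose_fit_degree(points, threshold=0.9):
    """Return 1 for a linear fit or 2 for a quadratic fit."""
    x, y = np.asarray(points, dtype=float).T
    if np.std(x) == 0 or np.std(y) == 0:
        return 1  # Pearson coefficient undefined; fall back to a linear fit.
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    return 1 if abs(r) >= threshold else 2

def smooth_frame_points(points, threshold=0.9):
    """Fit the frame points and return (smoothed points, degree used)."""
    x, y = np.asarray(points, dtype=float).T
    degree = choose_fit_degree(points, threshold)
    coeffs = np.polyfit(x, y, degree)
    smoothed = list(zip(x.tolist(), np.polyval(coeffs, x).tolist()))
    return smoothed, degree

# Hypothetical upper frame points: one straight line and one curved line.
straight = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]
curved = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]
_, straight_degree = smooth_frame_points(straight)
curved_smoothed, curved_degree = smooth_frame_points(curved)
```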
- The electronic device can save the intermediate quantities used for calculating the linear correlation coefficient (for example, the intermediate quantities obtained when calculating the Pearson correlation coefficient), so that the saved values can be used directly when they are needed in subsequent calculations, thereby reducing the amount of calculation.
- After the second area is determined to be the position of the first text line in the second video frame, the OCR recognition result or the translation result can be backfilled into the second area, so that the translation is displayed directly at the position of the corresponding text in the video.
- The electronic device in the embodiment of the present application determines the first extended frame point set, which frames the first text line in N continuous sub-regions of uniform width, and then determines through tracking the corresponding positions in the second video frame of the frame points in the first extended frame point set, forming the second calculation frame point set. The positions of the frame points in the second calculation frame point set are adjusted to obtain the second adjustment frame point set, which completely encloses the tracking points, and the enclosing curves of the sub-regions determined by the second adjustment frame point set are smoothed to obtain the second area, which gives the position of the first text line in the second video frame.
- The first video frame can be the first video frame obtained after the lens is stabilized during real-time shooting and framing, the first video frame of the video during video caption translation, or the first video frame obtained after the text content in the video changes, which is not limited here.
- FIG. 12 is a schematic diagram of a scene of buffered video frame scheduling in an embodiment of the application.
- the electronic device can maintain a fixed-length buffer of a preset number of video frames from the first video frame.
- The buffer is used to store the video frames that are newly generated before the OCR result for the first video frame is returned.
- For example, if the preset number of frames is 10, the buffer stores at most 10 video frames that follow the first video frame. After the OCR result for the first video frame is returned, tracking can start from the video frames stored in the buffer and catch up with the latest video frame.
- The buffer can store at most the preset number of video frames. If the buffer is full, an old video frame must be deleted every time a new video frame is added.
- The deletion strategy is to keep the intervals between the remaining adjacent frames in the buffer as even as possible.
- Suppose 10 video frames have been stored in the current buffer, numbered [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
- When the 11th video frame is generated, the 2nd video frame can be deleted and the 11th video frame stored, giving [1, 3, 4, 5, 6, 7, 8, 9, 10, 11].
- When the 12th video frame is generated, the 4th video frame can be deleted and the 12th video frame stored, giving [1, 3, 5, 6, 7, 8, 9, 10, 11, 12].
- When the 13th video frame is generated, the 6th video frame can be deleted and the 13th video frame stored, giving [1, 3, 5, 7, 8, 9, 10, 11, 12, 13].
- When the 14th video frame is generated, the 8th video frame can be deleted and the 14th video frame stored, giving [1, 3, 5, 7, 9, 10, 11, 12, 13, 14].
- When the 15th video frame is generated, the 10th video frame can be deleted and the 15th video frame stored, giving [1, 3, 5, 7, 9, 11, 12, 13, 14, 15].
- When the 16th video frame is generated, the 12th video frame can be deleted and the 16th video frame stored, giving [1, 3, 5, 7, 9, 11, 13, 14, 15, 16].
- When the 17th video frame is generated, the 14th video frame can be deleted and the 17th video frame stored, giving [1, 3, 5, 7, 9, 11, 13, 15, 16, 17].
- When the 18th video frame is generated, the 16th video frame can be deleted and the 18th video frame stored, giving [1, 3, 5, 7, 9, 11, 13, 15, 17, 18].
- When the 19th video frame is generated, the 17th video frame can be deleted and the 19th video frame stored, giving [1, 3, 5, 7, 9, 11, 13, 15, 18, 19].
- When the 20th video frame is generated, the 19th video frame can be deleted and the 20th video frame stored, giving [1, 3, 5, 7, 9, 11, 13, 15, 18, 20].
- The purpose of maintaining a fixed-length buffer is to handle the case in which the first video frame contains too much text and OCR recognition takes too long. Limiting the size of the buffer shortens the time needed for tracking to catch up with the latest video frame, and therefore the time the user waits for the result to be returned, which improves the experience.
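- The application does not spell out the exact deletion rule. One simple rule consistent with the goal of even intervals is sketched below: always keep the oldest and the newest frame, and delete the earliest frame whose removal creates the smallest gap between its neighbours. The function name `add_frame` and the rule itself are assumptions. For the 10-frame example this rule yields the same buffer contents as the listing up to the 18th frame; it is not claimed to match the listing beyond that.

```python
def add_frame(buffer, new_frame, capacity=10):
    """Return the buffer after adding new_frame, evicting one frame if full.

    buffer: list of frame numbers in increasing order.
    """
    frames = buffer + [new_frame]
    if len(frames) <= capacity:
        return frames
    # Candidates are all frames except the oldest and the newest one.
    # min() returns the earliest candidate when several gaps are equal.
    evict = min(range(1, len(frames) - 1),
                key=lambda i: frames[i + 1] - frames[i - 1])
    return frames[:evict] + frames[evict + 1:]

# Replay the example: start with frames 1..10, then add frames 11..18.
buffer = list(range(1, 11))
history = []
for frame in range(11, 19):
    buffer = add_frame(buffer, frame)
    history.append(list(buffer))
```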
- Assume that the video frame on which OCR is performed is the first video frame in Embodiment 1 above, the first video frame in the buffer is the second video frame in Embodiment 1 above, and the second video frame in the buffer is the third video frame.
- The second adjustment frame point set and the second tracking point set corresponding to the first text line in the second video frame have been determined.
- To determine the position of the first text line in the third video frame, together with the third adjustment frame point set and the third tracking point set corresponding to the first text line in the third video frame, the steps can be as follows:
- The third tracking point set includes the tracking points predicted in the third video frame that correspond to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame;
- The specific execution process of steps 1-3 is similar to steps S603 to S605 and is not repeated here.
- In this way, the position of the first text line can be determined in each subsequent video frame, and the OCR recognition result or the translation result can be backfilled into that position. This continues until the text line moves out of the field of view, is blocked by other objects, or the text content of the video changes, so that the proportion of tracking points at corresponding positions in two adjacent frames (or the ratio of the number of tracking points in the video frame to the number of tracking points in the first video frame used for OCR) falls below the tracking point ratio threshold. The tracking is then considered to have failed, and OCR is performed again once the shot has stabilized or the video text has been updated, starting another tracking process.
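- The failure check can be sketched as a simple ratio test. The function name `tracking_failed` and the threshold value 0.5 are hypothetical; the application does not give a concrete threshold.

```python
def tracking_failed(num_tracked, num_reference, ratio_threshold=0.5):
    """Return True when too few tracking points survive.

    num_tracked: tracking points found in the current video frame.
    num_reference: tracking points in the reference frame (for example, the
    previous frame or the first video frame used for OCR).
    """
    if num_reference <= 0:
        return True  # nothing to track against
    return num_tracked / num_reference < ratio_threshold
```

- When the check returns True, the device would wait for the shot to stabilize or the video text to update and then run OCR again.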
- the following describes an exemplary electronic device 100 provided by an embodiment of the present application.
- FIG. 13 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
- the electronic device 100 may have more or fewer components than shown in the figure, may combine two or more components, or may have different component configurations.
- the various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
- The electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
- The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
- the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100.
- the electronic device 100 may include more or fewer components than those shown in the figure, or combine certain components, or split certain components, or arrange different components.
- the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
- the processor 110 may include one or more processing units.
- The processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
- the different processing units may be independent devices or integrated in one or more processors.
- the controller may be the nerve center and command center of the electronic device 100.
- the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching instructions and executing instructions.
- a memory may also be provided in the processor 110 to store instructions and data.
- the memory in the processor 110 is a cache memory.
- The memory can store instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs the instructions or data again, it can fetch them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and improves the efficiency of the system.
- the processor 110 may include one or more interfaces.
- The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
- the I2C interface is a bidirectional synchronous serial bus, which includes a serial data line (SDA) and a serial clock line (SCL).
- the processor 110 may include multiple sets of I2C buses.
- the processor 110 may be coupled to the touch sensor 180K, charger, flash, camera 193, etc., respectively through different I2C bus interfaces.
- the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to realize the touch function of the electronic device 100.
- the I2S interface can be used for audio communication.
- the processor 110 may include multiple sets of I2S buses.
- the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170.
- the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
- the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
- the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
- the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
- the UART interface is a universal serial data bus used for asynchronous communication.
- the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
- the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
- the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function.
- the audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
- the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
- the MIPI interface includes a camera serial interface (camera serial interface, CSI), a display serial interface (display serial interface, DSI), and so on.
- the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
- the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the electronic device 100.
- the GPIO interface can be configured through software.
- the GPIO interface can be configured as a control signal or as a data signal.
- the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
- the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
- the SIM interface can be used to communicate with the SIM card interface 195 to realize the function of transmitting data to the SIM card or reading data in the SIM card.
- the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
- the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect other electronic devices, such as AR devices.
- the interface connection relationship between the modules illustrated in the embodiment of the present invention is merely a schematic description, and does not constitute a structural limitation of the electronic device 100.
- The electronic device 100 may also adopt interface connection modes different from those in the foregoing embodiments, or a combination of multiple interface connection modes.
- the charging management module 140 is used to receive charging input from the charger.
- the charger can be a wireless charger or a wired charger.
- the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
- the power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
- the wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
- the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
- Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
- Antenna 1 can be multiplexed as a diversity antenna for wireless LAN.
- the antenna can be used in combination with a tuning switch.
- the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100.
- the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
- The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
- the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation via the antenna 1.
- at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
- at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
- the modem processor may include a modulator and a demodulator.
- the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
- the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
- the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
- the application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194.
- the modem processor may be an independent device.
- the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
- the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
- the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
- the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
- the wireless communication module 160 may also receive a signal to be sent from the processor 110, perform frequency modulation, amplify, and convert it into electromagnetic waves to radiate through the antenna 2.
- the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
- the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
- the GNSS may include global positioning system (GPS), global navigation satellite system (GLONASS), Beidou navigation satellite system (BDS), quasi-zenith satellite system (QZSS) and/or satellite-based augmentation systems (SBAS).
- the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
- the GPU is an image processing microprocessor, which is connected to the display screen 194 and the application processor.
- the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
- the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
- the display screen 194 is used to display images, videos, and the like.
- the display screen 194 includes a display panel.
- the display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, quantum dot light emitting diode (QLED), etc.
- the electronic device 100 may include one or N display screens 194, and N is a positive integer greater than one.
- the electronic device 100 can implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
- the ISP is used to process the data fed back from the camera 193. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing and is converted into an image visible to the naked eye.
- ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
- the ISP may be provided in the camera 193.
- the camera 193 is used to capture still images or videos.
- the object generates an optical image through the lens and is projected to the photosensitive element.
- the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
- the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
- ISP outputs digital image signals to DSP for processing.
- DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
- the electronic device 100 may include one or N cameras 193, and N is a positive integer greater than one.
- Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects the frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
- Video codecs are used to compress or decompress digital video.
- the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
- NPU is a neural-network (NN) computing processor.
- through the NPU, applications such as intelligent cognition of the electronic device 100 can be realized, such as image recognition, face recognition, voice recognition, text understanding, and so on.
- the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
- the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
- the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
- the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121.
- the internal memory 121 may include a storage program area and a storage data area.
- the storage program area can store an operating system, at least one application required for a function (such as a face recognition function, a fingerprint recognition function, a mobile payment function, etc.) and so on.
- the storage data area can store data created during the use of the electronic device 100 (such as face information template data, fingerprint information template, etc.) and the like.
- the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
- the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
- the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
- the audio module 170 can also be used to encode and decode audio signals.
- the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
- the speaker 170A, also called a “loudspeaker”, is used to convert audio electrical signals into sound signals.
- the user of the electronic device 100 can listen to music or a hands-free call through the speaker 170A.
- the receiver 170B, also called an “earpiece”, is used to convert audio electrical signals into sound signals.
- when the electronic device 100 answers a call or plays a voice message, the voice can be heard by bringing the receiver 170B close to the human ear.
- the microphone 170C, also called a “mic” or “mouthpiece”, is used to convert sound signals into electrical signals.
- the user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C.
- the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
- the earphone interface 170D is used to connect wired earphones.
- the earphone interface 170D may be a USB interface 130, a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
- the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
- the pressure sensor 180A may be provided on the display screen 194.
- the capacitive pressure sensor may include at least two parallel plates with conductive materials.
- the electronic device 100 determines the intensity of the pressure according to the change in capacitance.
- the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
- the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
- touch operations that act on the same touch position but have different touch operation strengths may correspond to different operation instructions. For example: when a touch operation whose intensity of the touch operation is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
- the gyro sensor 180B may be used to determine the movement posture of the electronic device 100.
- the gyro sensor 180B can determine the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes).
- the gyro sensor 180B can be used for image stabilization.
- the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement to achieve anti-shake.
- the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
- the air pressure sensor 180C is used to measure air pressure.
- the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
- the magnetic sensor 180D includes a Hall sensor.
- the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of the flip holster.
- the electronic device 100 can detect the opening and closing of the flip according to the magnetic sensor 180D.
- features such as automatic unlocking upon opening the flip cover can be set according to the detected opening and closing state of the holster or flip cover.
- the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and apply to applications such as horizontal and vertical screen switching, pedometers, and so on.
- the electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F to measure the distance to achieve fast focusing.
- the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
- the light emitting diode may be an infrared light emitting diode.
- the electronic device 100 emits infrared light to the outside through the light emitting diode.
- the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
- the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
- the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
- the ambient light sensor 180L is used to sense the brightness of the ambient light.
- the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
- the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
- the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touch.
- the fingerprint sensor 180H is used to collect fingerprints.
- the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, fingerprint-triggered photographing, fingerprint-based call answering, and so on.
- the temperature sensor 180J is used to detect temperature.
- the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
- when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 due to low temperature.
- the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
- the touch sensor 180K is also called a “touch panel”.
- the touch sensor 180K may be disposed on the display screen 194; together, the touch sensor 180K and the display screen 194 form a touchscreen, also called a “touch-controlled screen”.
- the touch sensor 180K is used to detect touch operations acting on or near it.
- the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
- the visual output related to the touch operation can be provided through the display screen 194.
- the touch sensor 180K may also be disposed on the surface of the electronic device 100, which is different from the position of the display screen 194.
- the button 190 includes a power-on button, a volume button, and so on.
- the button 190 may be a mechanical button. It can also be a touch button.
- the electronic device 100 may receive key input, and generate key signal input related to user settings and function control of the electronic device 100.
- the motor 191 can generate vibration prompts.
- the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
- touch operations applied to different applications can correspond to different vibration feedback effects.
- the motor 191 can also produce different vibration feedback effects for touch operations acting on different areas of the display screen 194.
- different application scenarios (for example: time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects.
- the touch vibration feedback effect can also support customization.
- the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
- the SIM card interface 195 is used to connect to the SIM card.
- the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device 100.
- the electronic device 100 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
- the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
- multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
- the SIM card interface 195 can also be compatible with different types of SIM cards.
- the SIM card interface 195 may also be compatible with external memory cards.
- the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
- FIG. 14 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present invention.
- the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
- the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
- the application layer can include a series of application packages.
- the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
- the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
- the application framework layer includes some predefined functions.
- the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, a local profile assistant (LPA), etc.
- the window manager is used to manage window programs.
- the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.
- the content provider is used to store and retrieve data and make these data accessible to applications.
- the data may include videos, images, audios, phone calls made and received, browsing history and bookmarks, phone book, etc.
- the view system includes visual controls, such as controls that display text, controls that display pictures, and so on.
- the view system can be used to build applications.
- the display interface can be composed of one or more views.
- a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
- the phone manager is used to provide the communication function of the electronic device 100. For example, the management of the call status (including connecting, hanging up, etc.).
- the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
- the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can automatically disappear after a short stay without user interaction.
- the notification manager is used to notify download completion, message reminders, and so on.
- the notification manager can also present a notification in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification of an application running in the background, or a notification on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt sound is played, the electronic device vibrates, or the indicator light flashes.
- Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
- the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
- the application layer and application framework layer run in a virtual machine.
- the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
- the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
- the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), two-dimensional graphics engine (for example: SGL), etc.
- the surface manager is used to manage the display subsystem, and provides a combination of two-dimensional (2-Dimensional, 2D) and three-dimensional (3-Dimensional, 3D) layers for multiple applications.
- the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
- the media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
- the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, synthesis, and layer processing.
- the 2D graphics engine is a drawing engine for 2D drawing.
- the kernel layer is the layer between hardware and software.
- the kernel layer contains at least display driver, camera driver, audio driver, sensor driver, and virtual card driver.
- when the touch sensor 180K receives a touch operation, the corresponding hardware interrupt is sent to the kernel layer.
- the kernel layer processes touch operations into original input events (including touch coordinates, time stamps of touch operations, etc.).
- the original input events are stored in the kernel layer.
- the application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation as a touch click operation, and the control corresponding to the click operation is the control of the camera application icon as an example, the camera application calls the interface of the application framework layer to start the camera application, and then starts the camera driver by calling the kernel layer.
- the camera 193 captures still images or videos.
- the term “when” can be interpreted as meaning “if”, “after”, “in response to determining”, or “in response to detecting”.
- the phrase “when determining” or “if (the stated condition or event) is detected” can be interpreted as meaning “if determined”, “in response to determining”, “when (the stated condition or event) is detected”, or “in response to detecting (the stated condition or event)”.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state drive).
- the process can be completed by a computer program instructing relevant hardware.
- the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above-mentioned method embodiments.
- the aforementioned storage media include: read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Character Input (AREA)
- Image Analysis (AREA)
Abstract
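A video text tracking method and an electronic device. In the method, a text line region is split into sub-regions, each sub-region is tracked, and the tracked sub-regions are then processed and joined into a new text line. The method is compatible with both straight-text and curved-text scenarios, also tracks text lines that exhibit deformation well, and can accurately track and predict the position of a text line.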
Description
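This application claims priority to Chinese Patent Application No. 202010108338.4, filed with the China Patent Office on February 21, 2020 and entitled "Video text tracking method and electronic device", which is incorporated herein by reference in its entirety.
This application relates to the optical character recognition (OCR) subfield of artificial intelligence (AI), and in particular to a video text tracking method and an electronic device.
The biggest difference between live-scene augmented reality (AR) translation and photo translation is that AR translation does not need to take a photo first and then recognize the picture content. Instead, it presents a real-time translation of the text seen by the camera: as long as the user points the camera at the content to be translated, an accurate real-time translation is given at the position of the original text. The whole process of live-scene AR translation is fully dynamic. Compared with earlier photo translation, the experience is a leap forward, and it is especially suitable for scenarios such as travel, cross-border online shopping, and reading foreign-language literature.
The full AR translation pipeline involves technologies such as OCR text detection and recognition, text tracking, machine translation, AR rendering, and backfilling of translated text. Because OCR is time-consuming (hundreds of milliseconds to seconds per video frame), it is impossible to obtain text line positions by running OCR frame by frame while the lens of a phone or camera moves in a real shooting scene; such a scheme cannot meet real-time requirements. Therefore, in AR translation products, tracking the text previously recognized by OCR and predicting the positions of text lines is a necessary guarantee for presenting translations in real time. In addition, live-scene AR translation technology can also be applied to scenarios such as automatic translation and backfilling of video subtitles, quickly completing subtitle translation for every frame of a video and greatly saving labor.
At present, as shown in FIG. 1, to handle the text line tilt caused by the text line border not being parallel to the sides of the viewfinder frame, the position of each line of straight text is generally determined by a tilted rectangle. A widely used technical solution is as follows. First, OCR is performed on the first video frame after the lens stabilizes, to detect and recognize the text line positions and text content in the frame. Second, a certain number of tracking points are determined in each text line region using key point detection techniques such as corner detection. A tracking method such as optical flow is then used to obtain the corresponding positions of these tracking points in the next video frame, so that the projective transformation matrix (or homography matrix) of each text line region between the two video frames can be computed. Applying the projection matrix to the four vertices of the tilted rectangle of the text line region gives the position of the text line in the next frame, where the translated text is then backfilled. This tracking process is repeated until a text line moves out of the field of view or is occluded by another object, so that the proportion of tracking points (relative to the first frame, on which OCR was performed) for which corresponding positions can be found in two adjacent frames falls below a threshold. The tracking is then considered to have failed, and OCR is performed again once the lens stabilizes to start another tracking pass. With this method, even if the tilt angle of a text line changes relative to when the view was framed, the position of the text line in the latest video frame can still be tracked and the translation backfilled at that position.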
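The prior-art step of applying the projection matrix to the four vertices, and the failure test on the tracking point proportion, can be sketched as follows. This is a minimal illustration only: the function names `apply_homography` and `tracking_failed` and the parameter `ratio_threshold` are hypothetical, and estimating the matrix from the tracked point pairs is assumed to be done elsewhere and is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def apply_homography(h: Sequence[Sequence[float]], points: List[Point]) -> List[Point]:
    """Map 2D points through a 3x3 projective matrix using homogeneous coordinates."""
    mapped = []
    for x, y in points:
        w = h[2][0] * x + h[2][1] * y + h[2][2]  # homogeneous scale factor
        mapped.append(((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
                       (h[1][0] * x + h[1][1] * y + h[1][2]) / w))
    return mapped


def tracking_failed(num_matched: int, num_initial: int, ratio_threshold: float) -> bool:
    """Tracking fails when too few of the OCR frame's tracking points are still matched."""
    return num_matched / num_initial < ratio_threshold
```

Because the whole text line is moved by a single matrix, the output is always a quadrilateral, which is the limitation for curved text discussed next.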
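However, the prior art has shortcomings when tracking curved text. When curved text is tracked, the tilted rectangle used to frame the text line contains a large amount of blank space outside the text region. If IOU (the intersection-over-union of the areas of the actual text region and the predicted text region), which is commonly used in object detection, is taken as the metric, the intersection between the actual text region and the predicted region may not be small, but after normalization by the larger predicted region the metric value is bound to be unsatisfactory. Such curved text often appears in scenes such as the shop signs shown in FIG. 2 and artistic-font narration or subtitles in videos.
Second, for curved text that can deform, such as the text on the outer packaging of a bottled drink shown in FIG. 3, the "orientation" of the text also changes as the shooting angle changes, and a tilted rectangle is even less able to reflect such a change in its shape.
Therefore, when tracking curved text, the prior art cannot accurately track and locate the position of text lines in a video.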
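Summary of the Invention
This application provides a video text tracking method and an electronic device. Unlike the prior art, which tracks the whole text line, the text line region is split into sub-regions, each sub-region is tracked, and the results are then processed and joined into a new text line. The method is compatible with both straight-text scenarios (where the character center points lie on a straight line) and curved-text scenarios, also tracks text lines that exhibit deformation well, and can accurately track and predict the position of a text line.
According to a first aspect, this application provides a video text tracking method. The method includes: an electronic device performs OCR detection on a first video frame to obtain frame points that anchor the position of each text line, including at least a first initial frame point set, where the first initial frame point set contains the frame points identified by OCR for anchoring the position of a first text line, the first text line is any text line in the first video frame, and the number of frame points in the first initial frame point set is not less than 4; the electronic device determines a first extended frame point set based on the first initial frame point set, where the first extended frame point set frames the first text line in N consecutive sub-regions of uniform width, and N is a positive integer not less than 2; the electronic device determines, based on the positions of the tracking points in a second tracking point set relative to those in a first tracking point set, the position in a second video frame of each frame point in the first extended frame point set, to obtain a second calculated frame point set, where the first tracking point set includes the tracking points located, in the first video frame, in the sub-regions determined by the first extended frame point set, the second tracking point set includes the tracking points predicted in the second video frame at positions corresponding to the tracking points in the first tracking point set, and the second video frame is a video frame obtained after the first video frame; and the electronic device determines a second region based on the second calculated frame point set, where the second region is the position, determined by the electronic device, of the first text line in the second video frame.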
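By implementing the method provided in the first aspect, the electronic device splits each text line into N consecutive sub-regions of uniform width based on the frame points of the text line identified by OCR, and then tracks each uniform sub-region to determine the position of each text line in the second video frame. Because tracking is performed at a finer scale, and the consecutive sub-regions can follow either a straight line or a curve, the method is compatible with both straight-text scenarios (where the character center points lie on a straight line) and curved-text scenarios, also tracks text lines that exhibit deformation well, and can accurately track and predict the position of a text line.
With reference to the first aspect, in some embodiments, that the electronic device determines the second region based on the second calculated frame point set includes: the electronic device adjusts the position of each frame point in the second calculated frame point set to obtain a second adjusted frame point set, so that the sub-regions determined by the second adjusted frame point set completely enclose the tracking points in the second tracking point set; and the electronic device determines the second region based on the second adjusted frame point set.
There are many specific adjustment methods:
For example, in some embodiments, the electronic device may adjust the frame point positions as a whole based on the highest and lowest tracking points in the second tracking point set:
The ordinate of each frame point in the second calculated upper frame point set is adjusted to be greater than the maximum ordinate of the tracking points in the second tracking point set, and less than the sum of that maximum and a preset parameter times the font height;
The ordinate of each frame point in the second calculated lower frame point set is adjusted to be less than the minimum ordinate of the tracking points in the second tracking point set, and greater than that minimum minus a preset parameter times the font height.
For example, in some embodiments, the electronic device may adjust the frame point positions based on the highest and lowest tracking points within a preset distance range of each frame point:
The ordinate of each frame point in the second calculated upper frame point set is adjusted to be greater than the maximum ordinate of the tracking points within the preset distance range of that frame point, and less than the sum of that maximum and a preset parameter times the font height;
The ordinate of each frame point in the second calculated lower frame point set is adjusted to be less than the minimum ordinate of the tracking points within the preset distance range of that frame point, and greater than that minimum minus a preset parameter times the font height.
Other adjustment methods are also possible, provided that the sub-regions determined by the resulting second adjusted frame point set completely enclose the tracking points in the second tracking point set. This is not limited here.
In the embodiments of this application, after the second calculated frame point set has been determined by tracking in the second video frame, the electronic device may not use it directly. Instead, it first adjusts the position of each frame point in the second calculated frame point set to obtain a second adjusted frame point set that completely encloses the second tracking point set, and then uses the second adjusted frame point set to determine the second region in which the first text line is located in the second video frame. This further improves the accuracy of predicting the position of the first text line in the second video frame. Completely enclosing the second tracking points also leaves more tracking points available when predicting positions in subsequent video frames, which improves the continuity of video text tracking.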
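The whole-line adjustment based on the highest and lowest tracking points can be sketched as below. This is only one way to satisfy the stated inequalities, under several assumptions: the name `adjust_globally`, the parameter `k` (standing in for the preset parameter), and the choice to move an out-of-range frame point to the middle of the allowed band are all hypothetical. The sketch follows the text's convention that upper frame points have larger ordinates than lower ones.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def adjust_globally(upper: List[Point], lower: List[Point], tracking: List[Point],
                    font_height: float, k: float = 0.5) -> Tuple[List[Point], List[Point]]:
    """Move frame points so that they enclose every tracking point vertically."""
    ys = [y for _, y in tracking]
    y_max, y_min = max(ys), min(ys)
    band = k * font_height  # preset parameter times the font height

    def fix_upper(y: float) -> float:
        # Keep y if it already lies strictly inside (y_max, y_max + band).
        return y if y_max < y < y_max + band else y_max + band / 2

    def fix_lower(y: float) -> float:
        # Keep y if it already lies strictly inside (y_min - band, y_min).
        return y if y_min - band < y < y_min else y_min - band / 2

    return ([(x, fix_upper(y)) for x, y in upper],
            [(x, fix_lower(y)) for x, y in lower])
```

The per-frame-point variant would differ only in computing `y_max` and `y_min` from the tracking points within the preset distance range of each frame point rather than from the whole set.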
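With reference to the first aspect, in some embodiments, that the electronic device determines the second region based on the second adjusted frame point set includes: the electronic device smooths the enclosing curve of the sub-regions determined by the second adjusted frame point set to obtain the second region.
Specifically, in some embodiments, the electronic device may fit the frame points in a second adjusted upper frame point set and in a second adjusted lower frame point set separately to obtain a smooth enclosing curve that forms the second region. The second adjusted frame point set can be divided into the second adjusted upper frame point set and the second adjusted lower frame point set, where the second adjusted upper frame point set consists of the frame points located in the upper half of the sub-regions, and the second adjusted lower frame point set consists of the frame points located in the lower half of the sub-regions.
Further, in some embodiments, when fitting the frame points in the second adjusted upper frame point set and in the second adjusted lower frame point set separately, the electronic device may calculate the linear correlation coefficient of the frame points in each of the two sets and determine the fitting method to use based on the value of the linear correlation coefficient:
For example, the electronic device may calculate the Pearson correlation coefficient of the frame points in the second adjusted upper frame point set and in the second adjusted lower frame point set. If the linear correlation is strong (for example, the correlation coefficient is greater than 0.8), linear fitting may be used; if the linear correlation is weak, a quadratic or higher-order fit may be used.
In the embodiments of this application, the electronic device fits the enclosing curve of the sub-regions determined by the second adjusted frame point set, that is, it smooths the enclosing curve, so that the sub-regions remain continuous and the enclosing curve of the text line stays smooth rather than jagged.
Optionally, in some embodiments, the electronic device may save the intermediate quantities used to calculate the linear correlation coefficient (for example, the intermediate quantities used when calculating the Pearson correlation coefficient), so that they can be reused directly whenever later calculations need them, reducing the amount of computation.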
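The choice between linear and higher-order fitting, together with the reuse of saved intermediate quantities, can be sketched as follows. The names `pearson_with_intermediates` and `choose_fit` are hypothetical. Two choices here are assumptions that go beyond the text: the absolute value of the coefficient is compared with the threshold so that a descending line also counts as linear, and a perfectly horizontal edge (zero variance in the ordinates) is treated as fully linear. The higher-order fit itself is not shown.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def pearson_with_intermediates(points: List[Point]) -> Tuple[float, Dict[str, float]]:
    """Return Pearson's r together with the intermediate sums used to compute it."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("frame points need distinct abscissae")
    # Assumption: a perfectly horizontal edge (syy == 0) counts as fully linear.
    r = 1.0 if syy == 0 else sxy / math.sqrt(sxx * syy)
    return r, {"mean_x": mean_x, "mean_y": mean_y, "sxx": sxx, "sxy": sxy}


def choose_fit(points: List[Point],
               threshold: float = 0.8) -> Tuple[int, Optional[Tuple[float, float]]]:
    """Return (degree, line): degree 1 with (slope, intercept), or degree 2 with None."""
    r, saved = pearson_with_intermediates(points)
    if abs(r) > threshold:
        # The least-squares line reuses the saved sums; no second pass over the points.
        slope = saved["sxy"] / saved["sxx"]
        intercept = saved["mean_y"] - slope * saved["mean_x"]
        return 1, (slope, intercept)
    return 2, None
```

The saved sums are exactly the quantities a least-squares line needs, which is one concrete case where keeping the intermediate quantities avoids recomputation.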
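With reference to the first aspect, in some embodiments, that the electronic device determines the first extended frame point set based on the first initial frame point set includes: when the number of frame points in the first initial frame point set is equal to 4, the electronic device takes points with evenly spaced abscissae on the upper and lower sides of the rectangle determined by the 4 frame points as new frame points, which form the first extended frame point set;
when the number of frame points in the first initial frame point set is greater than 4, the electronic device fits the frame points in a first initial upper frame point set and in a first initial lower frame point set separately to obtain an upper fitted curve and a lower fitted curve, where the first initial upper frame point set includes the frame points in the first initial frame point set that are located in the upper half of the first text line, and the first initial lower frame point set includes the frame points in the first initial frame point set that are located in the lower half of the first text line; and the electronic device takes points with evenly spaced abscissae on the upper fitted curve and on the lower fitted curve as new frame points, which form the first extended frame point set.
In the embodiments of this application, depending on the number of frame points, the electronic device can apply different processing to select points with evenly spaced abscissae as the new frame points of the first extended frame point set, so that video text tracking supports both straight text and curved text well.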
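The four-frame-point case can be sketched as a linear interpolation along the upper and lower sides of the rectangle. The function name `expand_four_points` and the vertex order in its signature are assumptions for illustration; the case with more than four frame points would sample the fitted curves instead and is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def expand_four_points(top_left: Point, top_right: Point,
                       bottom_right: Point, bottom_left: Point,
                       n: int) -> Tuple[List[Point], List[Point]]:
    """Split a (possibly tilted) rectangle into n sub-regions of uniform width.

    Returns n + 1 upper and n + 1 lower frame points whose abscissae are
    evenly spaced along the upper and lower sides respectively.
    """
    if n < 2:
        raise ValueError("N must be a positive integer not less than 2")

    def along(p: Point, q: Point) -> List[Point]:
        # Equal steps in the interpolation parameter give equal steps in x.
        return [(p[0] + (q[0] - p[0]) * i / n, p[1] + (q[1] - p[1]) * i / n)
                for i in range(n + 1)]

    return along(top_left, top_right), along(bottom_left, bottom_right)
```

Consecutive pairs of upper and lower frame points then bound one sub-region each, so the i-th sub-region is the quadrilateral spanned by upper[i], upper[i + 1], lower[i + 1], and lower[i].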
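With reference to the first aspect, in some embodiments, the method further includes: the electronic device determines, based on the positions of the tracking points in a third tracking point set relative to those in the second tracking point set, the position in a third video frame of each frame point in the third adjusted frame point set, to obtain a third calculated frame point set, where the third tracking point set includes the tracking points predicted in the third video frame at positions corresponding to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame; the electronic device adjusts the position of each frame point in the third calculated frame point set to obtain a third adjusted frame point set, so that the sub-regions determined by the third adjusted frame point set completely enclose the tracking points in the third tracking point set; and the electronic device smooths the enclosing curve of the sub-regions determined by the third adjusted frame point set to obtain a third region, where the third region is the position, determined by the electronic device, of the first text line in the third video frame.
It can be understood that, by analogy, the position of the first text line in subsequent video frames such as a fourth, fifth, and sixth video frame can be tracked and determined in the same way.
In the embodiments of this application, the video frame on which OCR is performed is the first video frame, the next video frame to be processed after the first video frame is the second video frame, and the next video frame to be processed after the second video frame is the third video frame. The frame points and tracking points of the first video frame are determined through OCR, and the processing of each subsequent video frame can use the frame points and tracking points determined for the previous video frame. When the proportion of tracking points for which corresponding positions are found in two adjacent processed frames falls below a tracking point proportion threshold, tracking is considered to have failed, another tracking pass is started, and a new first video frame is determined. This ensures that the video text tracking method keeps running efficiently.
With reference to the first aspect, in some embodiments, the method further includes: starting from the first video frame, the electronic device maintains a cache with a fixed length of a preset number of video frames, where the cache is used to store newly generated video frames while the OCR result for the first video frame has not yet been returned.
Specifically, in some embodiments, the fixed-length cache can be maintained in many different ways:
For example, when the number of video frames stored in the cache is equal to the preset number of frames, the electronic device deletes one old video frame each time a new video frame is added to the cache, where the difference between the acquisition times of adjacent video frames stored in the cache is less than a preset interval.
For example, when the number of video frames stored in the cache is equal to the preset number of frames, the electronic device deletes one old video frame each time a new video frame is added to the cache, so that the intervals between the remaining adjacent frames in the cache stay as uniform as possible.
In the embodiments of this application, the purpose of maintaining a fixed-length cache is to guard against the case where the first video frame contains a lot of text and OCR takes too long. Limiting the size of the cache shortens the time the cache needs to "catch up" with the latest video frame, which in turn shortens the time the user waits for the result and improves the experience.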
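In a second aspect, an embodiment of this application provides an electronic device, including one or more processors and a memory. The memory is coupled to the one or more processors and is configured to store computer program code that includes computer instructions. The one or more processors invoke the computer instructions to cause the electronic device to perform: performing OCR detection on a first video frame to obtain frame points anchoring the position of each text line, including at least a first initial frame point set, where the first initial frame point set contains the frame points recognized by OCR that anchor the position of a first text line, the first text line is any text line in the first video frame, and the number of frame points in the first initial frame point set is not less than 4; determining, according to the first initial frame point set, a first extended frame point set that frames the first text line in N contiguous sub-regions of uniform width, N being a positive integer not less than 2; determining, according to the positions of the tracking points in a second tracking point set relative to the tracking points in a first tracking point set, the position in a second video frame of each frame point in the first extended frame point set, obtaining a second calculated frame point set, where the first tracking point set includes the tracking points located, in the first video frame, in the sub-regions determined by the first extended frame point set, the second tracking point set includes the tracking points predicted in the second video frame at positions corresponding to the tracking points in the first tracking point set, and the second video frame is a video frame obtained after the first video frame; and determining, according to the second calculated frame point set, a second region, the second region being the determined position of the first text line in the second video frame.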
第二方面提供的电子设备,根据OCR识别出的各文本行的框点,将各文本行拆分到N个连续且宽度均匀的子区域中,再对各个均匀的子区域进行跟踪处理确定各文本行在第二视频帧中的位置,由于跟踪处理的尺度更加精细化,且各连续的子区域既可以呈直线又可以弯曲,因此其不仅可以兼容直线文本(文字中心点在一条直线上)或者弯曲文本场景,对于展现出形变性质的文本行也有很好的跟踪效果,能够准确跟踪预测文本行的位置。
结合第二方面,在一些实施例中,该一个或多个处理器,具体用于调用该计算机指令以使得该电子设备执行:调整该第二计算框点集合中各框点的位置,得到第二调整框点集合,使得该第二调整框点集合确定的各子区域完全包围该第二追踪点集合中各追踪点;根据该第二调整框点集合,确定该第二区域。
具体的调整方式有很多:
示例性的,在一些实施例中,可以根据第二追踪点集合中最高的追踪点和最低的追踪点整体调整框点位置:
调整第二计算上框点集合中的各框点的纵坐标大于第二追踪点集合中追踪点纵坐标的最大值,且小于追踪点纵坐标的最大值与预设参数倍字体高度之和;
调整第二计算下框点集合中的各框点的纵坐标小于第二追踪点集合中追踪点纵坐标的最小值,且大于追踪点纵坐标的最小值与预设参数倍字体高度之差。
示例性的,在一些实施例中,可以根据距离各框点预设距离范围内最高的追踪点和最低的追踪点调整框点位置:
调整第二计算上框点集合中的各框点的纵坐标大于距离各框点预设距离范围内的追踪点纵坐标的最大值,且小于追踪点纵坐标的最大值与预设参数倍字体高度之和;
调整第二计算下框点集合中的各框点的纵坐标小于距离各框点预设距离范围内的追踪点纵坐标的最小值,且大于追踪点纵坐标的最小值与预设参数倍字体高度之差。
还可以有其他的调整方式,只要最终使得调整得到的第二调整框点集合确定的各子区域能完全包围第二追踪点集合中各追踪点即可,此处不作限定。
结合第二方面,在一些实施例中,该一个或多个处理器,具体用于调用该计算机指令以使得该电子设备执行:对该第二调整框点集合确定的各子区域的包围曲线进行平滑处理,得到该第二区域。
具体的,在一些实施例中,可以分别对第二调整上框点集合和第二调整下框点集合中的框点进行拟合,得到平滑的包围曲线,形成第二区域,其中,第二调整框点集合可以划分为第二调整上框点集合和第二调整下框点集合,第二调整上框点集合为位于子区域上半部分的框点,第二调整下框点集合为位于子区域下半部分的框点。
进一步的,在一些实施例中,分别对第二调整上框点集合和第二调整下框点集合中的框点进行拟合时,可以分别计算第二调整上框点集合和第二调整下框点集合中的框点的线性相关系数,根据线性相关系数的数值确定采用的拟合方式:
示例性的,可以计算第二调整上框点集合和第二调整下框点集合中框点的皮尔逊相关系数,如果线性相关性较强(例如相关系数大于0.8),则可以确定采用线性拟合;如果线性相关性较弱,则可以确定采用二次等更高次的拟合。
可选的,在一些实施例中,可以保存计算线性相关系数的中间量(例如计算皮尔逊相关系数时的中间量),以供在后续其他计算过程中需要使用相关中间量时可以直接使用保存的中间量,从而减少计算量。
结合第二方面,在一些实施例中,该一个或多个处理器,具体用于调用该计算机指令以使得该电子设备执行:当该第一初始框点集合中框点的数目等于4时,在4个框点确定的矩形的上下两边上取横坐标间隔均匀的点作为新的框点,组成该第一扩展框点集合;
当该第一初始框点集合中框点的数目大于4时,分别对第一初始上框点集合与第一初始下框点集合中的框点进行拟合,得到上拟合曲线和下拟合曲线,其中,该第一初始上框点集合中包括该第一初始框点集合中位于该第一文本行上半部分的框点,该第一初始下框点集合中包括该第一初始框点集合中位于该第一文本行下半部分的框点;分别在该上拟合曲线和该下拟合曲线上取横坐标间隔均匀的点作为新的框点,组成该第一扩展框点集合。
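With reference to the second aspect, in some embodiments the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: determining, according to the positions of the tracking points in a third tracking point set relative to the tracking points in the second tracking point set, the position in a third video frame of each frame point in the second adjusted frame point set, obtaining a third calculated frame point set, where the third tracking point set includes the tracking points predicted in the third video frame at positions corresponding to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame; adjusting the positions of the frame points in the third calculated frame point set to obtain a third adjusted frame point set, such that the sub-regions determined by the third adjusted frame point set completely enclose the tracking points in the third tracking point set; and smoothing the enclosing curves of the sub-regions determined by the third adjusted frame point set to obtain a third region, the third region being the determined position of the first text line in the third video frame.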
可以理解的是,依此类推,电子设备可以跟踪确定该第一文本行在第四视频帧、第五视频帧、第六视频帧等后续视频帧中的位置。
结合第二方面,在一些实施例中,该一个或多个处理器,还用于调用该计算机指令以使得该电子设备执行:从该第一视频帧开始维护一个固定长度为预设帧数个视频帧的缓存,该缓存用于在OCR识别该第一视频帧的结果未返回时存储新产生的视频帧。
具体的,在一些实施例中,为了维护该固定长度的缓存,可以有很多不同的维护方式:
示例性的,当该缓存中存储的视频帧数目等于该预设帧数时,在该缓存中每增加一个新视频帧时删除一个旧视频帧,其中,该缓存中存储的相邻视频帧的取得时间的差值小于预设间隔时间。
示例性的,当该缓存中存储的视频帧数目等于该预设帧数时,在该缓存中每增加一个新视频帧时删除一个旧视频帧,使得缓存内剩余相邻帧之间的间隔尽量保持均匀。
第三方面,本申请实施例提供了一种电子设备,该电子设备包括:
OCR检测模块:用于对第一视频帧进行OCR检测,得到锚定各文本行的位置的框点,其中至少包括第一初始框点集合,该第一初始框点集合中包含OCR识别出的用于锚定第一文本行的位置的框点,该第一文本行为该第一视频帧中的任一个文本行,该第一初始框点集合中框点的数目不小于4;
框点扩展模块:用于根据该第一初始框点集合,确定第一扩展框点集合,该第一扩展框点集合将该第一文本行框定在N个连续且宽度均匀的子区域中,该N为不小于2的正整数;
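Frame point calculation module: configured to determine, according to the positions of the tracking points in a second tracking point set relative to the tracking points in a first tracking point set, the position in a second video frame of each frame point in the first extended frame point set, obtaining a second calculated frame point set, where the first tracking point set includes the tracking points located, in the first video frame, in the sub-regions determined by the first extended frame point set, the second tracking point set includes the tracking points predicted in the second video frame at positions corresponding to the tracking points in the first tracking point set, and the second video frame is a video frame obtained after the first video frame;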
区域确定模块:根据该第二计算框点集合,确定第二区域,该第二区域为确定的该第一文本行在该第二视频帧中的位置。
结合第三方面,在一些实施例中,该区域确定模块具体包括:
框点调整单元:用于调整该第二计算框点集合中各框点的位置,得到第二调整框点集合,使得该第二调整框点集合确定的各子区域完全包围该第二追踪点集合中各追踪点;
区域确定单元:用于根据该第二调整框点集合,确定该第二区域。
具体的调整方式有很多:
示例性的,在一些实施例中,框点调整单元可以根据第二追踪点集合中最高的追踪点和最低的追踪点整体调整框点位置:调整第二计算上框点集合中的各框点的纵坐标大于第二追踪点集合中追踪点纵坐标的最大值,且小于追踪点纵坐标的最大值与预设参数倍字体高度之和;调整第二计算下框点集合中的各框点的纵坐标小于第二追踪点集合中追踪点纵坐标的最小值,且大于追踪点纵坐标的最小值与预设参数倍字体高度之差。
示例性的,在一些实施例中,框点调整单元可以根据距离各框点预设距离范围内最高的追踪点和最低的追踪点调整框点位置:调整第二计算上框点集合中的各框点的纵坐标大于距离各框点预设距离范围内的追踪点纵坐标的最大值,且小于追踪点纵坐标的最大值与预设参数倍字体高度之和;调整第二计算下框点集合中的各框点的纵坐标小于距离各框点预设距离范围内的追踪点纵坐标的最小值,且大于追踪点纵坐标的最小值与预设参数倍字体高度之差。
还可以有其他的调整方式,只要最终使得调整得到的第二调整框点集合确定的各子区域能完全包围第二追踪点集合中各追踪点即可,此处不作限定。
结合第三方面,在一些实施例中,该区域确定单元具体用于:对该第二调整框点集合确定的各子区域的包围曲线进行平滑处理,得到该第二区域。
具体的,在一些实施例中,该区域确定单元可以分别对第二调整上框点集合和第二调整下框点集合中的框点进行拟合,得到平滑的包围曲线,形成第二区域,其中,第二调整框点集合可以划分为第二调整上框点集合和第二调整下框点集合,第二调整上框点集合为位于子区域上半部分的框点,第二调整下框点集合为位于子区域下半部分的框点。
进一步的,在一些实施例中,该区域确定单元分别对第二调整上框点集合和第二调整下框点集合中的框点进行拟合时,可以分别计算第二调整上框点集合和第二调整下框点集合中的框点的线性相关系数,根据线性相关系数的数值确定采用的拟合方式:
示例性的,该区域确定单元可以计算第二调整上框点集合和第二调整下框点集合中框点的皮尔逊相关系数,如果线性相关性较强(例如相关系数大于0.8),则该区域确定单元可以确定采用线性拟合;如果线性相关性较弱,则该区域确定单元可以确定采用二次等更高次的拟合。
可选的,在一些实施例中,该电子设备还可以包括中间量保存模块,用于保存计算线性相关系数的中间量(例如计算皮尔逊相关系数时的中间量),以供在后续其他计算过程中需要使用相关中间量时可以直接使用保存的中间量,从而减少计算量。
结合第三方面,在一些实施例中,该框点扩展模块具体包括:
直线文本框点扩展单元,用于当该第一初始框点集合中框点的数目等于4时,在4个框点确定的矩形的上下两边上取横坐标间隔均匀的点作为新的框点,组成该第一扩展框点集合;
弯曲文本框点扩展单元,用于当该第一初始框点集合中框点的数目大于4时,分别对第一初始上框点集合与第一初始下框点集合中的框点进行拟合,得到上拟合曲线和下拟合曲线,其中,该第一初始上框点集合中包括该第一初始框点集合中位于该第一文本行上半部分的框点,该第一初始下框点集合中包括该第一初始框点集合中位于该第一文本行下半部分的框点;分别在该上拟合曲线和该下拟合曲线上取横坐标间隔均匀的点作为新的框点,组成该第一扩展框点集合。
结合第三方面,在一些实施例中,该电子设备还可以包括:
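Tracking frame point calculation module: configured to determine, according to the positions of the tracking points in a third tracking point set relative to the tracking points in the second tracking point set, the position in a third video frame of each frame point in the second adjusted frame point set, obtaining a third calculated frame point set, where the third tracking point set includes the tracking points predicted in the third video frame at positions corresponding to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame;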
跟踪调整框点模块:用于调整该第三计算框点集合中各框点的位置,得到第三调整框点集合,使得该第三调整框点集合确定的各子区域完全包围该第三追踪点集合中各追踪点;
跟踪区域确定模块,用于对该第三调整框点集合确定的各子区域的包围曲线进行平滑处理,得到第三区域,该第三区域为确定的该第一文本行在该第三视频帧中的位置。
可以理解的是,依此类推,跟踪计算框点模块、跟踪调整框点模块和跟踪区域确定模块可以跟踪确定该第一文本行在第四视频帧、第五视频帧、第六视频帧等后续视频帧中的位置。
结合第三方面,在一些实施例中,该电子设备还可以包括:
缓存维护模块:用于从该第一视频帧开始维护一个固定长度为预设帧数个视频帧的缓存,该缓存用于在OCR识别该第一视频帧的结果未返回时存储新产生的视频帧。
具体的,在一些实施例中,为了维护该固定长度的缓存,可以有很多不同的维护方式:
示例性的,当该缓存中存储的视频帧数目等于该预设帧数时,缓存维护模块在该缓存中每增加一个新视频帧时删除一个旧视频帧,其中,该缓存中存储的相邻视频帧的取得时间的差值小于预设间隔时间。
示例性的,当该缓存中存储的视频帧数目等于该预设帧数时,缓存维护模块在该缓存中每增加一个新视频帧时删除一个旧视频帧,使得缓存内剩余相邻帧之间的间隔尽量保持均匀。
第四方面,本申请实施例提供了一种芯片,该芯片应用于电子设备,该芯片包括一个或多个处理器,该处理器用于调用计算机指令以使得该电子设备执行对第一视频帧进行OCR检测,得到锚定各文本行的位置的框点,其中至少包括第一初始框点集合,该第一初始框点集合中包含OCR识别出的用于锚定第一文本行的位置的框点,该第一文本行为该第一视频帧中的任一个文本行,该第一初始框点集合中框点的数目不小于4;根据该第一初始框点集合,确定第一扩展框点集合,该第一扩展框点集合将该第一文本行框定在N个连续且宽度均匀的子区域中,该N为不小于2的正整数;根据第二追踪点集合相对于第一追踪点集合中各追踪点的位置,确定该第一扩展框点集合中各框点在第二视频帧中的位置,得到第二计算框点集合,其中,该第一追踪点集合中包括在该第一视频帧中该第一扩展框点集合确定的子区域中的追踪点,该第二追踪点集合中包括在该第二视频帧中预测出的与该第一追踪点集合中追踪点对应位置的追踪点,该第二视频帧为在该第一视频帧之后取得的一个视频帧;根据该第二计算框点集合,确定第二区域,该第二区域为确定的该第一文本行在该第二视频帧中的位置。
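With reference to the fourth aspect, in some embodiments the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: adjusting the positions of the frame points in the second calculated frame point set to obtain a second adjusted frame point set, such that the sub-regions determined by the second adjusted frame point set completely enclose the tracking points in the second tracking point set; and determining the second region according to the second adjusted frame point set.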
具体的调整方式有很多:
示例性的,在一些实施例中,可以根据第二追踪点集合中最高的追踪点和最低的追踪点整体调整框点位置:
调整第二计算上框点集合中的各框点的纵坐标大于第二追踪点集合中追踪点纵坐标的最大值,且小于追踪点纵坐标的最大值与预设参数倍字体高度之和;
调整第二计算下框点集合中的各框点的纵坐标小于第二追踪点集合中追踪点纵坐标的最小值,且大于追踪点纵坐标的最小值与预设参数倍字体高度之差。
示例性的,在一些实施例中,可以根据距离各框点预设距离范围内最高的追踪点和最低的追踪点调整框点位置:
调整第二计算上框点集合中的各框点的纵坐标大于距离各框点预设距离范围内的追踪点纵坐标的最大值,且小于追踪点纵坐标的最大值与预设参数倍字体高度之和;
调整第二计算下框点集合中的各框点的纵坐标小于距离各框点预设距离范围内的追踪点纵坐标的最小值,且大于追踪点纵坐标的最小值与预设参数倍字体高度之差。
还可以有其他的调整方式,只要最终使得调整得到的第二调整框点集合确定的各子区域能完全包围第二追踪点集合中各追踪点即可,此处不作限定。
结合第四方面,在一些实施例中,该一个或多个处理器,具体用于调用该计算机指令以使得该电子设备执行:对该第二调整框点集合确定的各子区域的包围曲线进行平滑处理,得到该第二区域。
具体的,在一些实施例中,可以分别对第二调整上框点集合和第二调整下框点集合中的框点进行拟合,得到平滑的包围曲线,形成第二区域,其中,第二调整框点集合可以划分为第二调整上框点集合和第二调整下框点集合,第二调整上框点集合为位于子区域上半部分的框点,第二调整下框点集合为位于子区域下半部分的框点。
进一步的,在一些实施例中,分别对第二调整上框点集合和第二调整下框点集合中的框点进行拟合时,可以分别计算第二调整上框点集合和第二调整下框点集合中的框点的线性相关系数,根据线性相关系数的数值确定采用的拟合方式:
示例性的,可以计算第二调整上框点集合和第二调整下框点集合中框点的皮尔逊相关系数,如果线性相关性较强(例如相关系数大于0.8),则可以确定采用线性拟合;如果线性相关性较弱,则可以确定采用二次等更高次的拟合。
可选的,在一些实施例中,可以保存计算线性相关系数的中间量(例如计算皮尔逊相关系数时的中间量),以供在后续其他计算过程中需要使用相关中间量时可以直接使用保存的中间量,从而减少计算量。
结合第四方面,在一些实施例中,该一个或多个处理器,具体用于调用该计算机指令以使得该电子设备执行:当该第一初始框点集合中框点的数目等于4时,在4个框点确定的矩形的上下两边上取横坐标间隔均匀的点作为新的框点,组成该第一扩展框点集合;
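when the number of frame points in the first initial frame point set is greater than 4, fitting the frame points in a first initial upper frame point set and in a first initial lower frame point set separately to obtain an upper fitted curve and a lower fitted curve, where the first initial upper frame point set includes the frame points of the first initial frame point set located on the upper half of the first text line, and the first initial lower frame point set includes the frame points of the first initial frame point set located on the lower half of the first text line; and taking points with evenly spaced abscissas on the upper fitted curve and on the lower fitted curve as new frame points to form the first extended frame point set.
With reference to the fourth aspect, in some embodiments the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: determining, according to the positions of the tracking points in a third tracking point set relative to the tracking points in the second tracking point set, the position in a third video frame of each frame point in the second adjusted frame point set, obtaining a third calculated frame point set, where the third tracking point set includes the tracking points predicted in the third video frame at positions corresponding to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame; adjusting the positions of the frame points in the third calculated frame point set to obtain a third adjusted frame point set, such that the sub-regions determined by the third adjusted frame point set completely enclose the tracking points in the third tracking point set; and smoothing the enclosing curves of the sub-regions determined by the third adjusted frame point set to obtain a third region, the third region being the determined position of the first text line in the third video frame.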
可以理解的是,依此类推,电子设备可以跟踪确定该第一文本行在第四视频帧、第五视频帧、第六视频帧等后续视频帧中的位置。
结合第四方面,在一些实施例中,该一个或多个处理器,还用于调用该计算机指令以使得该电子设备执行:从该第一视频帧开始维护一个固定长度为预设帧数个视频帧的缓存,该缓存用于在OCR识别该第一视频帧的结果未返回时存储新产生的视频帧。
具体的,在一些实施例中,为了维护该固定长度的缓存,可以有很多不同的维护方式:
示例性的,当该缓存中存储的视频帧数目等于该预设帧数时,在该缓存中每增加一个新视频帧时删除一个旧视频帧,其中,该缓存中存储的相邻视频帧的取得时间的差值小于预设间隔时间。
示例性的,当该缓存中存储的视频帧数目等于该预设帧数时,在该缓存中每增加一个新视频帧时删除一个旧视频帧,使得缓存内剩余相邻帧之间的间隔尽量保持均匀。
第五方面,本申请实施例提供一种包含指令的计算机程序产品,当上述计算机程序产品在电子设备上运行时,使得上述电子设备执行:对第一视频帧进行OCR检测,得到锚定各文本行的位置的框点,其中至少包括第一初始框点集合,该第一初始框点集合中包含OCR识别出的用于锚定第一文本行的位置的框点,该第一文本行为该第一视频帧中的任一个文本行,该第一初始框点集合中框点的数目不小于4;根据该第一初始框点集合,确定第一扩展框点集合,该第一扩展框点集合将该第一文本行框定在N个连续且宽度均匀的子区域中,该N为不小于2的正整数;根据第二追踪点集合相对于第一追踪点集合中各追踪点的位置,确定该第一扩展框点集合中各框点在第二视频帧中的位置,得到第二计算框点集合,其中,该第一追踪点集合中包括在该第一视频帧中该第一扩展框点集合确定的子区域中的追踪点,该第二追踪点集合中包括在该第二视频帧中预测出的与该第一追踪点集合中追踪点对应位置的追踪点,该第二视频帧为在该第一视频帧之后取得的一个视频帧;根据该第二计算框点集合,确定第二区域,该第二区域为确定的该第一文本行在该第二视频帧中的位置。
结合第五方面,在一些实施例中,当上述计算机程序产品在电子设备上运行时,具体使得上述电子设备执行:调整该第二计算框点集合中各框点的位置,得到第二调整框点集合,使得该第二调整框点集合确定的各子区域完全包围该第二追踪点集合中各追踪点;根据该第二调整框点集合,确定该第二区域。
具体的调整方式有很多:
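For example, in some embodiments, the frame point positions may be adjusted as a whole according to the highest and lowest tracking points in the second tracking point set: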
调整第二计算上框点集合中的各框点的纵坐标大于第二追踪点集合中追踪点纵坐标的最大值,且小于追踪点纵坐标的最大值与预设参数倍字体高度之和;
调整第二计算下框点集合中的各框点的纵坐标小于第二追踪点集合中追踪点纵坐标的最小值,且大于追踪点纵坐标的最小值与预设参数倍字体高度之差。
示例性的,在一些实施例中,可以根据距离各框点预设距离范围内最高的追踪点和最低的追踪点调整框点位置:
调整第二计算上框点集合中的各框点的纵坐标大于距离各框点预设距离范围内的追踪点纵坐标的最大值,且小于追踪点纵坐标的最大值与预设参数倍字体高度之和;
调整第二计算下框点集合中的各框点的纵坐标小于距离各框点预设距离范围内的追踪点纵坐标的最小值,且大于追踪点纵坐标的最小值与预设参数倍字体高度之差。
还可以有其他的调整方式,只要最终使得调整得到的第二调整框点集合确定的各子区域能完全包围第二追踪点集合中各追踪点即可,此处不作限定。
结合第五方面,在一些实施例中,当上述计算机程序产品在电子设备上运行时,具体使得上述电子设备执行:对该第二调整框点集合确定的各子区域的包围曲线进行平滑处理,得到该第二区域。
具体的,在一些实施例中,可以分别对第二调整上框点集合和第二调整下框点集合中的框点进行拟合,得到平滑的包围曲线,形成第二区域,其中,第二调整框点集合可以划分为第二调整上框点集合和第二调整下框点集合,第二调整上框点集合为位于子区域上半部分的框点,第二调整下框点集合为位于子区域下半部分的框点。
进一步的,在一些实施例中,分别对第二调整上框点集合和第二调整下框点集合中的框点进行拟合时,可以分别计算第二调整上框点集合和第二调整下框点集合中的框点的线性相关系数,根据线性相关系数的数值确定采用的拟合方式:
示例性的,可以计算第二调整上框点集合和第二调整下框点集合中框点的皮尔逊相关系数,如果线性相关性较强(例如相关系数大于0.8),则可以确定采用线性拟合;如果线性相关性较弱,则可以确定采用二次等更高次的拟合。
可选的,在一些实施例中,当上述计算机程序产品在电子设备上运行时,使得上述电子设备保存计算线性相关系数的中间量(例如计算皮尔逊相关系数时的中间量),以供在后续其他计算过程中需要使用相关中间量时可以直接使用保存的中间量,从而减少计算量。
结合第五方面,在一些实施例中,当上述计算机程序产品在电子设备上运行时,具体使得上述电子设备执行:当该第一初始框点集合中框点的数目等于4时,在4个框点确定的矩形的上下两边上取横坐标间隔均匀的点作为新的框点,组成该第一扩展框点集合;
当该第一初始框点集合中框点的数目大于4时,分别对第一初始上框点集合与第一初始下框点集合中的框点进行拟合,得到上拟合曲线和下拟合曲线,其中,该第一初始上框点集合中包括该第一初始框点集合中位于该第一文本行上半部分的框点,该第一初始下框点集合中包括该第一初始框点集合中位于该第一文本行下半部分的框点;分别在该上拟合曲线和该下拟合曲线上取横坐标间隔均匀的点作为新的框点,组成该第一扩展框点集合。
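With reference to the fifth aspect, in some embodiments, when the computer program product runs on an electronic device, it further causes the electronic device to perform: determining, according to the positions of the tracking points in a third tracking point set relative to the tracking points in the second tracking point set, the position in a third video frame of each frame point in the second adjusted frame point set, obtaining a third calculated frame point set, where the third tracking point set includes the tracking points predicted in the third video frame at positions corresponding to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame; adjusting the positions of the frame points in the third calculated frame point set to obtain a third adjusted frame point set, such that the sub-regions determined by the third adjusted frame point set completely enclose the tracking points in the third tracking point set; and smoothing the enclosing curves of the sub-regions determined by the third adjusted frame point set to obtain a third region, the third region being the determined position of the first text line in the third video frame.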
可以理解的是,依此类推,电子设备可以跟踪确定该第一文本行在第四视频帧、第五视频帧、第六视频帧等后续视频帧中的位置。
结合第五方面,在一些实施例中,当上述计算机程序产品在电子设备上运行时,还使得上述电子设备执行:从该第一视频帧开始维护一个固定长度为预设帧数个视频帧的缓存,该缓存用于在OCR识别该第一视频帧的结果未返回时存储新产生的视频帧。
具体的,在一些实施例中,为了维护该固定长度的缓存,可以有很多不同的维护方式:
示例性的,当该缓存中存储的视频帧数目等于该预设帧数时,在该缓存中每增加一个新视频帧时删除一个旧视频帧,其中,该缓存中存储的相邻视频帧的取得时间的差值小于预设间隔时间。
示例性的,当该缓存中存储的视频帧数目等于该预设帧数时,在该缓存中每增加一个新视频帧时删除一个旧视频帧,使得缓存内剩余相邻帧之间的间隔尽量保持均匀。
第六方面,本申请实施例提供一种计算机可读存储介质,包括指令,当上述指令在电子设备上运行时,使得上述电子设备执行:对第一视频帧进行OCR检测,得到锚定各文本行的位置的框点,其中至少包括第一初始框点集合,该第一初始框点集合中包含OCR识别出的用于锚定第一文本行的位置的框点,该第一文本行为该第一视频帧中的任一个文本行,该第一初始框点集合中框点的数目不小于4;根据该第一初始框点集合,确定第一扩展框点集合,该第一扩展框点集合将该第一文本行框定在N个连续且宽度均匀的子区域中,该N为不小于2的正整数;根据第二追踪点集合相对于第一追踪点集合中各追踪点的位置,确定该第一扩展框点集合中各框点在第二视频帧中的位置,得到第二计算框点集合,其中,该第一追踪点集合中包括在该第一视频帧中该第一扩展框点集合确定的子区域中的追踪点,该第二追踪点集合中包括在该第二视频帧中预测出的与该第一追踪点集合中追踪点对应位置的追踪点,该第二视频帧为在该第一视频帧之后取得的一个视频帧;根据该第二计算框点集合,确定第二区域,该第二区域为确定的该第一文本行在该第二视频帧中的位置。
结合第六方面,在一些实施例中,当上述指令在电子设备上运行时,具体使得上述电子设备执行:调整该第二计算框点集合中各框点的位置,得到第二调整框点集合,使得该第二调整框点集合确定的各子区域完全包围该第二追踪点集合中各追踪点;根据该第二调整框点集合,确定该第二区域。
具体的调整方式有很多:
示例性的,在一些实施例中,可以根据第二追踪点集合中最高的追踪点和最低的追踪点整体调整框点位置:
调整第二计算上框点集合中的各框点的纵坐标大于第二追踪点集合中追踪点纵坐标的最大值,且小于追踪点纵坐标的最大值与预设参数倍字体高度之和;
调整第二计算下框点集合中的各框点的纵坐标小于第二追踪点集合中追踪点纵坐标的最小值,且大于追踪点纵坐标的最小值与预设参数倍字体高度之差。
示例性的,在一些实施例中,可以根据距离各框点预设距离范围内最高的追踪点和最低的追踪点调整框点位置:
调整第二计算上框点集合中的各框点的纵坐标大于距离各框点预设距离范围内的追踪点纵坐标的最大值,且小于追踪点纵坐标的最大值与预设参数倍字体高度之和;
调整第二计算下框点集合中的各框点的纵坐标小于距离各框点预设距离范围内的追踪点纵坐标的最小值,且大于追踪点纵坐标的最小值与预设参数倍字体高度之差。
还可以有其他的调整方式,只要最终使得调整得到的第二调整框点集合确定的各子区域能完全包围第二追踪点集合中各追踪点即可,此处不作限定。
结合第六方面,在一些实施例中,当上述指令在电子设备上运行时,具体使得上述电子设备执行:对该第二调整框点集合确定的各子区域的包围曲线进行平滑处理,得到该第二区域。
具体的,在一些实施例中,可以分别对第二调整上框点集合和第二调整下框点集合中的框点进行拟合,得到平滑的包围曲线,形成第二区域,其中,第二调整框点集合可以划分为第二调整上框点集合和第二调整下框点集合,第二调整上框点集合为位于子区域上半部分的框点,第二调整下框点集合为位于子区域下半部分的框点。
进一步的,在一些实施例中,分别对第二调整上框点集合和第二调整下框点集合中的框点进行拟合时,可以分别计算第二调整上框点集合和第二调整下框点集合中的框点的线性相关系数,根据线性相关系数的数值确定采用的拟合方式:
示例性的,可以计算第二调整上框点集合和第二调整下框点集合中框点的皮尔逊相关系数,如果线性相关性较强(例如相关系数大于0.8),则可以确定采用线性拟合;如果线性相关性较弱,则可以确定采用二次等更高次的拟合。
可选的,在一些实施例中,当上述指令在电子设备上运行时,使得上述电子设备保存计算线性相关系数的中间量(例如计算皮尔逊相关系数时的中间量),以供在后续其他计算过程中需要使用相关中间量时可以直接使用保存的中间量,从而减少计算量。
结合第六方面,在一些实施例中,当上述指令在电子设备上运行时,具体使得上述电子设备执行:当该第一初始框点集合中框点的数目等于4时,在4个框点确定的矩形的上下两边上取横坐标间隔均匀的点作为新的框点,组成该第一扩展框点集合;
当该第一初始框点集合中框点的数目大于4时,分别对第一初始上框点集合与第一初始下框点集合中的框点进行拟合,得到上拟合曲线和下拟合曲线,其中,该第一初始上框点集合中包括该第一初始框点集合中位于该第一文本行上半部分的框点,该第一初始下框点集合中包括该第一初始框点集合中位于该第一文本行下半部分的框点;分别在该上拟合曲线和该下拟合曲线上取横坐标间隔均匀的点作为新的框点,组成该第一扩展框点集合。
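With reference to the sixth aspect, in some embodiments, when the instructions run on an electronic device, they further cause the electronic device to perform: determining, according to the positions of the tracking points in a third tracking point set relative to the tracking points in the second tracking point set, the position in a third video frame of each frame point in the second adjusted frame point set, obtaining a third calculated frame point set, where the third tracking point set includes the tracking points predicted in the third video frame at positions corresponding to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame; adjusting the positions of the frame points in the third calculated frame point set to obtain a third adjusted frame point set, such that the sub-regions determined by the third adjusted frame point set completely enclose the tracking points in the third tracking point set; and smoothing the enclosing curves of the sub-regions determined by the third adjusted frame point set to obtain a third region, the third region being the determined position of the first text line in the third video frame.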
可以理解的是,依此类推,电子设备可以跟踪确定该第一文本行在第四视频帧、第五视频帧、第六视频帧等后续视频帧中的位置。
结合第六方面,在一些实施例中,当上述指令在电子设备上运行时,还使得上述电子设备执行:从该第一视频帧开始维护一个固定长度为预设帧数个视频帧的缓存,该缓存用于在OCR识别该第一视频帧的结果未返回时存储新产生的视频帧。
具体的,在一些实施例中,为了维护该固定长度的缓存,可以有很多不同的维护方式:
示例性的,当该缓存中存储的视频帧数目等于该预设帧数时,在该缓存中每增加一个新视频帧时删除一个旧视频帧,其中,该缓存中存储的相邻视频帧的取得时间的差值小于预设间隔时间。
示例性的,当该缓存中存储的视频帧数目等于该预设帧数时,在该缓存中每增加一个新视频帧时删除一个旧视频帧,使得缓存内剩余相邻帧之间的间隔尽量保持均匀。
可以理解地,上述第二方面提供的电子设备、第三方面提供的电子设备、第四方面提供的芯片、第五方面提供的计算机程序产品和第六方面提供的计算机存储介质均用于执行本申请实施例所提供的方法。因此,其所能达到的有益效果可参考对应方法中的有益效果,此处不再赘述。
图1是现有技术中一个倾斜矩形的倾斜角度变化的示意图;
图2是现有技术中一个使用倾斜矩形确定弯曲文本位置的场景示意图;
图3是现有技术中一个使用倾斜矩形确定形变文本位置的场景示意图;
图4是采用本申请实施例中视频文字跟踪方法确定弯曲文本位置的一个场景示意图;
图5是采用本申请实施例中视频文字跟踪方法确定形变文本位置的一个场景示意图;
图6是本申请实施例中视频文字跟踪方法一个流程示意图;
图7是本申请实施例中OCR检测确定框点的一个示例性场景示意图;
图8是本申请实施例中确定框定文本行的均匀子区域的一个示例性场景示意图;
图9是本申请实施例中确定框点在第二视频帧中位置的一个示例性场景示意图;
图10是本申请实施例中调整框点位置的一个示例性场景示意图;
图11是本申请实施例中子区域包围曲线平滑处理的一个示例性场景示意图;
图12是本申请实施例中缓存视频帧调度的一个场景示意图;
图13是本申请实施例中一个电子设备的示例性结构示意图;
图14是本申请实施例中一个电子设备的示例性软件结构框图。
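The terms used in the following embodiments of this application are intended only to describe particular embodiments and are not intended to limit this application. As used in the specification and the appended claims of this application, the singular forms "a", "an", "said", "the foregoing", "the", and "this" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in this application refers to and encompasses any and all possible combinations of one or more of the listed items.
In the following, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly specifying the number of the indicated technical features. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features. In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more.
Because the embodiments of this application involve OCR and text tracking technologies, the related terms and concepts are introduced first for ease of understanding.
(1) OCR
OCR (optical character recognition) generally refers to the process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses character recognition methods to translate the shapes into computer text.
(2) Frame point
A frame point is a vertex, produced during OCR recognition, of the rectangular box used to frame the position of a text line. As shown in FIG. 7, the region determined by the frame points detected by OCR frames the position of the text line.
(3) Tracking point
In the embodiments of this application, a tracking point may also be called a corner point, a feature point, and so on.
Corner detection is a method used in computer vision systems to obtain image features. It is widely used in motion detection, image matching, video tracking, three-dimensional modeling, object recognition, and other fields.
In practice, most corner detection methods detect image points that have particular characteristics. These feature points have specific coordinates in the image and certain mathematical characteristics, such as a locally maximal or minimal gray level or certain gradient characteristics.
As shown in FIG. 9, by determining the positions of corresponding tracking points in different video frames, the way positions change between the video frames can be determined, and hence the likely positions of other points in the video frames.
Specifically, a projective transformation matrix between video frame A and video frame B can be calculated from the position changes of corresponding tracking points in the two frames; applying this matrix to the coordinates of a point in video frame A then gives the approximate coordinates of that point in video frame B.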
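The mapping described above can be written out as follows. This is the standard form of a planar projective transformation in homogeneous coordinates; the symbols H, h_ij, (x_A, y_A), and (x_B, y_B) are notation assumed here for illustration and do not appear in the original text.

```latex
\begin{pmatrix} u \\ v \\ w \end{pmatrix}
=
\underbrace{\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}}_{H}
\begin{pmatrix} x_A \\ y_A \\ 1 \end{pmatrix},
\qquad
x_B = \frac{u}{w}, \quad y_B = \frac{v}{w}, \quad w \neq 0 .
```

Because H is defined only up to a scale factor, it has eight degrees of freedom, so at least four pairs of corresponding tracking points (no three collinear) are needed to estimate it.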
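In the embodiments of this application, the electronic device splits the text line region into sub-regions, tracks each sub-region, and then processes and joins them into a new text line. This is compatible with both straight text (where the center points of the characters lie on a straight line) and curved text, and it also tracks text lines that exhibit deformation well, so that the position of the text line can be tracked and predicted accurately.
FIG. 4 is a schematic diagram of a scenario in which the video text tracking method of the embodiments of this application determines the position of curved text. FIG. 5 is a schematic diagram of a scenario in which the method determines the position of deformed text.
The video text tracking method in the embodiments of this application is described below:
Embodiment 1:
FIG. 6 is a schematic flowchart of the video text tracking method in an embodiment of this application:
S601. Perform OCR detection on a first video frame to obtain frame points anchoring the position of each text line, including at least a first initial frame point set. The first initial frame point set contains the frame points recognized by OCR that anchor the position of a first text line; the first text line is any text line in the first video frame and is located, in the first video frame, in a first region determined by the frame points in the first initial frame point set; the number of frame points in the first initial frame point set is not less than 4.
FIG. 7 is a schematic diagram of an exemplary scenario in which OCR detection determines frame points in an embodiment of this application. After OCR detection is performed on the first video frame, frame points anchoring the position of each text line are obtained. Taking any one of the text lines as the first text line, the obtained frame points include at least those anchoring the position of the first text line, referred to in the embodiments of this application as the first initial frame point set. The number of frame points in the first initial frame point set is a multiple of 2 and not less than 4. The lines connecting the frame points in the first initial frame point set enclose the first region, in which the first text line is located.
It can be understood that if the first text line is straight text, the number of frame points in the first initial frame point set is 4; if the first text line is curved text, the number is greater than 4 and a multiple of 2.
The first video frame is a single video frame. It may be a video frame captured during video shooting or a video frame during video playback; this is not limited here. For example, the first video frame may be the first video frame obtained after the lens has stabilized during video shooting.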
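S602. Determine, according to the first initial frame point set, a first extended frame point set that frames the first text line in N contiguous sub-regions of uniform width, where N is a positive integer not less than 2.
FIG. 8 is a schematic diagram of an exemplary scenario in which the uniform sub-regions framing a text line are determined in an embodiment of this application. The first initial frame point set determines the first region in which the first text line is located, and it can divide the first region into several contiguous but irregular quadrilateral sub-regions. To make the sub-regions framing the text line uniform, a first extended frame point set needs to be determined. Determining the first extended frame point set is therefore the process of determining the N contiguous sub-regions of uniform width that frame the first text line.
The process of determining the first extended frame point set is described in detail below. Because the number of frame points in the first initial frame point set is a multiple of 2 and not less than 4, the process differs depending on whether that number is 4:
(1) The number of frame points in the first initial frame point set is equal to 4.
A count of 4 means that the first text line is straight text, so a tilted rectangle with 4 frame points is enough to anchor the position of the first text line. In this case it is only necessary to take points with evenly spaced abscissas on the upper and lower sides of the rectangle and use them as new frame points to form the first extended frame point set.
(2) The number of frame points in the first initial frame point set is greater than 4.
A count greater than 4 means that the first text line is curved text, so more than 4 frame points are used to anchor its position. The frame points in the first initial frame point set can be divided into a first initial upper frame point set and a first initial lower frame point set, where the first initial upper frame point set includes the frame points located on the upper half of the first text line and the first initial lower frame point set includes the frame points located on the lower half of the first text line.
The frame points in the first initial upper frame point set and in the first initial lower frame point set are fitted separately to obtain an upper fitted curve and a lower fitted curve. As shown in FIG. 8, points with evenly spaced abscissas are taken on the upper fitted curve and on the lower fitted curve and used as new frame points to form the first extended frame point set.
The N sub-regions formed by the frame points in the first extended frame point set are then contiguous and of uniform width, and they frame the first text line.
It should be noted that when new frame points are taken on the upper and lower sides of the rectangle or on the upper and lower fitted curves, the abscissa spacing between the new frame points can be chosen according to the actual situation, provided that the resulting sub-regions are contiguous and of uniform width. For example, it may be determined from the total length of the text line and/or the width of the characters in the text line; this is not limited here. As an example, the abscissa spacing between new frame points may be roughly twice the character width in the text line.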
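The straight-text case (four frame points) can be sketched as follows. This is a minimal illustration rather than the implementation of the method: the function names, the point ordering, and the rule for deriving N from the line length are assumptions made here; only the "roughly twice the character width" spacing comes from the text above. The curved-text case would sample the fitted curves in the same way and is not shown.

```python
def choose_sub_region_count(line_length, char_width, spacing_factor=2.0):
    """Pick N so that sub-region width is about spacing_factor * char_width.

    Hypothetical helper: the text only suggests a spacing of roughly twice
    the character width; N is never allowed to drop below 2.
    """
    if line_length <= 0 or char_width <= 0:
        raise ValueError("line_length and char_width must be positive")
    return max(2, round(line_length / (spacing_factor * char_width)))


def interpolate_edge(start, end, n):
    """Return n + 1 points from start to end with evenly spaced abscissas."""
    (x0, y0), (x1, y1) = start, end
    return [(x0 + (x1 - x0) * i / n, y0 + (y1 - y0) * i / n)
            for i in range(n + 1)]


def expand_straight_line(top_left, top_right, bottom_left, bottom_right, n):
    """Split the rectangle given by four frame points into n sub-regions.

    Returns (upper, lower): the extended upper and lower frame points.
    Sub-region i is bounded by upper[i], upper[i+1], lower[i+1], lower[i].
    """
    upper = interpolate_edge(top_left, top_right, n)
    lower = interpolate_edge(bottom_left, bottom_right, n)
    return upper, lower


if __name__ == "__main__":
    n = choose_sub_region_count(line_length=40, char_width=5)
    upper, lower = expand_straight_line((0, 10), (40, 10), (0, 0), (40, 0), n)
    print(n, upper, lower)
```

Interpolating along each edge keeps adjacent sub-regions sharing their vertical sides, which is what makes the N sub-regions contiguous.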
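S603. Determine, according to the positions of the tracking points in a second tracking point set relative to the tracking points in a first tracking point set, the position in a second video frame of each frame point in the first extended frame point set, obtaining a second calculated frame point set. The first tracking point set includes the tracking points located, in the first video frame, in the sub-regions determined by the first extended frame point set; the second tracking point set includes the tracking points predicted in the second video frame at positions corresponding to the tracking points in the first tracking point set; the second video frame is a video frame obtained after the first video frame.
FIG. 9 is a schematic diagram of an exemplary scenario in which the positions of frame points in the second video frame are determined in an embodiment of this application.
This step is described in detail below:
1. After determining the first extended frame point set, the electronic device can determine the first tracking point set, which includes the tracking points located, in the first video frame, in the sub-regions determined by the first extended frame point set.
Specifically, the electronic device may use a tracking point (key point) detection technique such as corner detection to determine a certain number of tracking points in the sub-regions determined by the first extended frame point set, forming the first tracking point set.
It can be understood that, at the same time, the electronic device also determines the tracking points in the sub-regions framing the other text lines in the first video frame.
2. From the first tracking point set, the electronic device can determine the second tracking point set, which includes the tracking points predicted in the second video frame at positions corresponding to the tracking points in the first tracking point set. The second video frame is a video frame obtained after the first video frame.
Using a tracking point tracking algorithm such as optical flow, the electronic device can predict the positions in the second video frame of some of the tracking points in the first tracking point set of the first video frame; these form the second tracking point set.
3. According to the positions of the tracking points in the second tracking point set relative to the tracking points in the first tracking point set, the electronic device can determine the position in the second video frame of each frame point in the first extended frame point set, obtaining the second calculated frame point set.
Specifically, the electronic device can obtain the projective transformation matrix from the first video frame to the second video frame from the positional relationship between corresponding tracking points in the first and second tracking point sets;
it then uses this projective transformation matrix to calculate the positions in the second video frame of the frame points in the first extended frame point set, and takes the calculated frame points as the second calculated frame point set.
It can be understood that, because of changes in shooting angle, changes in picture elements, and similar factors, in some cases the corresponding positions in the second video frame cannot be found for all tracking points in the first tracking point set. The number of tracking points in the second tracking point set is therefore generally less than or equal to the number in the first tracking point set.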
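The last part of step 3, mapping frame points through an already estimated matrix, can be sketched as follows. The function names and the nested-list matrix layout are assumptions for illustration. Estimating the matrix from the matched tracking points is outside this sketch and would normally be done with a computer vision library.

```python
def apply_homography(h, point):
    """Map one (x, y) point through a 3x3 projective transformation matrix.

    h is a row-major 3x3 nested list (an assumed representation).
    """
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this transformation")
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    return (u / w, v / w)


def map_frame_points(h, frame_points):
    """Compute the 'calculated' frame point set in the next video frame."""
    return [apply_homography(h, p) for p in frame_points]


if __name__ == "__main__":
    # A pure translation by (+3, -2), used only as an easy-to-check example.
    shift = [[1, 0, 3], [0, 1, -2], [0, 0, 1]]
    print(map_frame_points(shift, [(0, 0), (10, 5)]))
```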
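S604. Adjust the positions of the frame points in the second calculated frame point set to obtain a second adjusted frame point set, such that the sub-regions determined by the second adjusted frame point set completely enclose the tracking points in the second tracking point set.
FIG. 10 is a schematic diagram of an exemplary scenario in which frame point positions are adjusted in an embodiment of this application.
As shown in part (a) of FIG. 10, some frame points in the second calculated frame point set, which is calculated from the projective transformation matrix and the first extended frame point set, may have drifted far and may not coincide with their corresponding positions in the first video frame. The frame point positions therefore need to be adjusted so that the sub-regions determined by the frame points completely enclose the tracking points in the second tracking point set, as shown in part (b) of FIG. 10.
Many adjustment strategies are possible, for example:
(1) Adjust the frame point positions as a whole according to the highest and lowest tracking points in the second tracking point set.
Taking as an example a coordinate system whose origin is the lower-left corner of the video frame, with the abscissa increasing to the right and the ordinate increasing upward, this means: adjust the frame point ordinates as a whole according to the maximum and minimum ordinates of the tracking points in the second tracking point set.
Specifically, the second calculated frame point set can be divided into a second calculated upper frame point set and a second calculated lower frame point set, where the second calculated upper frame point set includes the frame points located on the upper half of the sub-regions and the second calculated lower frame point set includes the frame points located on the lower half of the sub-regions.
The ordinate of each frame point in the second calculated upper frame point set may be adjusted to be greater than the maximum ordinate of the tracking points in the second tracking point set and less than the sum of that maximum and the font height multiplied by a preset parameter;
the ordinate of each frame point in the second calculated lower frame point set may be adjusted to be less than the minimum ordinate of the tracking points in the second tracking point set and greater than the difference between that minimum and the font height multiplied by the preset parameter.
If a frame point's ordinate is already within the range, the frame point does not need to be adjusted; otherwise, its ordinate can be moved into the range by the smallest possible distance.
As an example, the preset parameter may be set to 0.5.
(2) Adjust the frame point positions according to the highest and lowest tracking points within a preset distance of each frame point.
In the same coordinate system as above, this means: adjust each frame point's ordinate according to the maximum and minimum ordinates of the tracking points within the preset distance of that frame point.
Specifically, the ordinate of each frame point in the second calculated upper frame point set may be adjusted to be greater than the maximum ordinate of the tracking points within the preset distance of that frame point and less than the sum of that maximum and the font height multiplied by the preset parameter;
the ordinate of each frame point in the second calculated lower frame point set may be adjusted to be less than the minimum ordinate of the tracking points within the preset distance of that frame point and greater than the difference between that minimum and the font height multiplied by the preset parameter.
If a frame point's ordinate is already within the range, the frame point does not need to be adjusted; otherwise, its ordinate can be moved into the range by the smallest possible distance.
As an example, the preset parameter may be set to 0.5 and the preset distance may be set to one character width.
Other adjustment strategies are also possible and are not limited here, provided that the sub-regions determined by the resulting second adjusted frame point set completely enclose the tracking points in the second tracking point set.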
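Strategy (1) can be sketched as a clamp on each ordinate. This is a minimal sketch under two stated assumptions: the function and parameter names are hypothetical, and the range bounds are treated as inclusive for simplicity, whereas the text describes strict inequalities. The coordinate system is the one used above, with the ordinate increasing upward.

```python
def clamp(value, low, high):
    """Move value into [low, high] by the smallest possible distance."""
    return max(low, min(value, high))


def adjust_frame_points(upper, lower, tracking_points, font_height, k=0.5):
    """Globally adjust frame point ordinates to enclose all tracking points.

    upper / lower: lists of (x, y) frame points on the upper / lower side.
    k: the preset parameter from the text (0.5 in its example).
    Bounds are inclusive here, which is a simplifying assumption.
    """
    ys = [y for _, y in tracking_points]
    y_max, y_min = max(ys), min(ys)
    margin = k * font_height
    new_upper = [(x, clamp(y, y_max, y_max + margin)) for x, y in upper]
    new_lower = [(x, clamp(y, y_min - margin, y_min)) for x, y in lower]
    return new_upper, new_lower


if __name__ == "__main__":
    up, low = adjust_frame_points(
        upper=[(0, 10), (5, 30)],
        lower=[(0, -5), (5, 2)],
        tracking_points=[(1, 4), (2, 12)],
        font_height=10,
    )
    print(up, low)
```

Strategy (2) would apply the same clamp per frame point, computing the maximum and minimum only over tracking points within the preset distance of that point.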
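S605. Smooth the enclosing curves of the sub-regions determined by the second adjusted frame point set to obtain a second region, the second region being the position of the first text line in the second video frame as determined by the electronic device.
FIG. 11 is a schematic diagram of an exemplary scenario in which the enclosing curves of the sub-regions are smoothed in an embodiment of this application.
The second adjusted frame point set can be divided into a second adjusted upper frame point set and a second adjusted lower frame point set, where the second adjusted upper frame point set consists of the frame points located on the upper half of the sub-regions and the second adjusted lower frame point set consists of the frame points located on the lower half of the sub-regions.
The electronic device can fit the frame points in the second adjusted upper frame point set and in the second adjusted lower frame point set separately to obtain smooth enclosing curves that form the second region, which is the position of the first text line in the second video frame as determined by the electronic device.
Preferably, when fitting the two sets separately, the electronic device may calculate the linear correlation coefficient of the frame points in each set and choose the fitting method according to its value:
For example, the electronic device may calculate the Pearson correlation coefficient of the frame points in the second adjusted upper frame point set and in the second adjusted lower frame point set. If the linear correlation is strong (for example, the correlation coefficient is greater than 0.8), linear fitting may be used; if the linear correlation is weak, a quadratic or higher-order fit may be used.
The purpose of fitting is to smooth the curves so that the sub-regions remain contiguous and the enclosing curves of the text line stay smooth rather than jagged.
Preferably, the electronic device may save the intermediate quantities of the linear correlation calculation (for example, those computed for the Pearson correlation coefficient) so that they can be reused directly in later calculations that need them, reducing the amount of computation.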
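The choice between a linear and a higher-order fit can be sketched as follows. The function names are hypothetical. Two choices below are assumptions that go beyond the text: the absolute value of the coefficient is compared with the threshold so that descending lines also count as strongly linear, and a set with zero variance (for example, a perfectly horizontal edge) is treated as linear. The sums returned by `pearson_sums` are the kind of intermediate quantities that could be saved and reused, for instance by a subsequent least-squares fit.

```python
import math


def pearson_sums(points):
    """Return the running sums needed for Pearson's r (cacheable)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    return n, sx, sy, sxx, syy, sxy


def pearson_r(sums):
    """Compute Pearson's correlation coefficient from cached sums."""
    n, sx, sy, sxx, syy, sxy = sums
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        # Degenerate case (e.g. horizontal edge): assumed to be linear.
        return 1.0
    return cov / math.sqrt(var_x * var_y)


def choose_fit_degree(points, threshold=0.8):
    """Return 1 for a linear fit, 2 for a quadratic fit."""
    r = pearson_r(pearson_sums(points))
    return 1 if abs(r) > threshold else 2


if __name__ == "__main__":
    print(choose_fit_degree([(0, 0), (1, 2), (2, 4), (3, 6)]))
    print(choose_fit_degree([(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]))
```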
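It can be understood that in scenarios such as augmented reality (AR) translation with a live camera view or video subtitle translation, once the second region has been determined as the position of the first text line in the second video frame, the translation result can be filled back into the second region when OCR has recognized the text of the first text line and the translation has been completed and returned. The translation is thus displayed directly at the position of the text in the video.
In the embodiments of this application, the electronic device determines, from the first initial frame point set obtained by OCR detection of the first text line in the first video frame, the first extended frame point set that frames the first text line in N contiguous sub-regions of uniform width. It then tracks the frame points of the first extended frame point set to their corresponding positions in the second video frame, forming the second calculated frame point set. The positions of the frame points in the second calculated frame point set are adjusted to obtain the second adjusted frame point set, which completely encloses the tracking points, and the enclosing curves of the sub-regions determined by the second adjusted frame point set are smoothed to obtain the second region, which gives the position of the first text line in the second video frame. Because tracking is performed on sub-regions of uniform width and the frame point positions of each sub-region are adjusted according to the tracking points, tracking becomes finer-grained. The method therefore tracks ordinary curved text lines and curved text lines that exhibit deformation accurately, remains compatible with straight text, and improves the overall quality of video text tracking.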
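Embodiment 2:
In scenarios such as augmented reality (AR) translation with a live camera view or video subtitle translation, there may be a delay of several hundred milliseconds between triggering OCR on the first video frame and OCR returning the text recognition result for it. By then the current video frame is already more than ten frames after the one that triggered OCR (assuming 30 frames per second).
The first video frame may be the first video frame obtained after the lens has stabilized during live capture, the first video frame of the video in video subtitle translation, or the first video frame obtained after the text content in the video has changed; this is not limited here.
FIG. 12 is a schematic diagram of a scenario of buffered video frame scheduling in an embodiment of this application.
To ensure that processing can catch up with the latest video frame after OCR recognition, the electronic device may, starting from the first video frame, maintain a buffer with a fixed length of a preset number of video frames. The buffer stores newly generated video frames while the OCR result for the first video frame has not yet been returned.
For example, the preset number of frames may be 10, in which case the buffer stores up to 10 video frames following the first video frame. Once the OCR result for the first video frame is returned, tracking can start from the video frames stored in the buffer and catch up with the latest video frame.
Because the buffer can store at most the preset number of video frames, once it is full an old video frame must be deleted each time a new one is added. The deletion strategy is to keep the intervals between the adjacent frames remaining in the buffer as uniform as possible.
For example, suppose the buffer already stores 10 video frames, numbered [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
When frame 11 is generated, frame 2 may be deleted and frame 11 stored, giving [1, 3, 4, 5, 6, 7, 8, 9, 10, 11].
When frame 12 is generated, frame 4 may be deleted and frame 12 stored, giving [1, 3, 5, 6, 7, 8, 9, 10, 11, 12].
When frame 13 is generated, frame 6 may be deleted and frame 13 stored, giving [1, 3, 5, 7, 8, 9, 10, 11, 12, 13].
When frame 14 is generated, frame 8 may be deleted and frame 14 stored, giving [1, 3, 5, 7, 9, 10, 11, 12, 13, 14].
When frame 15 is generated, frame 10 may be deleted and frame 15 stored, giving [1, 3, 5, 7, 9, 11, 12, 13, 14, 15].
When frame 16 is generated, frame 12 may be deleted and frame 16 stored, giving [1, 3, 5, 7, 9, 11, 13, 14, 15, 16].
When frame 17 is generated, frame 14 may be deleted and frame 17 stored, giving [1, 3, 5, 7, 9, 11, 13, 15, 16, 17].
When frame 18 is generated, frame 16 may be deleted and frame 18 stored, giving [1, 3, 5, 7, 9, 11, 13, 15, 17, 18].
When frame 19 is generated, frame 17 may be deleted and frame 19 stored, giving [1, 3, 5, 7, 9, 11, 13, 15, 18, 19].
When frame 20 is generated, frame 19 may be deleted and frame 20 stored, giving [1, 3, 5, 7, 9, 11, 13, 15, 18, 20].
And so on. Many other ways of adding and deleting video frames can likewise keep the intervals between adjacent video frames in the fixed-length buffer as uniform as possible.
The purpose of maintaining a fixed-length buffer is to cope with cases in which the first video frame contains a lot of text and OCR recognition takes a long time. Limiting the size of the buffer shortens the time the buffer needs to "catch up" with the latest video frame, which in turn shortens the time the user waits for the result and improves the experience.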
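One possible eviction policy is sketched below. The class name and the specific rule are assumptions: when the buffer overflows, it removes the interior frame whose two neighbours are closest together, which is one way of keeping the remaining spacing even. It is not claimed to reproduce every step of the example above; it yields the same buffer contents through frame 18 and makes a different choice from frame 19 onward.

```python
class FrameBuffer:
    """Fixed-capacity buffer of frame numbers with spacing-aware eviction."""

    def __init__(self, capacity):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.capacity = capacity
        self.frames = []  # frame numbers in capture order

    def add(self, frame_id):
        """Store a new frame; evict one interior frame if over capacity."""
        self.frames.append(frame_id)
        if len(self.frames) <= self.capacity:
            return
        # For each interior frame, the gap its removal would leave is the
        # distance between its neighbours; drop the one leaving the
        # smallest gap (first such frame on ties). The oldest and the
        # newest frame are never evicted.
        victim = min(
            range(1, len(self.frames) - 1),
            key=lambda i: self.frames[i + 1] - self.frames[i - 1],
        )
        del self.frames[victim]


if __name__ == "__main__":
    buf = FrameBuffer(capacity=10)
    for frame_id in range(1, 19):
        buf.add(frame_id)
    print(buf.frames)
```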
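In the embodiments of this application, several video frames are buffered in addition to the one video frame on which OCR is performed. The frame on which OCR is performed is the first video frame of Embodiment 1, the first video frame in the buffer is the second video frame of Embodiment 1, and the second video frame in the buffer is the third video frame.
Before the video text tracking method of Embodiment 1 determines the position of the first text line in the second video frame (the second region), the second adjusted frame point set and the second tracking point set corresponding to the first text line in the second video frame have already been determined.
Using a tracking method similar to steps S603 to S605 of Embodiment 1, the position of the first text line in the third video frame can be determined, together with the third adjusted frame point set and the third tracking point set corresponding to the first text line in the third video frame. The steps may be as follows:
1. Determine, according to the positions of the tracking points in the third tracking point set relative to the tracking points in the second tracking point set, the position in the third video frame of each frame point in the second adjusted frame point set, obtaining a third calculated frame point set, where the third tracking point set includes the tracking points predicted in the third video frame at positions corresponding to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame;
2. Adjust the positions of the frame points in the third calculated frame point set to obtain a third adjusted frame point set, such that the sub-regions determined by the third adjusted frame point set completely enclose the tracking points in the third tracking point set;
3. Smooth the enclosing curves of the sub-regions determined by the third adjusted frame point set to obtain a third region, the third region being the position of the first text line in the third video frame as determined by the electronic device.
The detailed execution of steps 1 to 3 is similar to that of steps S603 to S605 and is not repeated here.
It can be understood that, using a tracking method similar to steps 1 to 3 above, the position of the first text line can be determined in each subsequent video frame, and the OCR recognition result or the translation result can be filled back into that position. This continues until a text line moves out of the field of view, is occluded by another object, or the text content of the video changes, so that the proportion of tracking points for which corresponding positions are found in two adjacent frames (the ratio of the number of tracking points in the video frame to the number of tracking points in the first video frame on which OCR was performed) falls below the tracking-point ratio threshold. Tracking is then considered to have failed, and OCR is performed again to start another tracking pass once the lens has stabilized or the video text has been updated.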
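The stopping condition can be sketched as a simple ratio test. The function name and the example threshold of 0.5 are assumptions; the text does not give a value for the tracking-point ratio threshold.

```python
def tracking_failed(matched_count, reference_count, ratio_threshold):
    """Return True when too few tracking points could be matched.

    matched_count: tracking points found in the current video frame.
    reference_count: tracking points in the frame used as the baseline
        (the first video frame on which OCR was performed, per the
        parenthetical in the text above).
    ratio_threshold: hypothetical tunable value in (0, 1].
    """
    if reference_count <= 0:
        # Nothing to track against: treat as failure and re-run OCR.
        return True
    return matched_count / reference_count < ratio_threshold


if __name__ == "__main__":
    print(tracking_failed(30, 100, 0.5))
    print(tracking_failed(80, 100, 0.5))
```

When this returns True, the tracking loop would stop and wait for the next OCR pass, as described above.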
下面介绍本申请实施例提供的示例性电子设备100。
图13是本申请实施例提供的电子设备100的结构示意图。
下面以电子设备100为例对实施例进行具体说明。应该理解的是,电子设备100可以具有比图中所示的更多的或者更少的部件,可以组合两个或多个的部件,或者可以具有不同的部件配置。图中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
电子设备100可以包括:处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本发明实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
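The I2C interface is a bidirectional synchronous serial bus comprising a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may contain multiple groups of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, and so on through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface so that the processor 110 and the touch sensor 180K communicate over the I2C bus interface, implementing the touch function of the electronic device 100.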
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
SIM接口可以被用于与SIM卡接口195通信,实现传送数据到SIM卡或读取SIM卡中数据的功能。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本发明实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
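Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, an antenna may be used in combination with a tuning switch.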
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
In some embodiments, antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
The display 194 is used to display images, videos, and so on. The display 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light emitting diode (AMOLED), a flex light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, quantum dot light emitting diodes (QLED), and so on. In some embodiments, the electronic device 100 may include 1 or N displays 194, where N is a positive integer greater than 1.
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用(比如人脸识别功能,指纹识别功能、移动支付功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如人脸信息模板数据,指纹信息模板等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
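The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.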
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测所述触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,电子设备100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当电子设备100是翻盖机时,电子设备100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备100附近有物体。当检测到不充分的反射光时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器180G检测用户手持电子设备100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,电子设备100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,电子设备100对电池142加热,以避免低温导致电子设备100异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备100对电池142的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控面板”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。电子设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。
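FIG. 14 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present invention.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which from top to bottom are the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in FIG. 14, the application packages may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 14, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, a local profile assistant (Local Profile Assistant, LPA), and the like.
The window manager is used to manage window programs. The window manager can obtain the display size, determine whether there is a status bar, lock the screen, take screenshots, and so on.
The content provider is used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and the like.
The view system includes visual controls, such as controls that display text and controls that display pictures. The view system can be used to build applications. A display interface may be composed of one or more views. For example, a display interface that includes a short-message notification icon may include a view that displays text and a view that displays a picture.
The telephony manager is used to provide the communication functions of the electronic device 100, for example, management of call states (including connected, hung up, and so on).
The resource manager provides applications with various resources, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar. It can be used to convey informational messages that disappear automatically after a short stay, without user interaction. For example, the notification manager is used to announce that a download is complete or to give message reminders. The notification manager may also present notifications that appear in the status bar at the top of the system in the form of a chart or scrolling text, such as notifications from applications running in the background, or notifications that appear on the screen in the form of a dialog interface. For example, text information is shown in the status bar, a prompt tone is played, the electronic device vibrates, or the indicator light flashes.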
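The Android runtime (Android Runtime) includes core libraries and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core libraries consist of two parts: one part is the functions that the java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example: a surface manager (surface manager), media libraries (Media Libraries), a three-dimensional graphics processing library (for example, OpenGL ES), and a two-dimensional graphics engine (for example, SGL).
The surface manager is used to manage the display subsystem and provides fusion of two-dimensional (2-Dimensional, 2D) and three-dimensional (3-Dimensional, 3D) layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media libraries can support multiple audio, video, and image encoding formats, for example: MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, layer processing, and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer includes at least a display driver, a camera driver, an audio driver, a sensor driver, and a virtual card driver.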
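The following describes, by way of example, the workflow of the software and hardware of the electronic device 100 with reference to a photo capture scenario.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored in the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example the case in which the touch operation is a single tap and the control corresponding to the tap is the icon of the camera application: the camera application calls an interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer, and a still image or video is captured through the camera 193.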
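The flow from hardware interrupt to camera capture can be pictured with the toy model below. It is only a sketch of the layering described above: the class, function, and control names and the rectangular hit test are all hypothetical, and none of this is actual Android framework code.

```python
from dataclasses import dataclass


@dataclass
class RawInputEvent:
    """Raw input event as produced by the simulated kernel layer."""
    x: float
    y: float
    timestamp_ms: int


def kernel_on_touch_interrupt(x, y, timestamp_ms):
    # Kernel layer: package the touch operation as a raw input event.
    return RawInputEvent(x, y, timestamp_ms)


def framework_find_control(event, controls):
    # Framework layer: find the control whose bounds contain the touch.
    # controls maps a name to (left, top, right, bottom).
    for name, (left, top, right, bottom) in controls.items():
        if left <= event.x < right and top <= event.y < bottom:
            return name
    return None


def dispatch(event, controls):
    # Only the camera icon is modeled; other controls do nothing here.
    if framework_find_control(event, controls) == "camera_icon":
        return ["start_camera_app", "start_camera_driver", "capture"]
    return []


controls = {"camera_icon": (0, 0, 100, 100), "gallery_icon": (100, 0, 200, 100)}
event = kernel_on_touch_interrupt(40, 60, timestamp_ms=1000)
steps = dispatch(event, controls)
```

The point of the sketch is the separation of concerns: the kernel layer knows only coordinates and timestamps, while the mapping from coordinates to a control lives in the framework layer.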
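In summary, the foregoing embodiments are intended only to illustrate the technical solutions of this application and not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.
As used in the foregoing embodiments, depending on the context, the term "when" may be interpreted as meaning "if", "after", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "upon determining" or "if (a stated condition or event) is detected" may be interpreted as meaning "if it is determined", "in response to determining", "upon detecting (the stated condition or event)", or "in response to detecting (the stated condition or event)".
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are wholly or partly produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive), or the like.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments may be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and when executed, the program may include the processes of the foregoing method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.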
Claims (21)
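- A video text tracking method, characterized by comprising: performing, by an electronic device, OCR detection on a first video frame to obtain box points that anchor the position of each text line, including at least a first initial box point set, wherein the first initial box point set contains box points identified by OCR for anchoring the position of a first text line, the first text line is any text line in the first video frame, and the number of box points in the first initial box point set is not less than 4; determining, by the electronic device according to the first initial box point set, a first extended box point set, wherein the first extended box point set frames the first text line in N consecutive sub-regions of uniform width, N being a positive integer not less than 2; determining, by the electronic device according to the positions of a second tracking point set relative to the tracking points in a first tracking point set, the position in a second video frame of each box point in the first extended box point set, to obtain a second calculated box point set, wherein the first tracking point set includes tracking points located, in the first video frame, in the sub-regions determined by the first extended box point set, the second tracking point set includes tracking points predicted in the second video frame at positions corresponding to the tracking points in the first tracking point set, and the second video frame is a video frame obtained after the first video frame; and determining, by the electronic device according to the second calculated box point set, a second region, the second region being the position of the first text line in the second video frame as determined by the electronic device.
- The method according to claim 1, characterized in that the determining, by the electronic device according to the second calculated box point set, a second region specifically comprises: adjusting, by the electronic device, the positions of the box points in the second calculated box point set to obtain a second adjusted box point set, such that the sub-regions determined by the second adjusted box point set completely enclose the tracking points in the second tracking point set; and determining, by the electronic device, the second region according to the second adjusted box point set.
- The method according to claim 2, characterized in that the determining, by the electronic device, the second region according to the second adjusted box point set specifically comprises: smoothing, by the electronic device, the enclosing curves of the sub-regions determined by the second adjusted box point set, to obtain the second region.
- The method according to claim 3, characterized in that the smoothing, by the electronic device, the enclosing curves of the sub-regions determined by the second adjusted box point set to obtain the second region specifically comprises: fitting, by the electronic device, the box points in a second adjusted upper box point set and in a second adjusted lower box point set separately, to obtain smooth enclosing curves that form the second region, wherein the second adjusted upper box point set consists of the box points located in the upper half of the sub-regions, and the second adjusted lower box point set consists of the box points located in the lower half of the sub-regions.
- The method according to any one of claims 2 to 4, characterized in that the adjusting, by the electronic device, the positions of the box points in the second calculated box point set to obtain a second adjusted box point set specifically comprises: adjusting, by the electronic device, the positions of the box points in the second calculated box point set according to the highest tracking point and the lowest tracking point in the second tracking point set, to obtain the second adjusted box point set; or adjusting, by the electronic device, the positions of the box points in the second calculated box point set according to the highest tracking point and the lowest tracking point, in the second tracking point set, within a preset distance range of each box point, to obtain the second adjusted box point set.
- The method according to any one of claims 1 to 5, characterized in that the determining, by the electronic device according to the first initial box point set, a first extended box point set specifically comprises: when the number of box points in the first initial box point set is equal to 4, taking, by the electronic device, points with evenly spaced abscissas on the upper and lower sides of the rectangle determined by the 4 box points as new box points, to form the first extended box point set; when the number of box points in the first initial box point set is greater than 4, fitting, by the electronic device, the box points in a first initial upper box point set and in a first initial lower box point set separately, to obtain an upper fitted curve and a lower fitted curve, wherein the first initial upper box point set includes the box points of the first initial box point set located in the upper half of the first text line, and the first initial lower box point set includes the box points of the first initial box point set located in the lower half of the first text line; and taking, by the electronic device, points with evenly spaced abscissas on the upper fitted curve and on the lower fitted curve as new box points, to form the first extended box point set.
- The method according to either of claims 1 or 6, characterized in that the method further comprises: determining, by the electronic device according to the positions of a third tracking point set relative to the tracking points in the second tracking point set, the position in a third video frame of each box point in a third adjusted box point set, to obtain a third calculated box point set, wherein the third tracking point set includes tracking points predicted in the third video frame at positions corresponding to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame; adjusting, by the electronic device, the positions of the box points in the third calculated box point set to obtain the third adjusted box point set, such that the sub-regions determined by the third adjusted box point set completely enclose the tracking points in the third tracking point set; and smoothing, by the electronic device, the enclosing curves of the sub-regions determined by the third adjusted box point set, to obtain a third region, the third region being the position of the first text line in the third video frame as determined by the electronic device.
- The method according to any one of claims 1 to 7, characterized in that the method further comprises: maintaining, by the electronic device starting from the first video frame, a cache with a fixed length of a preset number of video frames, the cache being used to store newly generated video frames while the result of OCR recognition of the first video frame has not yet been returned.
- The method according to claim 8, characterized in that the method further comprises: when the number of video frames stored in the cache is equal to the preset number of frames, deleting, by the electronic device, one old video frame each time a new video frame is added to the cache, wherein the difference between the acquisition times of adjacent video frames stored in the cache is less than a preset interval.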
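- An electronic device, characterized in that the electronic device comprises one or more processors and a memory; the memory is coupled to the one or more processors and is used to store computer program code, the computer program code comprises computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to perform: performing OCR detection on a first video frame to obtain box points that anchor the position of each text line, including at least a first initial box point set, wherein the first initial box point set contains box points identified by OCR for anchoring the position of a first text line, the first text line is any text line in the first video frame, and the number of box points in the first initial box point set is not less than 4; determining, according to the first initial box point set, a first extended box point set, wherein the first extended box point set frames the first text line in N consecutive sub-regions of uniform width, N being a positive integer not less than 2; determining, according to the positions of a second tracking point set relative to the tracking points in a first tracking point set, the position in a second video frame of each box point in the first extended box point set, to obtain a second calculated box point set, wherein the first tracking point set includes tracking points located, in the first video frame, in the sub-regions determined by the first extended box point set, the second tracking point set includes tracking points predicted in the second video frame at positions corresponding to the tracking points in the first tracking point set, and the second video frame is a video frame obtained after the first video frame; and determining, according to the second calculated box point set, a second region, the second region being the determined position of the first text line in the second video frame.
- The electronic device according to claim 10, characterized in that the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: adjusting the positions of the box points in the second calculated box point set to obtain a second adjusted box point set, such that the sub-regions determined by the second adjusted box point set completely enclose the tracking points in the second tracking point set; and determining the second region according to the second adjusted box point set.
- The electronic device according to claim 11, characterized in that the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: smoothing the enclosing curves of the sub-regions determined by the second adjusted box point set, to obtain the second region.
- The electronic device according to claim 12, characterized in that the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: fitting the box points in a second adjusted upper box point set and in a second adjusted lower box point set separately, to obtain smooth enclosing curves that form the second region, wherein the second adjusted upper box point set consists of the box points located in the upper half of the sub-regions, and the second adjusted lower box point set consists of the box points located in the lower half of the sub-regions.
- The electronic device according to any one of claims 11 to 13, characterized in that the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: adjusting the positions of the box points in the second calculated box point set according to the highest tracking point and the lowest tracking point in the second tracking point set, to obtain the second adjusted box point set; or adjusting the positions of the box points in the second calculated box point set according to the highest tracking point and the lowest tracking point, in the second tracking point set, within a preset distance range of each box point, to obtain the second adjusted box point set.
- The electronic device according to any one of claims 10 to 14, characterized in that the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: when the number of box points in the first initial box point set is equal to 4, taking points with evenly spaced abscissas on the upper and lower sides of the rectangle determined by the 4 box points as new box points, to form the first extended box point set; when the number of box points in the first initial box point set is greater than 4, fitting the box points in a first initial upper box point set and in a first initial lower box point set separately, to obtain an upper fitted curve and a lower fitted curve, wherein the first initial upper box point set includes the box points of the first initial box point set located in the upper half of the first text line, and the first initial lower box point set includes the box points of the first initial box point set located in the lower half of the first text line; and taking points with evenly spaced abscissas on the upper fitted curve and on the lower fitted curve as new box points, to form the first extended box point set.
- The electronic device according to any one of claims 10 to 15, characterized in that the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: determining, according to the positions of a third tracking point set relative to the tracking points in the second tracking point set, the position in a third video frame of each box point in a third adjusted box point set, to obtain a third calculated box point set, wherein the third tracking point set includes tracking points predicted in the third video frame at positions corresponding to the tracking points in the second tracking point set, and the third video frame is a video frame obtained after the second video frame; adjusting the positions of the box points in the third calculated box point set to obtain the third adjusted box point set, such that the sub-regions determined by the third adjusted box point set completely enclose the tracking points in the third tracking point set; and smoothing the enclosing curves of the sub-regions determined by the third adjusted box point set, to obtain a third region, the third region being the determined position of the first text line in the third video frame.
- The electronic device according to any one of claims 10 to 16, characterized in that the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: maintaining, starting from the first video frame, a cache with a fixed length of a preset number of video frames, the cache being used to store newly generated video frames while the result of OCR recognition of the first video frame has not yet been returned.
- The electronic device according to claim 17, characterized in that the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: when the number of video frames stored in the cache is equal to the preset number of frames, deleting one old video frame each time a new video frame is added to the cache, wherein the difference between the acquisition times of adjacent video frames stored in the cache is less than a preset interval.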
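- A chip, applied to an electronic device, the chip comprising one or more processors, the processors being configured to invoke computer instructions to cause the electronic device to perform the method according to any one of claims 1 to 9.
- A computer program product containing instructions, characterized in that, when the computer program product runs on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 9.
- A computer-readable storage medium, comprising instructions, characterized in that, when the instructions run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 9.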
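The box point handling recited in the method claims can be illustrated with the sketch below. It covers only the 4-point case of box point expansion, a propagation step, and the "highest and lowest tracking point" adjustment. Several parts are assumptions rather than the claimed method: the claims do not say how box points are moved from tracking point positions, so the sketch shifts each box point by the mean displacement of its `k` nearest tracking points; the adjustment is vertical only; image coordinates with y growing downward are assumed; and all function names are hypothetical.

```python
def expand_box_points(corners, n):
    """Split a 4-corner text box into n sub-regions of uniform width.

    corners: (top_left, top_right, bottom_right, bottom_left) as (x, y).
    Returns (top, bottom), each a list of n + 1 evenly spaced points.
    """
    tl, tr, br, bl = corners

    def lerp(a, b, t):
        return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

    top = [lerp(tl, tr, i / n) for i in range(n + 1)]
    bottom = [lerp(bl, br, i / n) for i in range(n + 1)]
    return top, bottom


def propagate_box_points(box_points, tracks_prev, tracks_next, k=3):
    """Move box points into the next frame (assumed rule, see lead-in).

    tracks_prev[i] and tracks_next[i] are the same tracking point in the
    earlier and the later frame. Each box point is shifted by the mean
    displacement of its k nearest tracking points in the earlier frame.
    """
    if not tracks_prev:
        return list(box_points)
    moved = []
    for bx, by in box_points:
        nearest = sorted(
            range(len(tracks_prev)),
            key=lambda i: (tracks_prev[i][0] - bx) ** 2 + (tracks_prev[i][1] - by) ** 2,
        )[:k]
        dx = sum(tracks_next[i][0] - tracks_prev[i][0] for i in nearest) / len(nearest)
        dy = sum(tracks_next[i][1] - tracks_prev[i][1] for i in nearest) / len(nearest)
        moved.append((bx + dx, by + dy))
    return moved


def adjust_box_points(top, bottom, tracks):
    """Stretch the box vertically so it encloses all tracking points.

    With y growing downward, the highest tracking point has the
    smallest y and the lowest tracking point has the largest y.
    """
    if not tracks:
        return top, bottom
    highest = min(y for _, y in tracks)
    lowest = max(y for _, y in tracks)
    top = [(x, min(y, highest)) for x, y in top]
    bottom = [(x, max(y, lowest)) for x, y in bottom]
    return top, bottom


# Frame 1: a box from (0, 0) to (8, 2), split into N = 4 sub-regions.
top1, bottom1 = expand_box_points(((0, 0), (8, 0), (8, 2), (0, 2)), 4)
# Tracking points inside the box, and their predicted frame 2 positions.
tracks1 = [(1, 1), (4, 1), (7, 1)]
tracks2 = [(2, 2), (5, 2), (8, 2)]
top2 = propagate_box_points(top1, tracks1, tracks2)
bottom2 = propagate_box_points(bottom1, tracks1, tracks2)
top2, bottom2 = adjust_box_points(top2, bottom2, tracks2)
```

Splitting the line into several sub-regions is what lets the tracked outline bend with curved or perspective-distorted text; a single rigid rectangle could follow only translation. A fuller implementation would also fit and smooth the upper and lower point sets, which this sketch omits.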
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/800,347 US20230058296A1 (en) | 2020-02-21 | 2021-01-14 | Video text tracking method and electronic device |
EP21757916.8A EP4086809A4 (en) | 2020-02-21 | 2021-01-14 | TELETEXT TRACKING METHOD AND ELECTRONIC DEVICE |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010108338.4 | 2020-02-21 | ||
CN202010108338.4A CN113297875B (zh) | 2020-02-21 | 2020-02-21 | 一种视频文字跟踪方法及电子设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021164479A1 true WO2021164479A1 (zh) | 2021-08-26 |
Family
ID=77317690
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/071796 WO2021164479A1 (zh) | 2020-02-21 | 2021-01-14 | 一种视频文字跟踪方法及电子设备 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230058296A1 (zh) |
EP (1) | EP4086809A4 (zh) |
CN (1) | CN113297875B (zh) |
WO (1) | WO2021164479A1 (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6937766B1 (en) * | 1999-04-15 | 2005-08-30 | MATE—Media Access Technologies Ltd. | Method of indexing and searching images of text in video |
CN109800757A (zh) * | 2019-01-04 | 2019-05-24 | 西北工业大学 | 一种基于布局约束的视频文字追踪方法 |
CN110276349A (zh) * | 2019-06-24 | 2019-09-24 | 腾讯科技(深圳)有限公司 | 视频处理方法、装置、电子设备及存储介质 |
CN110555433A (zh) * | 2018-05-30 | 2019-12-10 | 北京三星通信技术研究有限公司 | 图像处理方法、装置、电子设备及计算机可读存储介质 |
CN110796130A (zh) * | 2019-09-19 | 2020-02-14 | 北京迈格威科技有限公司 | 用于文字识别的方法、装置及计算机存储介质 |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6473522B1 (en) * | 2000-03-14 | 2002-10-29 | Intel Corporation | Estimating text color and segmentation of images |
US8316238B2 (en) * | 2006-10-25 | 2012-11-20 | Verizon Patent And Licensing Inc. | Method and system for providing image processing to track digital information |
CN100593792C (zh) * | 2008-03-10 | 2010-03-10 | 北京航空航天大学 | 一种视频中的文本跟踪和多帧增强方法 |
JP5491836B2 (ja) * | 2009-01-30 | 2014-05-14 | 株式会社東芝 | 超音波診断装置、超音波画像処理装置、医用画像診断装置及び医用画像処理装置 |
CN101833664A (zh) * | 2010-04-21 | 2010-09-15 | 中国科学院自动化研究所 | 基于稀疏表达的视频图像文字检测方法 |
CN111466112A (zh) * | 2018-08-10 | 2020-07-28 | 华为技术有限公司 | 一种图像拍摄方法及电子设备 |
CN110147724B (zh) * | 2019-04-11 | 2022-07-01 | 北京百度网讯科技有限公司 | 用于检测视频中的文本区域的方法、装置、设备以及介质 |
CN110633664A (zh) * | 2019-09-05 | 2019-12-31 | 北京大蛋科技有限公司 | 基于人脸识别技术追踪用户的注意力方法和装置 |
- 2020
  - 2020-02-21 CN CN202010108338.4A patent/CN113297875B/zh active Active
- 2021
  - 2021-01-14 WO PCT/CN2021/071796 patent/WO2021164479A1/zh unknown
  - 2021-01-14 EP EP21757916.8A patent/EP4086809A4/en active Pending
  - 2021-01-14 US US17/800,347 patent/US20230058296A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US20230058296A1 (en) | 2023-02-23 |
CN113297875B (zh) | 2023-09-29 |
EP4086809A4 (en) | 2023-06-21 |
EP4086809A1 (en) | 2022-11-09 |
CN113297875A (zh) | 2021-08-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21757916 Country of ref document: EP Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2021757916 Country of ref document: EP Effective date: 20220801 |
 | NENP | Non-entry into the national phase | Ref country code: DE |