WO2019104478A1 - Method and terminal for recognizing screenshot text - Google Patents

一种识别截图文字的方法及终端 (Method and terminal for recognizing screenshot text)

Info

Publication number
WO2019104478A1
Authority
WO
WIPO (PCT)
Prior art keywords: control, screenshot, visible, terminal, text
Application number
PCT/CN2017/113333
Other languages
English (en)
French (fr)
Inventor
朱超
庄志山
陈绍君
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2017/113333
Priority to CN201780082015.9A
Publication of WO2019104478A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/22: Indexing; Data structures therefor; Storage structures

Definitions

  • the embodiments of the present application relate to the field of terminals, and in particular, to a method and a terminal for recognizing screenshot text.
  • terminals such as mobile phones have become an indispensable part of people's daily lives. Users can not only communicate with other users, but also browse or process various types of information.
  • for content of interest displayed on the mobile phone, such as text the user is interested in, the user usually uses the screen capture function to save it in the form of a screenshot for subsequent use. If the text in the screenshot needs to be recognized, the prior art usually uses Optical Character Recognition (OCR) technology.
  • using OCR technology to recognize the text in a screenshot usually requires several steps, such as preprocessing, feature extraction, classification and training for recognition, and post-processing. Since the classification and training step needs enough sample data for labeled training, and the post-processing step needs to continuously correct the recognized results, it takes a long time to recognize the screenshot text.
  • the embodiments of the present application provide a method and a terminal for recognizing screenshot text, which solve the problem that recognizing screenshot text with OCR technology takes a long time.
  • a first aspect of the embodiments of the present application provides a method for identifying a screenshot text, including:
  • receiving, by the terminal, a first input of the user; acquiring, by the terminal, an intercepted area in response to the received first input, where the intercepted area is the entire area or a partial area of the interface displayed by the terminal; intercepting, by the terminal, the page content in the intercepted area and generating a screenshot; acquiring, by the terminal, the controls in the intercepted area whose visibility attribute is visible, and obtaining the text content of those visible controls; and storing, by the terminal, the text content of at least one of the visible controls in association with the screenshot.
  • in this solution, the terminal acquires the intercepted area in response to the received first input, intercepts the page content in the intercepted area, generates a screenshot, acquires the controls in the intercepted area whose visibility attribute is visible, obtains their text content, and stores the text content of at least one of the visible controls, as the text of the screenshot, in association with the screenshot. By using the text content of the visible controls in the intercepted area, obtained during screen capture, as the text of the screenshot, the time taken to recognize the text is reduced compared with recognizing the screenshot text by OCR.
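  • the extraction step above can be sketched in a few lines. The following is a minimal illustrative model in plain Python, not Android framework code; the `Control` class and its fields are hypothetical stand-ins for a real view tree:

```python
from dataclasses import dataclass, field

VISIBLE, INVISIBLE, GONE = "visible", "invisible", "gone"

@dataclass
class Control:
    # Hypothetical stand-in for a UI control in the intercepted area.
    visibility: str = VISIBLE
    text: str = ""                      # the control's text attribute
    children: list = field(default_factory=list)

def collect_visible_text(root):
    """Depth-first walk: keep text only from controls whose
    visibility attribute is 'visible'."""
    texts = []
    stack = [root]
    while stack:
        c = stack.pop()
        if c.visibility != VISIBLE:
            continue                    # skip the whole non-visible subtree
        if c.text:
            texts.append(c.text)
        stack.extend(reversed(c.children))
    return texts

tree = Control(children=[
    Control(text="Headline"),
    Control(visibility=INVISIBLE, text="hidden caption"),
    Control(visibility=GONE, text="removed banner"),
    Control(children=[Control(text="Body paragraph")]),
])
print(collect_visible_text(tree))  # ['Headline', 'Body paragraph']
```

Because the text is read directly from control attributes gathered at capture time, no image analysis is needed at all, which is the source of the time saving claimed over OCR.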
  • in a possible implementation, storing, by the terminal, the text content of at least one of the visible controls in association with the screenshot may include: storing, by the terminal, the text content of at least one of the visible controls in association with the storage path of the screenshot.
  • in another possible implementation, storing, by the terminal, the text content of at least one of the visible controls in association with the screenshot may specifically include: storing, by the terminal, the text content of at least one of the visible controls in the header information of the screenshot.
  • in this way, the screenshot can still be found through the text content saved in its header information.
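  • storing the text in the image's own header can be made concrete. The description mentions Exif (FIG. 8); the sketch below instead uses the analogous PNG `tEXt` chunk, chosen only because it can be written with the Python standard library alone, and the `ScreenshotText` keyword is an arbitrary illustrative name:

```python
import struct, zlib

def chunk(ctype: bytes, data: bytes) -> bytes:
    # length + type + data + CRC over type and data, per the PNG spec
    return struct.pack(">I", len(data)) + ctype + data + \
           struct.pack(">I", zlib.crc32(ctype + data))

def minimal_png() -> bytes:
    """A valid 1x1 grayscale PNG standing in for the screenshot."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")   # filter byte + one pixel
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))

def embed_text(png: bytes, text: str) -> bytes:
    """Insert a tEXt chunk right after IHDR (8-byte signature,
    then a 12 + 13 byte IHDR chunk)."""
    cut = 8 + 12 + 13
    textchunk = chunk(b"tEXt", b"ScreenshotText\x00" + text.encode("latin-1"))
    return png[:cut] + textchunk + png[cut:]

def extract_text(png: bytes) -> str:
    """Walk the chunks and read back the tEXt payload."""
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        if ctype == b"tEXt":
            data = png[pos + 8:pos + 8 + length]
            return data.split(b"\x00", 1)[1].decode("latin-1")
        pos += 12 + length
    return ""

shot = embed_text(minimal_png(), "interesting text")
print(extract_text(shot))  # interesting text
```

Because the text travels inside the image file itself, it survives the screenshot being moved or renamed, which is exactly the benefit of header-information storage over a separate path-to-text mapping.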
  • in a possible implementation, obtaining the text content of the visible controls is specifically: acquiring, by the terminal, the controls whose visibility attribute is visible and whose type is a first type, where the first type is a text control type and/or an image control type; and obtaining, by the terminal, the text content from the text attribute of the controls of the first type. In this way, by taking the text content of the screenshot only from the filtered controls that may contain text content, the time taken to obtain the screenshot text is further reduced.
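  • restricting extraction to "first type" controls is a simple filter applied before reading the text attribute. A self-contained sketch (the type names below mirror Android's TextView/ImageView for flavor, but the data here is purely illustrative):

```python
# (type, visibility, text) triples standing in for controls in the
# intercepted area; the names and sample strings are hypothetical.
controls = [
    ("TextView",  "visible", "Order shipped"),
    ("ImageView", "visible", "logo.png alt text"),
    ("Button",    "visible", "OK"),            # not a first-type control
    ("TextView",  "gone",    "debug overlay"), # not visible
]

FIRST_TYPE = {"TextView", "ImageView"}  # text and/or image control types

def screenshot_text(controls):
    return [text for ctype, vis, text in controls
            if vis == "visible" and ctype in FIRST_TYPE and text]

print(screenshot_text(controls))  # ['Order shipped', 'logo.png alt text']
```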
  • in a possible implementation, the method for recognizing screenshot text may further include: building, by the terminal, a search index file according to the stored text content and the storage path of the screenshot, where the search index file is used to find screenshots. In this way, by building a search index from the obtained screenshot text, the user's need for high-precision search of text in pictures is satisfied.
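  • a search index file pairing stored text with storage paths can be sketched as a small inverted index. The paths, tokenization, and in-memory layout below are illustrative assumptions; the application does not prescribe a concrete index format:

```python
from collections import defaultdict

def build_index(entries):
    """entries: (storage_path, screenshot_text) pairs.
    Returns a mapping token -> set of storage paths."""
    index = defaultdict(set)
    for path, text in entries:
        for token in text.lower().split():
            index[token].add(path)
    return index

def find_screenshots(index, query):
    """Paths whose stored text contains every query token."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()

index = build_index([
    ("/Pictures/Screenshots/s1.png", "flight CA1234 departs 09:40"),
    ("/Pictures/Screenshots/s2.png", "meeting notes flight plan"),
])
print(sorted(find_screenshots(index, "flight")))
print(sorted(find_screenshots(index, "flight plan")))
```

Because the indexed text came from control attributes rather than OCR, the lookup is exact-match on the on-screen wording, which is what enables the high-precision search mentioned above.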
  • a second aspect of the embodiments of the present application provides a terminal, including one or more processors, a memory, and an input unit, where the memory and the input unit are coupled to the one or more processors, and the memory is configured to store computer program code including computer instructions. When the one or more processors execute the computer instructions: the input unit is configured to receive a first input of the user; the processor is configured to acquire the intercepted area in response to the first input, intercept the page content in the intercepted area, generate a screenshot, acquire the controls in the intercepted area whose visibility attribute is visible, and obtain the text content of those visible controls, where the intercepted area is the entire area or a partial area of the interface displayed by the terminal; and the memory is configured to store the text content of at least one of the visible controls in association with the screenshot.
  • the memory is specifically configured to store the text content of at least one of the visible controls in association with the storage path of the screenshot.
  • the memory is specifically configured to store the text content of at least one of the visible controls in the header information of the screenshot.
  • the processor is specifically configured to acquire the controls whose visibility attribute is visible and whose type is a first type, where the first type is a text control type and/or an image control type, and to obtain the text content from the text attribute of the controls of the first type.
  • the processor is further configured to build a search index file according to the stored text content and the storage path of the screenshot, where the search index file is used to find screenshots.
  • a third aspect of the embodiments of the present application provides a terminal, including:
  • a receiving unit, configured to receive a first input of the user; an acquiring unit, configured to acquire an intercepted area in response to the first input received by the receiving unit, where the intercepted area is the entire area or a partial area of the interface displayed by the terminal;
  • the generating unit is configured to generate a screenshot;
  • the acquiring unit is further configured to acquire the controls in the intercepted area whose visibility attribute is visible, and obtain the text content of those visible controls.
  • a storage unit is configured to store the text content of at least one of the visible controls acquired by the acquiring unit in association with the screenshot generated by the generating unit.
  • the storage unit is specifically configured to store the text content of at least one of the visible controls in association with the storage path of the screenshot.
  • the storage unit is specifically configured to store the text content of at least one of the visible controls in the header information of the screenshot.
  • the acquiring unit is specifically configured to acquire, among the controls whose visibility attribute is visible, the controls of a first type, and to obtain the text content from the text attribute of those controls, where the first type is a text control type and/or an image control type.
  • the terminal further includes: a building unit, configured to build a search index file according to the stored text content and the storage path of the screenshot, where the search index file is used to find screenshots.
  • a fourth aspect of the embodiments of the present application provides a computer storage medium, including computer instructions that, when run on a terminal, cause the terminal to perform the method for recognizing screenshot text of the first aspect or any possible implementation of the first aspect.
  • a fifth aspect of the embodiments of the present application provides a computer program product that, when run on a computer, causes the computer to perform the method for recognizing screenshot text of the first aspect or any possible implementation of the first aspect.
  • the terminal of the second and third aspects, the computer storage medium of the fourth aspect, and the computer program product of the fifth aspect are all configured to perform the corresponding methods provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods, which are not described herein again.
  • FIG. 1 is a schematic diagram of the hardware structure of a mobile phone according to an embodiment of the present application;
  • FIG. 2A is a front view of a mobile phone according to an embodiment of the present application;
  • FIG. 2B is a first schematic diagram of a display interface according to an embodiment of the present application;
  • FIG. 2C is a second schematic diagram of a display interface according to an embodiment of the present application;
  • FIG. 2D is a third schematic diagram of a display interface according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the system architecture of a terminal according to an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of a method for recognizing screenshot text according to an embodiment of the present application;
  • FIG. 5 is a fourth schematic diagram of a display interface according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of the structure for building a search index according to an embodiment of the present application;
  • FIG. 7 is a fifth schematic diagram of a display interface according to an embodiment of the present application;
  • FIG. 8 is a schematic diagram of the Exif format according to an embodiment of the present application;
  • FIG. 9 is a schematic structural diagram of a terminal according to an embodiment of the present application;
  • FIG. 10 is a schematic structural diagram of another terminal according to an embodiment of the present application.
  • the text displayed by the terminal is sometimes saved in the form of a screenshot for subsequent viewing.
  • the prior art is usually implemented using OCR technology.
  • recognizing screenshot text with OCR technology usually takes a long time.
  • therefore, an embodiment of the present application provides a method for recognizing screenshot text.
  • with this method, the time taken to recognize the screenshot text is reduced compared with recognizing the screenshot text by OCR.
  • control: an element rendered in a graphical user interface is often referred to as a control; a control can provide the user with certain operations or display certain content.
  • visibility attribute: whether a control is visible is described by its visibility attribute.
  • the visibility attribute has three possible values: visible, invisible, and gone. Among them, visible means the control is visible; invisible means the control is invisible but still occupies its layout position; and gone means the control is invisible and does not occupy its layout position.
  • a control whose visibility attribute is visible can be simply understood as a control that the user is meant to see in the program's development design, while a control whose visibility attribute is invisible or gone can be simply understood as a control that the user is not meant to see in the program's development design.
  • the visibility attributes of some controls may need to be switched; for example, a control may be set to invisible by default and changed to visible when needed, that is, switched from invisible to visible.
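  • the three values differ in whether the control still occupies its layout slot; a toy vertical-layout model makes the distinction concrete (the row names and heights are arbitrary illustrative numbers, not Android measurements):

```python
# (name, visibility, height) rows of a toy vertical layout.
rows = [
    ("title",   "visible",   40),
    ("spinner", "invisible", 40),  # hidden but still takes up space
    ("banner",  "gone",      40),  # hidden and takes up no space
    ("body",    "visible",  120),
]

def layout_height(rows):
    # 'gone' rows are excluded from layout; 'invisible' rows keep their slot.
    return sum(h for _, vis, h in rows if vis != "gone")

def drawn(rows):
    # only 'visible' rows are actually rendered on screen.
    return [name for name, vis, _ in rows if vis == "visible"]

print(layout_height(rows))  # 200
print(drawn(rows))          # ['title', 'body']
```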
  • the method for identifying the screenshot text provided by the embodiment of the present application may be applied to the terminal.
  • the terminal can be a tablet computer, a desktop computer, a laptop computer, an Ultra-mobile Personal Computer (UMPC), a handheld computer, a netbook, a Personal Digital Assistant (PDA), or a wearable electronic device such as a smart watch, and may also be the mobile phone 100 shown in FIG. 1.
  • the specific form of the terminal in the embodiment of the present application is not particularly limited.
  • the terminal in the embodiment of the present application may be the mobile phone 100.
  • FIG. 1 is a schematic diagram of the hardware structure of the mobile phone 100. It should be understood that the illustrated mobile phone 100 is merely one example of a terminal; the mobile phone 100 may have more or fewer components than shown in the figure, may combine two or more of the components shown, or may have a different arrangement of components.
  • the mobile phone 100 may include: a display 101, an input unit 102, a processor 103, a memory 104, a power source 105, a radio frequency (RF) circuit 106, a sensor 107, an audio circuit 108, a speaker 109, a microphone 110, a wireless fidelity (WiFi) module 111, and other components. These components can be connected by a bus or connected directly.
  • the display 101 can be used to display information input by the user or information provided to the user, as well as various menus of the mobile phone 100, and can also accept input operations of the user.
  • the display 101 may include a display panel 101-1 and a touch panel 101-2.
  • the display panel 101-1 can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the touch panel 101-2 may also be referred to as a touch screen or a touch-sensitive screen, and can collect the user's contact or non-contact operations on or near it (for example, operations performed by the user with a finger, a stylus, or any other suitable object on or near the touch panel 101-2, which may also include somatosensory operations; the operations include types such as single-point control operations and multi-point control operations), and drive the corresponding connection device according to a preset program.
  • the touch panel 101-2 may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the user's touch position and posture, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch signal from the touch detection device, converts it into information that the processor 103 can process, and sends that information to the processor 103; it can also receive commands sent by the processor 103 and execute them.
  • the touch panel 101-2 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave, and can also be implemented by any technology developed in the future; this is not limited here.
  • the touch panel 101-2 can cover the display panel 101-1, and the user can, according to the content displayed by the display panel 101-1 (the displayed content including any one or a combination of the following: a soft keyboard, a virtual mouse, virtual keys, icons, and the like), operate on or near the touch panel 101-2 covering the display panel 101-1.
  • after the touch panel 101-2 detects an operation on or near it, the operation is transmitted to the processor 103 through the input/output subsystem to determine the user input, and the processor 103 then provides a corresponding visual output on the display panel 101-1 through the input/output subsystem according to the user input.
  • although in FIG. 1 the touch panel 101-2 and the display panel 101-1 are shown as two separate components implementing the input and output functions of the mobile phone 100, in some embodiments the touch panel 101-2 may be integrated with the display panel 101-1 to implement the input and output functions of the mobile phone 100.
  • the input unit 102 may be the touch panel 101-2 or other input devices.
  • the other input device can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the handset 100.
  • the other input device may include any one or a combination of the following: a physical keyboard, function keys (such as a volume control button, a switch button, etc.), a trackball, a mouse, a joystick, a light mouse (a light mouse is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by a touch screen), and the like.
  • the other input devices are coupled to other input device controllers of the input/output subsystem and are in signal communication with the processor 103 under the control of other device input controllers.
  • the processor 103 is the control center of the mobile phone 100. It connects various parts of the entire mobile phone 100 using various interfaces and lines, and performs the various functions of the mobile phone 100 and processes data by running or executing the software programs and/or modules stored in the memory 104 and invoking the data stored in the memory 104, thereby monitoring the mobile phone 100 as a whole.
  • the processor 103 may include one or more processing units; the processor 103 may integrate an application processor and a modem processor.
  • the application processor mainly processes an operating system, a user interface, an application, and the like, and the modem processor mainly processes wireless communication. It can be understood that the above-mentioned modem processor can also be separately provided from the application processor.
  • that is, the processor can be a modem processor, an application processor, or a combination of a modem processor and an application processor.
  • the memory 104 can be used to store data, software programs, and modules.
  • the processor 103 performs various functions and data processing of the mobile phone 100 by executing the data, software programs, and modules stored in the memory 104, for example, performing the method provided by the embodiments of the present application.
  • the memory 104 can mainly include a storage program area and a storage data area.
  • the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the mobile phone 100 (such as audio data, Phone book, etc.).
  • the memory 104 can be volatile memory, such as random-access memory (RAM) or high-speed random access memory; or non-volatile memory, such as a disk storage device, a flash device, read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memory.
  • the power source 105 which can be a battery, is logically coupled to the processor 103 through a power management system to manage functions such as charging, discharging, and power management through the power management system.
  • the RF circuit 106 can be used for receiving and sending signals during information transmission and reception or during a call; in particular, it delivers the received downlink information of the base station to the processor 103 for processing, and sends the designed uplink data to the base station.
  • RF circuitry 106 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
  • RF circuitry 106 can also communicate with the network and other devices via wireless communication.
  • the wireless communication may use any communication standard or protocol, including one or a combination of the following: Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Message Service (SMS), and so on.
  • the mobile phone 100 may also include at least one sensor 107, such as a light sensor, a speed sensor, a Global Positioning System (GPS) sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 101-1 according to the brightness of the ambient light, and the proximity sensor may close the display panel when the mobile phone 100 moves to the ear. 101-1 and / or backlight.
  • the accelerometer sensor can detect the magnitude of acceleration of the mobile phone 100 in various directions (generally on three axes), can detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the mobile phone 100 (such as landscape or portrait orientation).
  • the audio circuit 108, the speaker 109, and the microphone 110 can provide an audio interface between the user and the handset 100.
  • on one hand, the audio circuit 108 can transmit the electrical signal converted from the received audio data to the speaker 109, which converts it into a sound signal for output; on the other hand, the microphone 110 converts a collected sound signal into an electrical signal, which the audio circuit 108 receives and converts into audio data; the audio data is then output to the RF circuit 106 to be sent to, for example, another mobile phone, or output to the processor 103 for further processing.
  • the WiFi module 111 may be a module including a driver of a WiFi chip and a WiFi chip, and the WiFi chip is capable of running a wireless Internet standard protocol.
  • an operating system, such as the iOS operating system developed by Apple or the Windows operating system developed by Microsoft, runs on the mobile phone 100, and applications can be installed and run on this operating system.
  • the mobile phone 100 may further include components such as a Bluetooth module, a camera, and the like.
  • the Bluetooth module is a printed circuit board assembly (PCBA) with Bluetooth functionality, used for short-range wireless communication.
  • the input unit 102 can be configured to receive a first input of the user.
  • the processor 103 is configured to: in response to the first input received by the input unit 102, acquire a clipping region, intercept the page content in the intercepted region, and generate a screenshot.
  • FIG. 2A is a front view of the mobile phone 100 according to an embodiment of the present application.
  • for example, the first input is simultaneously pressing the volume control button (such as the volume "+" key) 201 and the switch button 202 of the mobile phone 100. When the user wants to use the screen capture function, the volume control button 201 and the switch button 202 of the mobile phone 100 can be pressed simultaneously to trigger the processor 103 to start the screen capture function and generate a screenshot.
  • the first input is a double tap operation.
  • the user can perform a double-click operation, and the processor 103 starts the screen capture function in response to the double-click operation, so that the page content in the interface currently displayed by the mobile phone 100 can be intercepted, thereby generating a screenshot.
  • alternatively, the first input is an operation on a virtual button, such as an on-screen screen capture button.
  • the user wants to use the screen capture function, as shown in (a) of FIG. 2B, the user performs a slide operation in accordance with the slide trajectory 203.
  • in response to the sliding operation, the display 101 displays a pull-down menu 204, and the pull-down menu 204 includes a screen capture button 205.
  • the user can click the screen capture button 205, and the processor 103 activates the screen capture function in response to the click operation, so that the page content in the interface currently displayed by the mobile phone 100 can be intercepted, thereby generating a screenshot.
  • the processor 103 can start the screen capture function and intercept the page content that cannot be displayed in the current interface, for example, when the user is browsing long web content, a document with a large amount of text, a PDF document, or the like, thereby meeting the user's need for long screenshots.
  • alternatively, the user can simultaneously press the volume control button (such as the volume "+" key) 201 and the switch button 202 of the mobile phone 100, or perform a double-tap gesture, or click the screen capture button 205. As shown in FIG. 2D, the mobile phone 100 can display a hovering box 208, and the user can change the size of the hovering box 208 by a selection operation to intercept the page content in a part of the current interface, thereby generating a screenshot.
  • the processor 103 can also be configured to acquire the controls in the intercepted area whose visibility attribute is visible, and obtain the text content of those visible controls.
  • the memory 104 can be configured to store the text content of at least one of the controls whose visibility attribute is visible in association with the screenshot.
  • the text content of at least one of the acquired visible controls may be saved in the header information of the screenshot, with the screenshot stored in the memory 104, or may be stored in the memory 104 in association with the storage path of the screenshot, for the user's later use.
  • the processor in this embodiment can acquire all the controls in the intercepted area whose visibility attribute is visible, so that it can obtain the text content of all the visible controls.
  • FIG. 3 is a schematic structural diagram of a system of a terminal according to an embodiment of the present disclosure.
  • the operating system of the terminal is an Android system.
  • the system architecture may include an application layer 301, an application framework layer 302, a system runtime layer 303, and a Linux kernel layer 304.
  • the application layer 301, the application framework layer 302, the system runtime layer 303, and the Linux kernel layer 304 run on the application processor. Among them:
  • the application layer 301 is the layer of the Android system that interacts with users.
  • the application layer 301 includes the terminal's various applications (third-party applications and/or system applications), which can access the services provided by the application framework layer 302 according to their needs. For example, when intercepting content in the displayed interface, the screen capture application can access the screen capture interface management service provided by the application framework layer 302.
  • the application framework layer 302 is used to provide the application layer 301 with relevant Application Program Interfaces (APIs) and services to provide support for the operation of applications in the application layer 301.
  • APIs Application Program Interfaces
  • the APIs and the services that the application framework layer 302 provides to the application layer 301 differ according to the application.
  • the application framework layer 302 can provide the application layer 301 with an API related to the screen capture function, and provide the screen capture interface management service for the application layer 301 to implement the screen capture function.
  • System runtime layer 303 and Linux kernel layer 304 are used to support the normal operation of the system.
  • when the user wants to capture the screen, an input instructing the terminal to activate the screen capture function can be performed.
  • the application framework layer 302 monitors the user's input, it notifies the screen capture application of the application layer 301.
  • the screen capture application in the application layer 301 accesses the screen capture interface management service provided by the application framework layer 302 through the related API to implement the screen capture function, thereby generating a screenshot.
  • the application framework layer 302 calls the relevant interface to acquire the controls, in the intercepted area of the displayed interface, whose visibility attribute is visible, and obtains the text content of those visible controls.
  • the application layer 301 can also store the screenshot in association with the text content of at least one of the controls whose visibility attribute is visible.
  • specifically, the application framework layer 302 can acquire all the controls in the intercepted area whose visibility attribute is visible, and can obtain the text content of all those controls; the application layer 301 can then store the screenshot in association with the text content of at least one of them.
FIG. 4 is a schematic flowchart of a method for recognizing screenshot text according to an embodiment of the present application. As shown in FIG. 4, the method may include:

S401. The terminal receives a first input from a user.
While the user is using the phone, if the user is interested in the text in the interface displayed by the phone and wants to save it, the user can use the phone's screen capture function to store the desired text in the form of a screenshot. At this point, the user can perform a first input to trigger the phone to start the screen capture function. When the user performs the first input, the application framework layer can detect it.

In the embodiments of the present application, when the user wants to capture all of the page content displayed in the current interface, the first input may specifically be an operation on a function key of the phone (for example, a volume control key: the volume "+" key, the volume "-" key, the power key, and so on) or a function key combination (for example, the volume "+" key together with the power key), an operation on a virtual key of the phone, a voice command input by the user, or a preset gesture input by the user. In some embodiments of the present application, the preset gesture may be any one of a tap gesture, a swipe gesture, a pressure-recognition gesture, a long-press gesture, an area-change gesture, a double-press gesture, or a double-tap gesture.

When the user wants to capture all the page content displayed in the current interface as well as the page content that the current interface cannot display, that is, when the user wants to take a long (scrolling) screenshot, the first input may include: any one of an operation on a function key or function key combination of the phone, an operation on a virtual key of the phone, an input voice command, or an input preset gesture, together with a sliding operation input by the user, the sliding operation being used to trigger the phone to display the page content that the current interface cannot show.

When the user wants to capture part of the page content in the currently displayed interface, the first input may specifically include: any one of the above operations, together with a selection operation input by the user, the selection operation being used to select the area of the currently displayed interface that needs to be captured.
S402. The terminal obtains an intercepted area in response to the first input.

The intercepted area is the entire area or a partial area of the interface displayed by the terminal. The entire area of the displayed interface may be the entire area of the currently displayed interface, or the entire area of the interface displayed by scrolling. Specifically, the intercepted area may be determined according to the user's first input: when the first input requests an ordinary screenshot, the intercepted area is the entire area of the currently displayed interface; when the first input requests a scrolling screenshot, the intercepted area is the entire area of the interface displayed by scrolling; and when the first input includes a selection operation, the intercepted area is a partial area of the currently displayed interface.

For example, suppose the first input is a voice command input by the user, such as "perform a screen capture operation". When the application framework layer detects the voice command "perform a screen capture operation", it can determine from the voice command that a screen capture operation needs to be performed and that the intercepted area is the entire screen. The application framework layer can then determine the intercepted area to be the display range of the phone screen, for example [(0, 0), (1920, 1080)], where (0, 0) is the coordinate of the upper-left corner of the phone screen and (1920, 1080) is the coordinate of its lower-right corner.
As another example, suppose the first input is a preset gesture together with a selection operation input by the user, where the selection operation is used to select the area that needs to be captured. When the application framework layer detects the preset gesture and the selection operation, it can determine from the preset gesture that a screen capture operation needs to be performed, determine from the selection operation the area selected by the user, and determine the intercepted area according to the area finally selected by the user.
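The two cases described above, a full-screen capture and a user-selected region, can be sketched in plain Java. This is only an illustration of the decision, not framework code: the `InputKind` and `Region` types are invented stand-ins, and the rectangle uses the [(left, top), (right, bottom)] notation from this description.

```java
// Illustrative sketch: resolve the intercepted area from the kind of first input.
class CaptureRegion {
    record Region(int left, int top, int right, int bottom) {}

    enum InputKind { KEY_COMBO, VIRTUAL_KEY, VOICE_COMMAND, GESTURE_WITH_SELECTION }

    // Inputs requesting an ordinary screenshot map to the whole screen;
    // an input carrying a selection operation keeps the user's rectangle.
    static Region resolve(InputKind kind, Region screen, Region selection) {
        if (kind == InputKind.GESTURE_WITH_SELECTION && selection != null) {
            return selection;
        }
        return screen;
    }
}
```

With a voice command, `resolve` would return the full screen rectangle, for example (0, 0) to (1920, 1080); with a gesture plus selection, it would return the user's selected rectangle.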
As another example, suppose the user wants to capture all of the page content displayed in the current interface, and the first input is an operation on a function key combination of the phone: the volume "+" key together with the power key. The user is interested in the chat content 501 in the chat interface and wants to save the chat content 501 in the form of a screenshot. The user presses the volume "+" key 502 and the power key 503 at the same time, and a corresponding screen capture event is generated, indicating that the content of the interface displayed by the phone is to be captured. According to the screen capture event, the application framework layer can determine that a screen capture operation needs to be performed over the entire screen, and therefore determines the intercepted area to be the display range of the phone screen.
S403. The terminal captures the page content in the intercepted area and generates a screenshot.

Specifically, the application framework layer can notify the screen capture application in the application layer that the user wants to use the screen capture function. The screen capture application in the application layer accesses the screenshot interface management service provided by the application framework layer through the relevant APIs, and can capture the page content in the intercepted area of the displayed interface. After the capture succeeds, the screen capture application in the application layer can generate the screenshot. Moreover, the phone can display the generated screenshot and then switch back to the interface displayed when the screenshot was taken.
S404. The terminal obtains the controls in the intercepted area whose visibility attribute is visible.

It can be understood that the terminal can obtain all the controls in the intercepted area whose visibility attribute is visible.

Specifically, the application framework layer can first use the ActivityManager class to obtain the displayed interface: the application framework layer calls the interface ActivityManager.getRunningTasks(int maxNum) to obtain the list of tasks currently running on the phone, and obtains the information of the topmost Activity, such as its class name, from the obtained task list through the topActivity variable. Then, according to the obtained class name of the topmost Activity, the application framework layer can obtain the displayed interface by reflection, using the ActivityThread.currentActivityThread method and the mActivities member variable. The displayed interface may be an interface of an application on the phone, or may be the phone's desktop; the application may be a system application or a third-party application.

Next, the application framework layer traverses each control according to the number of controls contained in the window view, calls the interface View.getLocationOnScreen(int[]) to obtain the position of each control in the displayed interface, and, combined with the intercepted area obtained in S402, determines the visible controls located within the intercepted area. For example, suppose the intercepted area is the entire area of the currently displayed interface.
The application framework layer calls the relevant interfaces to obtain the window view, which includes the control 504 (return button icon), the control 505 (title bar), the control 506 (chat details button icon), the control 507 (avatar icon 1), the control 508 (conversation content 1), the control 509 (conversation content 2), the control 510 (avatar icon 2), the control 511 (voice input button icon), the control 512 (input box), and the control 513 (options button icon). The controls 504 to 513 are within the intercepted area of the currently displayed interface, and the controls 504 to 513 are controls whose visibility attribute is visible.
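The traversal described above can be modeled in plain Java. This is a sketch only: `Control` is an invented stand-in for android.view.View, the bounds play the role of the position returned by View.getLocationOnScreen, and the region check is a simple rectangle intersection.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: keep only controls that are visible and intersect the
// intercepted area, as the application framework layer does in S404.
class VisibleControlFilter {
    enum Visibility { VISIBLE, INVISIBLE, GONE }

    record Control(String name, Visibility visibility,
                   int left, int top, int right, int bottom) {}

    static List<Control> controlsInRegion(List<Control> windowView,
                                          int rLeft, int rTop, int rRight, int rBottom) {
        List<Control> result = new ArrayList<>();
        for (Control c : windowView) {
            // Standard rectangle-intersection test against the intercepted area.
            boolean intersects = c.left() < rRight && c.right() > rLeft
                              && c.top() < rBottom && c.bottom() > rTop;
            if (c.visibility() == Visibility.VISIBLE && intersects) {
                result.add(c);
            }
        }
        return result;
    }
}
```

A control whose visibility attribute is gone, or whose bounds lie outside the intercepted area, is skipped; everything else is handed to the text-extraction step.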
S405. The terminal obtains the text content of the controls whose visibility attribute is visible.

It can be understood that the terminal can obtain the text content of all the controls whose visibility attribute is visible.

Specifically, the application framework layer can obtain, among the controls whose visibility attribute is visible, the controls whose type is the first type. The first type may be the TextView type and/or the ImageView type. It can be understood that the first type of control may also be a Button, an ActionBar, and the like.

Then, the application framework layer obtains the text content from the text attributes of the controls of the first type. For example, for a control of the TextView type, the application framework layer can call the interface View.getText() to obtain the control's text content from its text field; for a control of the ImageView type, the application framework layer can call the interface View.getContentDescription() to obtain the control's text content from its content description field.

Suppose that, among the visible controls obtained by the application framework layer, the controls of the TextView type are the control 505, the control 508, and the control 509, and the controls of the ImageView type are the control 507 and the control 510.
For the control 505, the control 508, and the control 509, the interface View.getText() can be called respectively, obtaining the text content of the control 505 as "AMIX", the text content of the control 508 as "Notification: starting from October 1, the community's vaccination time is updated to: every Tuesday 8:30-12:00, please take note", and the text content of the control 509 as "Received, thank you!". For the control 507 and the control 510, the interface View.getContentDescription() is called, and it is determined that the control 507 and the control 510 contain no text content. What can be obtained, then, is that the text content of the interface the user wants to capture includes: "AMIX", "Notification: starting from October 1, the community's vaccination time is updated to: every Tuesday 8:30-12:00, please take note", and "Received, thank you!".
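The per-type extraction can be sketched as follows. This is illustrative only: `TypedControl` and `Kind` are invented stand-ins, and in the real flow the calls would be made on the framework's view objects rather than on plain records.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of S405: text comes from the text field of a
// TextView-like control (the role View.getText plays) or from the content
// description of an ImageView-like control (the role View.getContentDescription plays).
class ControlTextExtractor {
    enum Kind { TEXT_VIEW, IMAGE_VIEW, OTHER }

    record TypedControl(Kind kind, String text, String contentDescription) {}

    static List<String> extract(List<TypedControl> visibleControls) {
        List<String> texts = new ArrayList<>();
        for (TypedControl c : visibleControls) {
            String t = switch (c.kind()) {
                case TEXT_VIEW -> c.text();                 // from the text field
                case IMAGE_VIEW -> c.contentDescription();  // from the content description field
                default -> null;                            // not a first-type control
            };
            if (t != null && !t.isEmpty()) {
                texts.add(t);
            }
        }
        return texts;
    }
}
```

Controls with no text, such as an avatar icon without a content description, contribute nothing, matching the behavior described for the controls 507 and 510.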
S406. The terminal stores the text content of at least one of the controls whose visibility attribute is visible in association with the screenshot.

It can be understood that the terminal can store the text content of at least one of all the visible controls in association with the screenshot.

Specifically, the application layer may store the obtained text content of at least one of the visible controls, as the text content of the screenshot, in association with the screenshot. For example, the text content of at least one of the visible controls may be stored in association with the storage path of the screenshot; that is, the text content of at least one of the visible controls can be stored, as the text content of the screenshot, in a database together with the screenshot's storage path, which can be used subsequently to search for the related screenshot.

In addition, the text content of at least one of the visible controls saved by the terminal can also be displayed to the user through an application, which is convenient for the user to view and use; the application may be an album application or a notepad application.
S407. The terminal builds a search index file according to the storage path of the screenshot and the stored text content.

Specifically, the application layer can build a search index file based on the storage path of the screenshot and the stored text content. After the search index file is built, the user can later find the screenshot he or she wants to view through it.

For example, the database (for example, SQLite) stores information such as the storage path _data of the screenshot and the text content of the screenshot. The application layer creates a Lucene-based search index file using the open-source Lucene framework, recording the record row number _id in the database as well as the storage path _data and the text content; the search index file corresponds to the information of the screenshot stored in the database. Specifically, the search engine (Search Engine) calls the index (Index) API to input into the search engine the information stored in the database, such as the screenshot's storage path _data and text content, together with the record row number _id of the screenshot information in the database. The search engine creates the search index file according to the input record row number _id, storage path _data, and text content, and stores the created search index file in the index database (Indices Database).
When searching, the user can enter a keyword through a user interface (UI) that provides a search portal. The search engine calls the search API to obtain the keyword input by the user, performs matching in the index database to obtain one or more search index files that match the keyword, and, according to the correspondence between the search index files and the screenshot information stored in the database, obtains the corresponding screenshots from the database and presents them to the user.
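The index-and-search round trip can be modeled with a toy in-memory index. This is a sketch only: real Lucene maintains tokenized, ranked, on-disk index files, while `Row` here simply mirrors the _id, _data, and text columns described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for the Lucene-based flow in S407: index() plays the role of
// the Index API, search() the role of the Search API, and hits are mapped
// back to the screenshots' storage paths.
class ScreenshotIndex {
    record Row(int id, String data, String text) {}

    private final List<Row> rows = new ArrayList<>();

    void index(Row row) {
        rows.add(row); // real code would write a search index file here
    }

    List<String> search(String keyword) {
        List<String> paths = new ArrayList<>();
        for (Row row : rows) {
            if (row.text().contains(keyword)) {
                paths.add(row.data());
            }
        }
        return paths;
    }
}
```

Searching a keyword that appears in a stored screenshot's text returns that screenshot's storage path, from which the picture can be loaded and shown to the user.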
In addition, the application layer may also be triggered to update the database records.

For example, the user can tap the icon 701 of the album application on the phone's desktop. When the phone detects the tap operation on the icon 701 of the album application, the phone opens the album application and displays its main interface, as shown in (b) of FIG. 7. The user can enter a keyword for the picture to be viewed in the search box 702, such as the keyword "vaccine", and tap the search button icon 703. The phone can then perform matching in the index database, obtain the pictures that match the keyword "vaccine", and display the matching picture 704.
Moreover, the screen capture application in the application layer can also generate and save header information of the screenshot, where the header information includes the obtained text content of at least one of the visible controls. The header information can be in the Exchangeable image file format (Exif), which is used to record the attribute information of the screenshot, such as sensitivity, aperture size, picture size, thumbnail, shooting time, camera model, and the obtained text content.

FIG. 8 is a schematic diagram of an Exif format provided by an embodiment of the present application. As can be seen from FIG. 8, the Exif of the screenshot includes the model of the device that generated the screenshot (Model): mobile phone 1-XX; the sensitivity (ISO): 100; the shooting time (Date Taken): 20171010; and the text content of the screenshot, search_text: "AMIX", "Notification: starting from October 1, the community's vaccination time is updated to: every Tuesday 8:30-12:00, please take note", and "Received, thank you!".
The search_text field is used to save the obtained text content of the screenshot, that is, the text content acquired in S405. If no picture matching the keyword is found in the index database, the phone can find the corresponding screenshot by matching against the header information of the pictures.
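The header-based fallback can be sketched as follows. This is illustrative only: a real implementation would use an Exif writer rather than a plain map, and the attribute names simply mirror the example fields above (Model, ISO, Date Taken, search_text).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: save the captured text under a search_text attribute in
// an Exif-style header, and match a keyword against it when the index database
// returns no hit.
class ExifHeader {
    static Map<String, String> build(String model, String iso, String dateTaken, String searchText) {
        Map<String, String> header = new HashMap<>();
        header.put("Model", model);
        header.put("ISO", iso);
        header.put("Date Taken", dateTaken);
        header.put("search_text", searchText); // the text content acquired in S405
        return header;
    }

    // Fallback lookup: does this screenshot's saved text contain the keyword?
    static boolean matches(Map<String, String> header, String keyword) {
        return header.getOrDefault("search_text", "").contains(keyword);
    }
}
```

Because the text rides inside the image's own header, it survives even if the database association between the screenshot's path and its text is deleted.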
In the method for recognizing screenshot text provided by the embodiments of the present application, the terminal obtains the intercepted area in response to the received first input, captures the page content in the intercepted area, generates a screenshot, obtains the controls in the intercepted area whose visibility attribute is visible, and obtains the text content of those visible controls; the terminal then stores the text content of at least one of the visible controls, as the text content of the screenshot, in association with the screenshot. Compared with recognizing screenshot text using OCR technology, the time spent recognizing the text is reduced and the accuracy of the recognition is improved. Moreover, by building a search index according to the obtained text content of the screenshot, the user's need for high-precision search of text in pictures is satisfied.
An embodiment of the present application provides a terminal for performing the foregoing method. The embodiment of the present application may divide the terminal into function modules according to the foregoing method examples. For example, each function module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated module can be implemented in the form of hardware or in the form of a software function module. It should be noted that the division of modules in the embodiments of the present application is schematic and is merely a logical function division; there may be other division manners in actual implementation.

In the case where each function module is divided corresponding to each function, FIG. 9 shows a possible schematic structural diagram of the terminal involved in the foregoing embodiments. The terminal 900 may include: a receiving unit 901, an obtaining unit 902, an intercepting unit 903, a generating unit 904, and a storage unit 905.
The receiving unit 901 is configured to support the terminal in performing S401 in the foregoing method embodiments, and/or other processes of the techniques described herein. The obtaining unit 902 is configured to support the terminal in performing S402, S404, and S405 in the foregoing method embodiments, and/or other processes of the techniques described herein. The intercepting unit 903 is configured to support the terminal in performing the capturing of the page content in the intercepted area described in S403 in the foregoing method embodiments, and/or other processes of the techniques described herein. The generating unit 904 is configured to support the terminal in performing the generating of the screenshot described in S403 in the foregoing method embodiments, and/or other processes of the techniques described herein. The storage unit 905 is configured to support the terminal in performing S406 in the foregoing method embodiments, and/or other processes of the techniques described herein.

Further, the terminal may also include a building unit 906, configured to support the terminal in performing S407 in the foregoing method embodiments.
FIG. 10 shows another possible schematic structural diagram of the terminal involved in the foregoing embodiments. The terminal 1000 may include a processing module 1001, a storage module 1002, and a display module 1003.
  • the processing module 1001 is configured to control and manage the actions of the terminal.
  • the display module 1003 is configured to display an image generated by the processing module 1001.
  • the storage module 1002 is configured to save program codes and data of the terminal.
  • the terminal may further include a communication module, and the communication module is configured to support communication between the terminal and other network entities.
For example, the processing module 1001 may be configured to support the terminal in performing S401, S402, S403, S404, S405, and/or S407 in the foregoing method embodiments, and the storage module 1002 may be configured to support the terminal in performing S406 in the foregoing method embodiments.
  • the processing module 1001 can be a processor or a controller.
  • the communication module can be a transceiver, an RF circuit or a communication interface or the like.
  • the storage module 1002 can be a memory.
When the processing module 1001 is a processor, the communication module is an RF circuit, the storage module 1002 is a memory, and the display module 1003 is a display, the terminal provided by the embodiment of the present application may be the mobile phone shown in FIG. 1.
  • the foregoing communication module may include not only an RF circuit but also a WiFi module and a Bluetooth module. Communication modules such as RF circuits, WiFi modules, and Bluetooth modules can be collectively referred to as communication interfaces.
  • the processor, the RF circuit, the touch screen and the memory can be coupled together by a bus.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium stores computer program code. When a processor executes the computer program code, the terminal performs the related method steps in FIG. 4 to implement the method for recognizing screenshot text in the foregoing embodiments.

An embodiment of the present application further provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the related method steps in FIG. 4 to implement the method for recognizing screenshot text in the foregoing embodiments.

The terminal, the computer storage medium, and the computer program product provided by the embodiments of the present application are all used to perform the corresponding methods provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above, which are not described again here.
  • the functional units in the various embodiments of the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
The integrated unit, if implemented in the form of a software function unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. The storage medium includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the various embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.


Abstract

The embodiments of the present application disclose a method and terminal for recognizing screenshot text, relating to the field of terminals and solving the problem that recognizing screenshot text with OCR technology takes a long time. The specific solution is: the terminal receives a first input from a user; the terminal obtains an intercepted area in response to the first input, the intercepted area being the entire area or a partial area of the interface displayed by the terminal; the terminal captures the page content in the intercepted area and generates a screenshot; the terminal obtains the controls in the intercepted area whose visibility attribute is visible, and obtains the text content of those visible controls; the terminal stores the text content of at least one of the visible controls in association with the screenshot.

Description

Method and Terminal for Recognizing Screenshot Text

Technical Field

The embodiments of the present application relate to the field of terminals, and in particular to a method and terminal for recognizing screenshot text.
Background

With the continuous development of communication technologies, terminals such as mobile phones have become an indispensable part of people's daily lives. With a mobile phone, users can not only communicate with other users but also browse or process all kinds of information.

During use, for content of interest displayed by the phone, for example some text the user is interested in, the user usually uses the screen capture function to save it in the form of a screenshot for convenient later use. If the text in the screenshot needs to be recognized, the prior art usually implements this with Optical Character Recognition (OCR) technology.

Recognizing the text in a screenshot with OCR technology generally requires several steps: preprocessing, feature extraction, classification training and recognition, and post-processing. Because the classification training and recognition step requires a sufficiently large amount of labeled sample data for training, and the post-processing step requires continuous correction of the recognized results, recognizing screenshot text takes a long time.
Summary

The embodiments of the present application provide a method and terminal for recognizing screenshot text, solving the problem that recognizing screenshot text with OCR technology takes a long time.

To achieve the above objective, the embodiments of the present application adopt the following technical solutions:

A first aspect of the embodiments of the present application provides a method for recognizing screenshot text, including:

the terminal receives a first input from a user; the terminal obtains an intercepted area in response to the received first input, the intercepted area being the entire area or a partial area of the interface displayed by the terminal; the terminal captures the page content in the intercepted area and generates a screenshot; the terminal obtains the controls in the intercepted area whose visibility attribute is visible, and obtains the text content of those visible controls; the terminal stores the text content of at least one of the visible controls in association with the screenshot.

In the method for recognizing screenshot text provided by the embodiments of the present application, the terminal obtains the intercepted area in response to the received first input, captures the page content in the intercepted area, generates a screenshot, obtains the controls in the intercepted area whose visibility attribute is visible, and obtains the text content of those visible controls; the terminal stores the obtained text content of at least one of the visible controls, as the text content of the screenshot, in association with the screenshot. In this way, by taking the text content of the visible controls in the intercepted area, obtained at the time of the screenshot, as the text of the screenshot, the time spent recognizing the text is reduced compared with recognizing screenshot text using OCR technology.
With reference to the first aspect, in a possible implementation, the terminal storing the text content of at least one of the visible controls in association with the screenshot may specifically include: the terminal stores the text content of at least one of the visible controls in association with the storage path of the screenshot.

With reference to the first aspect or the foregoing possible implementation, in another possible implementation, the terminal storing the text content of at least one of the visible controls in association with the screenshot may specifically include: the terminal stores the text content of at least one of the visible controls in the header information of the screenshot. In this way, even if the association between the saved screenshot text and the screenshot's storage path is deleted, the screenshot can still be found through the text content saved in its header information.

With reference to the first aspect or the foregoing possible implementations, in another possible implementation, obtaining the text content of the visible controls may specifically include: the terminal obtains, among the visible controls, the controls of a first type, the first type being a text control type and/or an image control type; the terminal obtains the text content from the text attributes of the controls of the first type. In this way, by obtaining the screenshot's text content from the filtered-out controls that may contain text content, the time spent obtaining the screenshot text is further reduced.

With reference to the first aspect or the foregoing possible implementations, in another possible implementation, the method for recognizing screenshot text may further include: the terminal builds a search index file according to the stored text content and the storage path of the screenshot, the search index file being used to find the screenshot. In this way, by building a search index according to the obtained text content of the screenshot, the user's need for high-precision search of text in pictures is satisfied.
A second aspect of the embodiments of the present application provides a terminal, including: one or more processors, a memory, and an input unit. The memory and the input unit are coupled to the one or more processors; the memory is used to store computer program code, the computer program code including computer instructions. When the one or more processors execute the computer instructions, the input unit is used to receive a first input from a user; the processor is used to obtain an intercepted area in response to the first input, capture the page content in the intercepted area, generate a screenshot, obtain the controls in the intercepted area whose visibility attribute is visible, and obtain the text content of those visible controls, the intercepted area being the entire area or a partial area of the interface displayed by the terminal; the memory is used to store the text content of at least one of the visible controls in association with the screenshot.

With reference to the second aspect, in a possible implementation, the memory is specifically used to store the text content of at least one of the visible controls in association with the storage path of the screenshot.

With reference to the second aspect or the foregoing possible implementation, in another possible implementation, the memory is specifically used to store the text content of at least one of the visible controls in the header information of the screenshot.

With reference to the second aspect or the foregoing possible implementations, in another possible implementation, the processor is specifically used to obtain, among the visible controls, the controls of a first type, the first type being a text control type and/or an image control type, and to obtain the text content from the text attributes of the controls of the first type.

With reference to the second aspect or the foregoing possible implementations, in another possible implementation, the processor is further used to build a search index file according to the stored text content and the storage path of the screenshot, the search index file being used to find the screenshot.
A third aspect of the embodiments of the present application provides a terminal, including:

a receiving unit, used to receive a first input from a user; an obtaining unit, used to obtain an intercepted area in response to the first input received by the receiving unit, the intercepted area being the entire area or a partial area of the interface displayed by the terminal; an intercepting unit, used to capture the page content in the intercepted area obtained by the obtaining unit; a generating unit, used to generate a screenshot; the obtaining unit being further used to obtain the controls in the intercepted area whose visibility attribute is visible and to obtain the text content of those visible controls; and a storage unit, used to store the text content of at least one of the visible controls obtained by the obtaining unit in association with the screenshot generated by the generating unit.

With reference to the third aspect, in a possible implementation, the storage unit is specifically used to store the text content of at least one of the visible controls in association with the storage path of the screenshot.

With reference to the third aspect or the foregoing possible implementation, in another possible implementation, the storage unit is specifically used to store the text content of at least one of the visible controls in the header information of the screenshot.

With reference to the third aspect or the foregoing possible implementations, in another possible implementation, the obtaining unit is specifically used to obtain, among the visible controls, the controls of the first type, and to obtain the text content from the text attributes of the controls of the first type, the first type being a text control type and/or an image control type.

With reference to the third aspect or the foregoing possible implementations, in another possible implementation, the terminal further includes: a building unit, used to build a search index file according to the stored text content and the storage path of the screenshot, the search index file being used to find the screenshot.
A fourth aspect of the embodiments of the present application provides a computer storage medium, including computer instructions that, when run on a terminal, cause the terminal to perform the method for recognizing screenshot text according to the first aspect or any possible implementation of the first aspect.

A fifth aspect of the embodiments of the present application provides a computer program product that, when run on a computer, causes the computer to perform the method for recognizing screenshot text according to the first aspect or any possible implementation of the first aspect.

It can be understood that, in all the embodiments described above, obtaining the controls in the intercepted area whose visibility attribute is visible may mean obtaining all the controls in the intercepted area whose visibility attribute is visible.

It can be understood that the terminals of the second and third aspects, the computer storage medium of the fourth aspect, and the computer program product of the fifth aspect provided above are all used to perform the corresponding methods provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above, which are not repeated here.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of the hardware structure of a mobile phone according to an embodiment of the present application;

FIG. 2A is a schematic front view of a mobile phone according to an embodiment of the present application;

FIG. 2B is a first schematic diagram of a display interface according to an embodiment of the present application;

FIG. 2C is a second schematic diagram of a display interface according to an embodiment of the present application;

FIG. 2D is a third schematic diagram of a display interface according to an embodiment of the present application;

FIG. 3 is a schematic diagram of the system architecture of a terminal according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of a method for recognizing screenshot text according to an embodiment of the present application;

FIG. 5 is a fourth schematic diagram of a display interface according to an embodiment of the present application;

FIG. 6 is a schematic diagram of an architecture for building a search index according to an embodiment of the present application;

FIG. 7 is a fifth schematic diagram of a display interface according to an embodiment of the present application;

FIG. 8 is a schematic diagram of an Exif format according to an embodiment of the present application;

FIG. 9 is a schematic diagram of the composition of a terminal according to an embodiment of the present application;

FIG. 10 is a schematic diagram of the composition of another terminal according to an embodiment of the present application.
Detailed Description

While using a terminal, users sometimes save text displayed by the terminal in the form of a screenshot for convenient later viewing. Recognition of the text in a screenshot is usually implemented in the prior art using OCR technology, which generally takes a long time. To solve the problem that recognizing screenshot text with OCR technology takes a long time, the embodiments of the present application provide a method for recognizing screenshot text: when the terminal detects that the user performs a screen capture operation, it captures the page content in the obtained intercepted area, generates a screenshot, extracts the controls in the intercepted area whose visibility attribute is visible, obtains the text content from those visible controls, and stores the obtained text content of at least one of the visible controls, as the text content of the screenshot, in association with the generated screenshot. In this way, by taking the text content of the visible controls in the intercepted area, obtained at the time of the screenshot, as the text of the screenshot, the time spent recognizing screenshot text is reduced compared with recognizing it using OCR technology.

To facilitate a clear understanding of the following embodiments, a brief introduction to the related technology is given first:

Control: an element presented in a graphical user interface is usually called a control; it can provide the user with certain operations or display certain content.
In the embodiments of the present application, the attribute indicating whether a control is visible is called the visibility attribute. The visibility attribute has three possible values: visible, invisible, and gone. Here, visible means the control is visible; invisible means the control is invisible but occupies its layout position; and gone means the control is invisible and does not occupy a layout position. In the embodiments of the present application, a control whose visibility attribute is visible can simply be understood as a control that the program's design intends the user to see, while controls whose visibility attribute is invisible or gone can simply be understood as controls that the program's design does not intend the user to see. In addition, during program development, the visibility attribute of some controls may need to be switched: it may be set to invisible by default and changed to visible when needed, that is, changed from invisible to visible.
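The three values can be summarized in a small enum. This is a sketch mirroring the semantics of Android's View.VISIBLE, View.INVISIBLE, and View.GONE constants; the class itself is not part of this application.

```java
// Illustrative sketch: the two properties that distinguish the three
// visibility values are whether the control is drawn and whether it still
// takes up its layout position.
class ControlVisibility {
    enum Visibility {
        VISIBLE(true, true),     // drawn, occupies its layout position
        INVISIBLE(false, true),  // not drawn, but still occupies its layout position
        GONE(false, false);      // not drawn, occupies no layout position

        final boolean drawn;
        final boolean occupiesLayout;

        Visibility(boolean drawn, boolean occupiesLayout) {
            this.drawn = drawn;
            this.occupiesLayout = occupiesLayout;
        }
    }
}
```

Only the VISIBLE case is drawn to the user, which is why the method described here extracts text only from controls whose visibility attribute is visible.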
The implementation of the embodiments of the present application is described in detail below with reference to the accompanying drawings.

It should be noted that the method for recognizing screenshot text provided by the embodiments of the present application can be applied to a terminal. For example, the terminal may be a tablet computer, a desktop computer, a laptop, a notebook computer, an Ultra-mobile Personal Computer (UMPC), a handheld computer, a netbook, a Personal Digital Assistant (PDA), a wearable electronic device, a smart watch, or the like, or may be the mobile phone 100 shown in FIG. 1; the embodiments of the present application place no particular restriction on the specific form of the terminal.

As shown in FIG. 1, the terminal in the embodiments of the present application may be the mobile phone 100. FIG. 1 is a schematic diagram of the hardware structure of the mobile phone 100. It should be understood that the illustrated mobile phone 100 is merely one example of a terminal; the mobile phone 100 may have more or fewer components than shown in the figure, may combine two or more of the components shown, or may have a different component arrangement.

As shown in FIG. 1, the mobile phone 100 may include components such as a display 101, an input unit 102, a processor 103, a memory 104, a power supply 105, a Radio Frequency (RF) circuit 106, a sensor 107, an audio circuit 108, a speaker 109, a microphone 110, and a Wireless Fidelity (WiFi) module 111. These components may be connected by a bus or directly connected.
The display 101 may be used to display information input by the user or information provided to the user, as well as various menus of the mobile phone 100, and may also accept the user's input operations. Specifically, the display 101 may include a display panel 101-1 and a touch panel 101-2.

The display panel 101-1 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like.

The touch panel 101-2, which may also be called a touch screen or a touch-sensitive screen, can collect the user's contact or non-contact operations on or near it (for example, operations performed by the user on or near the touch panel 101-2 with a finger, a stylus, or any other suitable object or accessory; motion-sensing operations may also be included; the operations include single-point control operations, multi-point control operations, and other operation types), and drive the corresponding connection device according to a preset program. Optionally, the touch panel 101-2 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position and gesture, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch signal from the touch detection device, converts it into information that the processor 103 can process, sends that information to the processor 103, and can receive and execute commands sent by the processor 103. In addition, the touch panel 101-2 may be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave, or with any technology developed in the future; the embodiments of the present application are not limited in this respect.

Further, the touch panel 101-2 may cover the display panel 101-1. The user may, according to the content displayed by the display panel 101-1 (the displayed content including any one or a combination of a soft keyboard, a virtual mouse, virtual keys, icons, and so on), operate on or near the touch panel 101-2 covering the display panel 101-1. After detecting an operation on or near it, the touch panel 101-2 transmits the operation to the processor 103 through the input/output subsystem to determine the user input, and the processor 103 then provides the corresponding visual output on the display panel 101-1 through the input/output subsystem according to the user input. Although in FIG. 1 the touch panel 101-2 and the display panel 101-1 are two separate components implementing the input and output functions of the mobile phone 100, in some embodiments the touch panel 101-2 and the display panel 101-1 may be integrated to implement the input and output functions of the mobile phone 100.

The input unit 102 may be the above touch panel 101-2, or may be another input device. The other input device may be used to receive input digital or character information and to generate key signal inputs related to the user settings and function control of the mobile phone 100. Specifically, the other input device may include any one or a combination of a physical keyboard, function keys (such as volume control keys and the power key), a trackball, a mouse, a joystick, an optical mouse (a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by the touch screen), and so on. The other input device is connected to the other-input-device controller of the input/output subsystem and exchanges signals with the processor 103 under the control of that controller.
The processor 103 is the control center of the mobile phone 100. It connects the various parts of the entire phone using various interfaces and lines, and performs the various functions of the mobile phone 100 and processes data by running or executing software programs and/or modules stored in the memory 104 and calling data stored in the memory 104, thereby monitoring the phone as a whole. Optionally, the processor 103 may include one or more processing units; the processor 103 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and so on, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also be separate from the application processor; the processor may be a modem, an application processor, or both.

The memory 104 may be used to store data, software programs, and modules. The processor 103 executes the various functional applications and data processing of the mobile phone 100 by running the data, software programs, and modules stored in the memory 104, for example, performing the method for recognizing screenshot text provided by the embodiments of the present application. The memory 104 may mainly include a program storage area and a data storage area, where the program storage area can store the operating system and the applications required by at least one function (such as a sound playing function and an image playing function), and the data storage area can store data created according to the use of the mobile phone 100 (such as audio data and a phone book). In addition, the memory 104 may be volatile memory, such as Random-Access Memory (RAM) or high-speed random access memory; or non-volatile memory, such as a magnetic disk storage device, a flash device, Read-Only Memory (ROM), flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of the above kinds of memory.
The power supply 105 may be a battery, logically connected to the processor 103 through a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system.

The RF circuit 106 may be used to receive and send signals during the sending and receiving of information or during a call; in particular, it passes received downlink information from a base station to the processor 103 for processing, and sends uplink data to the base station. Generally, the RF circuit 106 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and so on. In addition, the RF circuit 106 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including one or a combination of: Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and so on.

The mobile phone 100 may also include at least one sensor 107, such as a light sensor, a speed sensor, a Global Position System (GPS) sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor can adjust the brightness of the display panel 101-1 according to the brightness of ambient light, and the proximity sensor can turn off the display panel 101-1 and/or the backlight when the mobile phone 100 is moved to the ear. As one kind of speed sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize the phone's posture (such as landscape/portrait switching, related games, and magnetometer posture calibration) and in vibration-recognition-related functions (such as a pedometer and tapping). As for the gyroscope, barometer, hygrometer, thermometer, infrared sensor, pressure sensor, and other sensors with which the mobile phone 100 may also be configured, details are not repeated here.

The audio circuit 108, the speaker 109, and the microphone 110 may provide an audio interface between the user and the mobile phone 100. The audio circuit 108 may transmit the electrical signal converted from received audio data to the speaker 109, which converts it into a sound signal for output; on the other hand, the microphone 110 converts collected sound signals into electrical signals, which the audio circuit 108 receives and converts into audio data; the audio data is then output to the RF circuit 106 to be sent to, for example, another mobile phone, or output to the processor 103 for further processing.

The WiFi module 111 may be a module including a WiFi chip and its driver, the WiFi chip having the capability to run wireless Internet standard protocols.

In addition, an operating system runs on top of the above components, for example the iOS operating system developed by Apple, the Android open-source operating system developed by Google, or the Windows operating system developed by Microsoft. Applications can be installed and run on the operating system. Moreover, although not shown, the mobile phone 100 may also include components such as a Bluetooth module and a camera. The Bluetooth module is a Printed Circuit Board Assembly (PCBA) integrating the Bluetooth function, used for short-range wireless communication.
Specifically, in the embodiments of the present application, the input unit 102 may be used to receive the user's first input.

The processor 103 may be used to obtain the intercepted area in response to the first input received by the input unit 102, capture the page content in the intercepted area, and generate a screenshot.

For example, FIG. 2A is a schematic front view of the mobile phone 100 according to an embodiment of the present application. Suppose the first input is pressing the volume control key (such as the volume "+" key) 201 and the power key 202 of the mobile phone 100 at the same time. When the user wants to use the screen capture function to capture the page content in the interface displayed by the mobile phone 100, the user can press the volume control key (such as the volume "+" key) 201 and the power key 202 at the same time to trigger the processor 103 to start the screen capture function and thereby generate a screenshot.

As another example, suppose the first input is a double-tap operation. When the user wants to use the screen capture function, the user can perform a double-tap operation, and the processor 103 starts the screen capture function in response, so that the page content in the interface displayed by the mobile phone 100 can be captured and a screenshot generated.
As another example, as shown in FIG. 2B, suppose the first input is an operation on a virtual key (such as a screenshot button). When the user wants to use the screen capture function, as shown in (a) of FIG. 2B, the user performs a sliding operation along the sliding track 203. As shown in (b) of FIG. 2B, in response to the sliding operation the display 101 displays a pull-down menu 204, which includes a screenshot button 205. The user can tap the screenshot button 205, and the processor 103 starts the screen capture function in response to the tap, so that the page content in the interface currently displayed by the mobile phone 100 can be captured and a screenshot generated.

Further, for example, after the user presses the volume control key (such as the volume "+" key) 201 and the power key 202 of the mobile phone 100 at the same time, performs a double-tap gesture, or taps the screenshot button 205, as shown in FIG. 2C, the mobile phone 100 may display a floating frame 206, and the user can tap the scrolling-screenshot button 207 in the lower-right corner of the floating frame 206. The processor 103 can then start the screen capture function and capture the page content that cannot all be displayed in the current interface, for example a long web page the user is browsing, or a Word or PDF document with many pages, to meet the user's need for a long screenshot.

Alternatively, further, for example, after the user presses the volume control key (such as the volume "+" key) 201 and the power key 202 at the same time, performs a double-tap gesture, or taps the screenshot button 205, as shown in FIG. 2D, the mobile phone 100 may display a floating frame 208, and the user can change the size of the floating frame 208 through a selection operation, so as to capture the page content in a partial area of the current interface and generate a screenshot.
处理器103还可以用于获取截取区域内可见性属性为可见的控件,获取可见性属性为可见的控件中的文本内容。
存储器104可以用于,将可见性属性为可见的控件中至少一个控件的文本内容与截图关联存储。在具体实现中,获取的可见性属性为可见的控件中至少一个控件的文本内容可以保存在截图的头信息中,该截图存储在存储器104中,也可以与截图的存储路径关联存储在存储器104中,以便用户后续使用。
可以理解的,本实施例中的处理器可以获取截取区域内所有可见性属性为可见的控件,这样处理器可以获取可见性属性为可见的控件中的文本内容。
FIG. 3 is a schematic diagram of a system architecture of a terminal according to an embodiment of the present application. Taking a terminal whose operating system is the Android system as an example, as shown in FIG. 3, the system architecture may include an application layer 301, an application framework layer 302, a system runtime library layer 303, and a Linux kernel layer 304, all of which run on the application processor. Specifically:
The application layer 301 is the layer of the Android system that interacts with the user. The application layer 301 includes the various applications of the terminal (third-party applications and/or system applications), which may access the services provided by the application framework layer 302 according to the application. For example, when capturing content in a displayed interface, a screenshot application may access the screenshot interface management service provided by the application framework layer 302.
The application framework layer 302 provides relevant application program interfaces (APIs) and services for the application layer 301, to support the running of the applications in the application layer 301. For different applications, the application framework layer 302 provides the application layer 301 with different APIs and different services. For example, when capturing content in a displayed interface, the application framework layer 302 may provide the application layer 301 with APIs related to the screenshot function and with the screenshot interface management service, to implement the screenshot function.
The system runtime library layer 303 and the Linux kernel layer 304 support the normal operation of the system.
It should be noted that although the embodiments of the present application are described using the Android system as an example, the same basic principles are applicable to terminals based on operating systems such as iOS or Windows.
Exemplarily, when the user wants to use the screenshot function, the user may perform an input instructing that the screenshot function be started. When the application framework layer 302 detects the user's input, it notifies the screenshot application in the application layer 301. The screenshot application in the application layer 301 accesses, through the relevant APIs, the screenshot interface management service provided by the application framework layer 302, to implement the screenshot function and generate a screenshot. In addition, the application framework layer 302 calls the relevant interfaces to obtain the controls in the capture region of the displayed interface whose visibility attribute is visible, and obtains the text content of those controls. The application layer 301 may further store the screenshot in association with the text content of at least one of the visible controls.
It can be understood that the application framework layer 302 may obtain all controls in the capture region whose visibility attribute is visible, and may obtain the text content of all of those controls. The application layer 301 may store the screenshot in association with the text content of at least one of all the visible controls.
For ease of understanding, the method for recognizing text in a screenshot provided in the embodiments of the present application is described in detail below with reference to the accompanying drawings. The following description takes a mobile phone as an example of the terminal throughout.
FIG. 4 is a schematic flowchart of a method for recognizing text in a screenshot according to an embodiment of the present application. As shown in FIG. 4, the method may include:
S401: The terminal receives a first input from the user.
While using the mobile phone, if the user is interested in the text in the interface displayed by the mobile phone and wants to save it, the user may use the screenshot function of the mobile phone to store the desired text in the form of a screenshot. In this case, the user may perform the first input to trigger the mobile phone to start the screenshot function. When the user performs the first input, the application framework layer can detect it.
In this embodiment of the present application, when the user wants to capture all of the page content displayed in the current interface, the first input may specifically be an operation on a function key of the mobile phone (for example, a volume control key such as the volume "+" key or the volume "-" key, or the power key), an operation on a function key combination (for example, the volume "+" key combined with the power key), an operation on a virtual key of the mobile phone, a voice instruction input by the user, or a preset gesture input by the user. In some embodiments of the present application, the preset gesture may be any one of a single-tap gesture, a slide gesture, a pressure-recognition gesture, a long-press gesture, an area-change gesture, a two-finger-press gesture, and a double-tap gesture.
When the user wants to capture both the page content displayed in the current interface and page content that is not displayed in the current interface, that is, when the user wants a long screenshot, the first input may include: any one of an operation on a function key or function key combination of the mobile phone, an operation on a virtual key of the mobile phone, an input voice instruction, or an input preset gesture, together with a slide operation input by the user, where the slide operation is used to trigger the mobile phone to display the page content not shown in the current interface.
When the user wants to capture part of the page content in the currently displayed interface, the first input may specifically include: any one of an operation on a function key or function key combination of the mobile phone, an operation on a virtual key of the mobile phone, an input voice instruction, or an input preset gesture, together with a selection operation input by the user, where the selection operation is used to select the region of the currently displayed interface that is to be captured.
S402: The terminal obtains the capture region in response to the first input.
The capture region is all or part of the displayed interface, where all of the displayed interface may be the entire region of the currently displayed interface or the entire region of a scrolled interface.
When the application framework layer detects the user's first input, it may obtain the capture region according to the first input. When the first input is any one of an operation on a function key or function key combination, an operation on a virtual key, an input voice instruction, or an input preset gesture, the capture region is the entire region of the currently displayed interface. When the first input additionally includes a slide operation, the capture region is the entire region of the scrolled interface. When the first input additionally includes a selection operation, the capture region is a partial region of the currently displayed interface.
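The decision logic of S402 — choosing the capture region from the form of the first input — can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name, the dictionary keys, and the 1920×1080 screen range are assumptions for the example.

```python
# Hypothetical sketch of S402: derive the capture region from the first input.
# A region is ((left, top), (right, bottom)) in screen coordinates.

SCREEN = ((0, 0), (1920, 1080))  # full display range of the handset

def capture_region(first_input):
    """first_input is a dict: a trigger plus an optional extra operation."""
    if first_input.get("selection"):      # selection op -> user-chosen partial region
        return first_input["selection"]
    if first_input.get("scroll"):         # slide op -> whole scrolled page
        page_height = first_input["scroll"]["page_height"]
        return ((0, 0), (SCREEN[1][0], page_height))
    return SCREEN                         # trigger alone -> current full screen

# Trigger alone (e.g. volume "+" key plus power key): the whole screen
assert capture_region({"trigger": "keys"}) == ((0, 0), (1920, 1080))
# Trigger plus a selection operation: the rectangle the user selected
assert capture_region({"trigger": "gesture",
                       "selection": ((100, 200), (900, 700))}) == ((100, 200), (900, 700))
```

The three branches mirror the three cases in the paragraph above: selection operation, slide operation, and bare trigger.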
For example, assume that the first input is a voice instruction input by the user, such as "perform a screenshot operation". When the application framework layer detects the voice instruction "perform a screenshot operation", it can determine from the instruction that a screenshot operation is to be performed and that the capture region is the entire screen. The application framework layer can then determine that the capture region is the display range of the mobile phone screen, for example [(0, 0), (1920, 1080)], where (0, 0) is the coordinate of the upper left corner of the screen and (1920, 1080) is the coordinate of the lower right corner.
For another example, assume that the first input is a preset gesture and a selection operation input by the user, where the selection operation is used to select the region to be captured. When the application framework layer detects the preset gesture and the selection operation, it can determine from the preset gesture that a screenshot operation is to be performed and from the selection operation that the region selected by the user is to be captured; the capture region can then be determined from the region the user finally selects.
Assume that the user wants to capture all of the page content displayed in the current interface, and take as an example a first input that is an operation on a function key combination of the mobile phone: the volume "+" key combined with the power key. For example, as shown in FIG. 5, while chatting with a friend in the WeChat application on the mobile phone, the user is interested in chat content 501 in the chat interface and wants to save it in the form of a screenshot. The user presses the volume "+" key 502 and the power key 503 simultaneously. After detecting the simultaneous operation on the volume "+" key 502 and the power key 503, the application framework layer generates a corresponding screenshot event, which instructs that the content in the interface displayed by the mobile phone be captured. From the screenshot event, the application framework layer can determine that a screenshot operation is to be performed and that the entire screen is to be captured, that is, it can determine that the capture region is the display range of the mobile phone screen.
S403: The terminal captures the page content in the capture region and generates a screenshot.
After the application framework layer detects the user's first input and obtains the capture region, it may notify the screenshot application in the application layer that the user wants to use the screenshot function. The screenshot application in the application layer accesses, through the relevant APIs, the screenshot interface management service provided by the application framework layer, and can thereby capture the page content within the capture region of the displayed interface. After the capture succeeds, the screenshot application in the application layer can generate the screenshot. The mobile phone may also display the generated screenshot and then switch back to the interface that was being captured.
S404: The terminal obtains the controls in the capture region whose visibility attribute is visible.
It can be understood that the terminal may also obtain all controls in the capture region whose visibility attribute is visible.
Take a mobile phone whose operating system is the Android system as an example. For example, based on Android 8.0, the application framework layer may first use the ActivityManager class to obtain the displayed interface. Specifically, the application framework layer calls the interface ActivityManager.getRunningTasks(int M) to obtain the list of tasks currently running on the mobile phone, and obtains from the list, through the variable topActivity, the information of the topmost Activity, such as its class name. Using reflection, the application framework layer can then obtain the displayed interface from the class name of the topmost Activity via the ActivityThread.currentActivityThread method and the mActivities member variable. The displayed interface may be an interface of an application on the mobile phone, or the home screen of the mobile phone; the application may be a system application or a third-party application.
Next, the application framework layer may call the interface View decorView = activity.getWindow().getDecorView() to obtain the entire window view of the displayed interface, and call the interface decorView.getChild(i) to obtain the controls (Views) included in the window view. According to the number of controls in the window view, getChildViewCount, the application framework layer loops through each control, calling the interface View.getLocationOnScreen(int[]) to obtain each control's position in the displayed interface; combining the capture region obtained in S402 with each control's position, it can obtain the controls within the capture region of the displayed interface. Having obtained these controls, the application framework layer can use View.getVisibility() == View.VISIBLE to obtain the controls among them whose visibility attribute is visible.
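The traversal just described — walk every child of the window view, keep the controls that lie within the capture region and whose visibility attribute is visible — can be simulated outside Android roughly as follows. The `Control` class and its fields are stand-ins for the framework's View objects, not the actual Android API; the containment test (a control counts as inside the region only if fully inside) is an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass
class Control:
    name: str
    pos: tuple      # (left, top, right, bottom) on screen, cf. View.getLocationOnScreen
    visible: bool   # cf. View.getVisibility() == View.VISIBLE

def visible_controls_in_region(controls, region):
    """Return names of controls fully inside `region` whose visibility is visible."""
    (rl, rt), (rr, rb) = region
    hits = []
    for c in controls:                      # loop over the window view's children
        l, t, r, b = c.pos
        inside = l >= rl and t >= rt and r <= rr and b <= rb
        if inside and c.visible:            # keep only visible controls in the region
            hits.append(c.name)
    return hits

views = [Control("title_bar", (0, 0, 1080, 120), True),
         Control("hidden_banner", (0, 120, 1080, 240), False),  # invisible: filtered out
         Control("message_1", (40, 300, 1040, 420), True)]
assert visible_controls_in_region(views, ((0, 0), (1080, 1920))) == ["title_bar", "message_1"]
```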
For example, with reference to FIG. 5, assume the capture region is the entire region of the currently displayed interface. By calling the relevant interfaces, the application framework layer obtains a window view that includes control 504 (back button icon), control 505 (title bar), control 506 (chat details button icon), control 507 (avatar icon 1), control 508 (message content 1), control 509 (message content 2), control 510 (avatar icon 2), control 511 (voice input button icon), control 512 (input box), and control 513 (options button icon). After calling the relevant interfaces to loop through each control, it determines that controls 504 to 513 are all within the capture region of the currently displayed interface, and that the visibility attribute of all of them is visible.
S405: The terminal obtains the text content of the controls whose visibility attribute is visible.
It can be understood that the terminal may obtain the text content of all of the visible controls.
After the application framework layer obtains the controls in the capture region of the displayed interface whose visibility attribute is visible, it may call the interface View.getText() or View.getContentDescription() to obtain the text content of those controls.
Further, the application framework layer may obtain, among the controls whose visibility attribute is visible, the controls of a first type, where the first type may be the text control (TextView) type and/or the image control (ImageView) type; controls of the first type may also be Buttons, ActionBars, and the like. The application framework layer obtains the text content from the text attribute of the controls of the first type. For example, for a control of the TextView type, the application framework layer may call the interface View.getText() to obtain the control's text content from its text field. For a control of the ImageView type, the application framework layer may call the interface View.getContentDescription() to obtain the control's text content from its content description (contentDescription) field.
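The per-type extraction above — text field for TextView-type controls, content description field for ImageView-type controls — can be sketched like this. The dictionary representation of a control and its keys are assumptions of the example, standing in for the View.getText() and View.getContentDescription() calls named in the text.

```python
def control_text(ctrl):
    """Return the text of a first-type control according to its type.
    `ctrl` is a dict standing in for a View; the keys are illustrative."""
    if ctrl["type"] == "TextView":        # cf. View.getText(): the text field
        return ctrl.get("text", "")
    if ctrl["type"] == "ImageView":       # cf. View.getContentDescription()
        return ctrl.get("content_description", "")
    return ""                             # other control types contribute no text

controls = [{"type": "TextView", "text": "AMIX"},
            {"type": "ImageView"},        # avatar icon without a content description
            {"type": "TextView", "text": "Received, thanks!"}]
texts = [t for c in controls if (t := control_text(c))]
assert texts == ["AMIX", "Received, thanks!"]
```

As in the FIG. 5 example that follows, image controls without a content description simply contribute no text.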
For example, with reference to FIG. 5, among the visible controls obtained by the application framework layer, the controls of the TextView type are control 505, control 508, and control 509, and the controls of the ImageView type are control 507 and control 510. For controls 505, 508, and 509, View.getText() can be called on each, obtaining "AMIX" as the text content of control 505, "Notice: starting October 1, the vaccination time for this community is updated to Tuesdays 8:30-12:00 a.m.; please take note" as the text content of control 508, and "Received, thanks!" as the text content of control 509. For controls 507 and 510, calling View.getContentDescription() determines that they contain no text content. It follows that the text content of the interface the user wants to capture includes: "AMIX", "Notice: starting October 1, the vaccination time for this community is updated to Tuesdays 8:30-12:00 a.m.; please take note", and "Received, thanks!".
S406: The terminal stores the text content of at least one of the controls whose visibility attribute is visible in association with the screenshot.
It can be understood that the terminal may store the text content of at least one of all the visible controls in association with the screenshot.
A terminal usually stores a large number of pictures, so a user who wants to find text content saved in the form of a screenshot may need a long time to do so. In this embodiment of the present application, so that the user can later find the desired screenshot promptly, the application layer may store the obtained text content of at least one of the visible controls, as the text content of the screenshot, in the memory in association with the screenshot. In a specific implementation, the text content of at least one of the visible controls may be stored in association with the storage path of the screenshot. For example, it may be stored, as the text content of the screenshot, together with the screenshot's storage path in a database, which can then be used for subsequent searches for related screenshots. In addition, the stored text content of at least one of the visible controls may also be presented to the user through an application, such as a gallery application or a notepad application, for convenient viewing and use.
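Storing the extracted text in association with the screenshot's storage path can be illustrated with an in-memory SQLite table. The table name, column names (beyond `_data` and `_id`, which the description itself uses), and the file path are assumptions of this sketch, not the terminal's actual schema.

```python
import sqlite3

# In-memory stand-in for the terminal's picture database: one row per
# screenshot, keyed by storage path (_data), carrying the extracted text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE screenshots (_id INTEGER PRIMARY KEY, "
           "_data TEXT UNIQUE, search_text TEXT)")

def store_screenshot_text(path, texts):
    """Associate the controls' text content with the screenshot's storage path."""
    db.execute("INSERT INTO screenshots (_data, search_text) VALUES (?, ?)",
               (path, "\n".join(texts)))
    db.commit()

store_screenshot_text("/sdcard/Pictures/Screenshots/20171010.png",   # hypothetical path
                      ["AMIX", "Notice: vaccination time updated", "Received, thanks!"])
row = db.execute("SELECT search_text FROM screenshots WHERE _data LIKE ?",
                 ("%20171010.png",)).fetchone()
assert "vaccination" in row[0]
```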
S407: The terminal builds a search index file from the storage path of the screenshot and the stored text content.
The application layer may build a search index file from the storage path of the screenshot and the stored text content. Once the search index file is built, the user can later use it to find the desired screenshot.
For example, refer to the schematic architecture for building a search index shown in FIG. 6. A database (for example, SQLite) stores information such as the storage path _data and the text content of the screenshot. Through the open-source Lucene framework, the application layer creates a Lucene-based search index file from the screenshot's record row number _id in the database, together with information such as the storage path _data and the text content. The search index file corresponds to the information about the screenshot stored in the database. Specifically, the search engine calls the Index API to feed into the search engine the screenshot's storage path _data and text content stored in the database, together with the record row number _id of the screenshot's information in the database. From these inputs, the search engine creates a search index file and stores it in the indices database. When the user needs to find a screenshot, the user can enter a keyword in a user interface (UI) that provides a search entry. After detecting the user's query request, the application layer has the search engine call the Search API to obtain the keyword input by the user and match it in the indices database, obtaining one or more search index files that match the keyword; from the correspondence between the search index files and the screenshot information stored in the database, the corresponding screenshots are retrieved from the database and presented to the user. In addition, when the user moves a screenshot from its current folder to another folder, that is, when the storage path of the screenshot changes, this embodiment of the present application may also trigger the application layer to update the database record.
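The index-then-search flow of FIG. 6 can be reduced to a toy inverted index: each word of a screenshot's text maps to the database row numbers (_id) containing it, and a keyword lookup returns the matching storage paths. This is a didactic stand-in for Lucene, not its API; the whitespace tokenization and the example paths are assumptions.

```python
from collections import defaultdict

indices = defaultdict(set)   # word -> {_id, ...}: stand-in for the indices database
rows = {}                    # _id -> (_data, search_text): stand-in for the SQLite rows

def index_screenshot(_id, _data, search_text):
    """Build index entries from the row number, storage path, and text content."""
    rows[_id] = (_data, search_text)
    for word in search_text.lower().split():
        indices[word].add(_id)

def search(keyword):
    """Match the keyword in the index and return the corresponding storage paths."""
    hits = indices.get(keyword.lower(), set())
    return sorted(rows[i][0] for i in hits)

index_screenshot(1, "/shots/a.png", "vaccination notice for the community")
index_screenshot(2, "/shots/b.png", "received thanks")
assert search("vaccination") == ["/shots/a.png"]
assert search("missing") == []
```

A production index (as in the Lucene-based design described above) adds tokenization, ranking, and persistence, but the correspondence between index entries and database rows is the same.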
For example, as shown in (a) of FIG. 7, the user may tap an icon 701 of the gallery application on the home screen of the mobile phone. After detecting the user's tap on the icon 701 of the gallery application, the mobile phone opens the gallery application and, as shown in (b) of FIG. 7, displays its main interface. In the main interface of the gallery application displayed by the mobile phone, the user may enter in the search box 702 a keyword for the picture the user wants to view, for example the keyword "vaccination", and tap the search button icon 703. After detecting the user's tap on the search button icon 703, the mobile phone can match the keyword in the indices database and obtain the picture matching the keyword "vaccination". After the picture matching the keyword "vaccination" is obtained, as shown in (c) of FIG. 7, the mobile phone displays the matching picture 704.
In some embodiments, so that the user can still quickly find the desired screenshot even if the contents of the database are cleared, the screenshot application in the application layer may also generate and save the header information of the screenshot, where the header information includes the obtained text content of at least one of the visible controls. The header information may be the exchangeable image file format (Exif) data of the screenshot, which records attribute information of the screenshot, such as ISO sensitivity, aperture, picture size, thumbnail, shooting time, shooting device model, and the obtained text content. For example, FIG. 8 is a schematic diagram of an Exif format according to an embodiment of the present application. With reference to FIG. 5, it can be seen from FIG. 8 that the Exif of the screenshot includes the model of the device that generated the screenshot (Model): Phone1-XX; the sensitivity (ISO): 100; the shooting time (Date Taken): 20171010; and the screenshot content (search_text): "AMIX", "Notice: starting October 1, the vaccination time for this community is updated to Tuesdays 8:30-12:00 a.m.; please take note", and "Received, thanks!". The search_text field saves the obtained content of the screenshot, that is, the text content obtained in S405. If no picture matching the keyword is found in the indices database, the mobile phone can find the corresponding screenshot by matching the header information of the pictures.
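The Exif fallback can be sketched as a linear scan over per-picture header records: when the indices database yields no match, the search_text field in each header is matched directly. The dictionary layout of a header record and the example paths are assumptions of the sketch, loosely following the FIG. 8 fields.

```python
# Stand-in for per-picture Exif headers, following the FIG. 8 fields.
headers = [
    {"Model": "Phone1-XX", "ISO": 100, "DateTaken": "20171010",
     "path": "/shots/a.png",
     "search_text": "AMIX / vaccination notice / Received, thanks!"},
    {"Model": "Phone1-XX", "ISO": 100, "DateTaken": "20171011",
     "path": "/shots/b.png",
     "search_text": "meeting agenda"},
]

def fallback_search(keyword):
    """Scan the headers' search_text fields when the index database is empty."""
    return [h["path"] for h in headers
            if keyword.lower() in h["search_text"].lower()]

assert fallback_search("vaccination") == ["/shots/a.png"]
```

The scan is slower than the index lookup, which is why it serves only as a fallback when the database contents have been cleared.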
In the method for recognizing text in a screenshot provided in this embodiment of the present application, the terminal obtains a capture region in response to a received first input, captures the page content in the capture region, generates a screenshot, obtains the controls in the capture region whose visibility attribute is visible, obtains the text content of those controls, and stores the obtained text content of at least one of the visible controls, as the text content of the screenshot, in association with the screenshot. By using the text content of the visible controls in the capture region, obtained at screenshot time, as the text of the screenshot, this reduces the time spent on recognition and improves the accuracy of text recognition compared with recognizing screenshot text using OCR. Moreover, building a search index from the obtained text content of the screenshot meets the user's need for high-precision searches of text in pictures.
An embodiment of the present application provides a terminal for performing the above method. The embodiments of the present application may divide the terminal into functional modules according to the above method examples; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division into modules in the embodiments of the present application is illustrative and is merely a division by logical function; other divisions are possible in actual implementation.
In the case where each functional module corresponds to one function, FIG. 9 is a schematic diagram of a possible structure of the terminal in the above embodiments. The terminal 900 may include: a receiving unit 901, an obtaining unit 902, a capturing unit 903, a generating unit 904, and a storage unit 905.
The receiving unit 901 is configured to support the terminal in performing S401 in the above method embodiment and/or other processes of the techniques described herein.
The obtaining unit 902 is configured to support the terminal in performing S402, S404, and S405 in the above method embodiment and/or other processes of the techniques described herein.
The capturing unit 903 is configured to support the terminal in performing the capturing of the page content in the capture region described in S403 of the above method embodiment and/or other processes of the techniques described herein.
The generating unit 904 is configured to support the terminal in performing the generating of the screenshot described in S403 of the above method embodiment and/or other processes of the techniques described herein.
The storage unit 905 is configured to support the terminal in performing S406 in the above method embodiment and/or other processes of the techniques described herein.
Further, in this embodiment of the present application, as shown in FIG. 9, the terminal may also include a building unit 906.
The building unit 906 is configured to support the terminal in performing S407 in the above method embodiment.
All relevant content of the steps in the above method embodiment may be cited in the functional description of the corresponding functional module and is not repeated here.
In the case of integrated units, FIG. 10 is a schematic diagram of a possible structure of the terminal in the above embodiments. The terminal 1000 may include a processing module 1001, a storage module 1002, and a display module 1003. The processing module 1001 is configured to control and manage the actions of the terminal. The display module 1003 is configured to display images generated by the processing module 1001. The storage module 1002 is configured to store the program code and data of the terminal. Further, the terminal may also include a communication module configured to support communication between the terminal and other network entities.
In this embodiment of the present application, the processing module 1001 may be configured to support the terminal in performing S401, S402, S403, S404, S405, and/or S407 in the above method embodiment. The storage module may be configured to support the terminal in performing S406 in the above method embodiment.
The processing module 1001 may be a processor or a controller. The communication module may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 1002 may be a memory.
When the processing module 1001 is a processor, the communication module is an RF circuit, the storage module 1002 is a memory, and the display module 1003 is a display, the terminal provided in this embodiment of the present application may be the mobile phone shown in FIG. 1. The communication module may include not only the RF circuit but also the WiFi module and the Bluetooth module; communication modules such as the RF circuit, the WiFi module, and the Bluetooth module may be collectively referred to as a communication interface. The processor, the RF circuit, the touchscreen, and the memory may be coupled together by a bus.
An embodiment of the present application further provides a computer storage medium storing computer program code. When the above processor executes the computer program code, the terminal performs the relevant method steps in FIG. 4 to implement the method for recognizing text in a screenshot in the above embodiments.
An embodiment of the present application further provides a computer program product. When the computer program product runs on a computer, it causes the computer to perform the relevant method steps in FIG. 4 to implement the method for recognizing text in a screenshot in the above embodiments.
The terminal, computer storage medium, and computer program product provided in the embodiments of the present application are all used to perform the corresponding methods provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods provided above, which are not repeated here.
From the description of the above implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the systems, apparatuses, and units described above, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.
The functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as flash memory, a removable hard disk, read-only memory, random-access memory, a magnetic disk, or an optical disc.
The above are merely specific implementations of the embodiments of the present application, but the protection scope of the embodiments of the present application is not limited thereto. Any variation or replacement within the technical scope disclosed in the embodiments of the present application shall be covered by the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (10)

  1. A method for recognizing text in a screenshot, characterized by comprising:
    a terminal receiving a first input from a user;
    the terminal obtaining a capture region in response to the first input, the capture region being all or part of an interface displayed by the terminal;
    the terminal capturing the page content in the capture region and generating a screenshot;
    the terminal obtaining controls in the capture region whose visibility attribute is visible, and obtaining the text content of the controls whose visibility attribute is visible;
    the terminal storing the text content of at least one of the controls whose visibility attribute is visible in association with the screenshot.
  2. The method according to claim 1, characterized in that the terminal storing the text content of at least one of the controls whose visibility attribute is visible in association with the screenshot comprises:
    the terminal storing the text content of at least one of the controls whose visibility attribute is visible in association with the storage path of the screenshot.
  3. The method according to claim 1 or 2, characterized in that the terminal storing the text content of at least one of the controls whose visibility attribute is visible in association with the screenshot comprises:
    the terminal storing the text content of at least one of the controls whose visibility attribute is visible in the header information of the screenshot.
  4. The method according to any one of claims 1 to 3, characterized in that the obtaining of the text content of the controls whose visibility attribute is visible comprises:
    the terminal obtaining, among the controls whose visibility attribute is visible, controls of a first type, the first type being a text control type and/or an image control type;
    the terminal obtaining text content from the text attribute of the controls of the first type.
  5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    the terminal building a search index file from the stored text content and the storage path of the screenshot, the search index file being used to find the screenshot.
  6. A terminal, characterized by comprising: one or more processors, a memory, and an input unit, the memory and the input unit being coupled to the one or more processors, the memory being configured to store computer program code, the computer program code comprising computer instructions, wherein, when the one or more processors execute the computer instructions,
    the input unit is configured to receive a first input from a user;
    the processor is configured to obtain a capture region in response to the first input, capture the page content in the capture region, generate a screenshot, obtain controls in the capture region whose visibility attribute is visible, and obtain the text content of the controls whose visibility attribute is visible, the capture region being all or part of an interface displayed by the terminal;
    the memory is configured to store the text content of at least one of the controls whose visibility attribute is visible in association with the screenshot.
  7. The terminal according to claim 6, characterized in that the memory is specifically configured to store the text content of at least one of the controls whose visibility attribute is visible in association with the storage path of the screenshot.
  8. The terminal according to claim 6 or 7, characterized in that the memory is specifically configured to store the text content of at least one of the controls whose visibility attribute is visible in the header information of the screenshot.
  9. The terminal according to any one of claims 6 to 8, characterized in that the processor is specifically configured to obtain, among the controls whose visibility attribute is visible, controls of a first type, the first type being a text control type and/or an image control type, and to obtain text content from the text attribute of the controls of the first type.
  10. The terminal according to any one of claims 6 to 9, characterized in that
    the processor is further configured to build a search index file from the stored text content and the storage path of the screenshot, the search index file being used to find the screenshot.
PCT/CN2017/113333 2017-11-28 2017-11-28 Method and terminal for recognizing text in a screenshot WO2019104478A1 (zh)
