CN106484297B

CN106484297B - Character picking device and method

Info

Publication number: CN106484297B
Application number: CN201610884064.1A
Authority: CN
Inventors: 李光宇; 王猛
Original assignee: Nubia Technology Co Ltd
Current assignee: Nubia Technology Co Ltd
Priority date: 2016-10-10
Filing date: 2016-10-10
Publication date: 2020-03-27
Anticipated expiration: 2036-10-10
Also published as: CN106484297A

Abstract

The invention discloses a character pickup device and method. The device comprises a shooting module, a first determining module and a playing module. In a preset character pickup mode, the shooting module shoots an object in front of the camera of the terminal on which it resides. The playing module converts the characters in the captured dynamic image into voice for playing. With the scheme of the embodiment of the invention, a user can learn text content through the terminal, which solves the problem that blind or amblyopic people cannot read because of impaired vision.

Description

Character picking device and method

Technical Field

The invention relates to the field of terminal application, in particular to a character pickup device and a character pickup method.

Background

At present, blind or amblyopic people face many inconveniences in daily life because of their impaired vision. For example, a person eating alone at a restaurant cannot order from a paper menu, and a person going out cannot read a bus stop sign.

Disclosure of Invention

The main object of the invention is to provide a character pickup device and a character pickup method that allow a user to learn text content through a terminal, thereby solving the problem that blind or amblyopic people cannot read because of impaired vision.

In order to achieve the above object, the present invention provides a character pickup apparatus, comprising: a shooting module and a playing module.

And the shooting module is used for shooting an object in front of the camera of the terminal where the shooting module is positioned in a preset character picking mode.

And the playing module is used for converting the characters in the shot dynamic images into voice for playing.

Optionally, the apparatus further comprises: a detection module and a mode entering module.

And the detection module is used for detecting the trigger condition of the character pick-up mode.

And the mode entering module is used for entering a character picking mode when the triggering condition is detected and determined to be effective.

Optionally, the shooting module shooting an object in front of the camera of the terminal where the shooting module is located includes:

detecting an object in front of the camera; wherein the object includes textual information on a side thereof opposite the camera.

And adjusting the focal length according to preset conditions.

The central area of the character part in the object including the character information is taken as a shooting focus and shooting is performed.

Optionally, the apparatus further comprises a reminder module.

And the reminding module is used for sending out reminding information when one surface of the object opposite to the camera does not contain character information.

The reminding information comprises: vibration of the motor at a preset position.

Optionally, the preset conditions include: the size of the text.

The shooting module adjusts the focal length according to the preset condition and comprises:

and detecting the size of characters in the dynamic image under the current focal length.

And comparing the detected character size with a preset character size.

And when the detected character size is consistent with the preset character size, keeping the current focal length.

And when the detected character size is inconsistent with the preset character size, adjusting the focal length of the camera to be a first focal length to enable the character size in the dynamic image to be consistent with the preset character size.
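
The compare-and-adjust steps above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described by the invention: the function name `adjust_focal_length`, the relative `tolerance` used to decide whether two sizes are "consistent", and the assumption that the character size in the image scales linearly with the focal length are all hypothetical.

```python
def adjust_focal_length(current_focal, detected_size, preset_size, tolerance=0.05):
    """Return the focal length at which text should appear at the preset size.

    Assumption (not from the source): the character height in the image
    scales linearly with focal length, as in a simple pinhole-camera model.
    """
    if detected_size <= 0 or preset_size <= 0:
        raise ValueError("character sizes must be positive")
    # "Consistent" is modelled here as "within a relative tolerance".
    if abs(detected_size - preset_size) <= tolerance * preset_size:
        return current_focal  # keep the current focal length
    # Otherwise move to a "first focal length" that rescales the text.
    return current_focal * preset_size / detected_size
```

For example, if the text is detected at half the preset size, the sketch doubles the focal length; if the sizes already agree within the tolerance, the focal length is left unchanged.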

Optionally, the apparatus further comprises: a first determination module.

The first determining module is used for determining the preset character size according to the fingerprint size of the user before the focal length is adjusted according to the preset condition.

Optionally, the determining, by the first determining module, the preset character size according to the fingerprint size of the user includes:

collecting fingerprint information when a user touches a terminal screen; the fingerprint information includes the fingerprint size.

The fingerprint height and width are extracted from the fingerprint size.

And determining the height and width of the fingerprint as the height and width of characters in the preset character size.
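
As a rough illustration of the steps above, the following sketch derives a preset character size from a touch contact patch. The source does not say how the fingerprint size is measured; representing the fingerprint as a list of `(x, y)` contact points and taking their bounding box is an assumption made here, and the function names are hypothetical.

```python
def fingerprint_size(contact_points):
    """Return (width, height) of the bounding box of a touch contact patch.

    contact_points is a list of (x, y) screen coordinates; this
    representation is an assumption, not something the source specifies.
    """
    if not contact_points:
        raise ValueError("no contact points")
    xs = [x for x, _ in contact_points]
    ys = [y for _, y in contact_points]
    return (max(xs) - min(xs), max(ys) - min(ys))


def preset_character_size(contact_points):
    """Use the fingerprint width and height as the preset character size."""
    width, height = fingerprint_size(contact_points)
    return {"width": width, "height": height}
```

Tying the preset size to the fingerprint means each character in the image ends up roughly as large as the fingertip that will later touch it.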

Optionally, the apparatus further comprises: a second determination module.

And the second determining module is used for detecting the touch operation on the shot dynamic image and determining the touch position.

And the playing module is also used for converting the characters corresponding to the touch position into voice for playing.

Optionally, the apparatus further comprises: a text position determining module.

The text position determination module is used for:

after the touch position is determined, comparing the coordinates of the touch position with the coordinates of each character in the dynamic image; when the coordinates of the touch position are consistent with the coordinates of any character in the dynamic image, determining that the touch position corresponds to that character; and when the coordinates of the touch position are inconsistent with the coordinates of every character in the dynamic image, determining that the touch position does not correspond to any character.
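
The coordinate comparison described above amounts to a hit test. A minimal sketch follows, assuming each recognized character is represented by an axis-aligned bounding box in screen coordinates; the `CharBox` type and the `char_at` function are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class CharBox(NamedTuple):
    """A recognized character and its bounding box (assumed representation)."""
    char: str
    left: float
    top: float
    right: float
    bottom: float


def char_at(touch_x: float, touch_y: float,
            boxes: Sequence[CharBox]) -> Optional[CharBox]:
    """Return the character whose box contains the touch point, or None."""
    for box in boxes:
        if box.left <= touch_x <= box.right and box.top <= touch_y <= box.bottom:
            return box
    return None  # the touch position does not correspond to any character
```

A `None` result corresponds to the case in which the touch position does not correspond to a character.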

Optionally, the text position determining module is further configured to:

after the touch position is determined, when no corresponding character exists at the touch position, the position of a first character closest to the current touch position is detected.

And determining the relative direction of the position of the first character and the current touch position.

And controlling a preset motor in the corresponding direction to vibrate.
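
The guidance steps above can be sketched as a nearest-neighbour search followed by a direction decision. The source does not specify how many motors there are or where they sit; the sketch below assumes one motor per screen edge ("left", "right", "up", "down") and screen coordinates in which y grows downward. Both assumptions, and the function name, are hypothetical.

```python
import math


def nearest_char_direction(touch, char_centers):
    """Return (index, direction) of the character closest to the touch point.

    touch is (x, y); char_centers is a list of (x, y) character centers.
    direction names the motor that should vibrate, under the assumed
    four-motor layout.
    """
    if not char_centers:
        raise ValueError("no characters detected")
    tx, ty = touch
    index = min(
        range(len(char_centers)),
        key=lambda i: math.hypot(char_centers[i][0] - tx, char_centers[i][1] - ty),
    )
    dx = char_centers[index][0] - tx
    dy = char_centers[index][1] - ty
    # Pick the dominant axis; y grows downward in screen coordinates.
    if abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        direction = "down" if dy > 0 else "up"
    return index, direction
```

The returned direction tells the user which way to move the finger to reach the nearest character.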

Optionally, the converting, by the playing module, the text corresponding to the touch position into the voice for playing includes:

and when the touch position lies on the straight line of a line of characters or of a column of characters, converting that line or column of characters into voice for playing.

Optionally, the apparatus further comprises: a setting module.

The setting module is used for:

characters adjacent to each other in the longitudinal direction are kept at a preset first interval, and a plurality of characters on the same straight line in the transverse direction are used as a line of characters.

Characters adjacent to each other in the transverse direction are kept at a preset second interval, and a plurality of characters on the same straight line in the longitudinal direction are used as a column of characters.
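
One way to picture the grouping of characters into lines, and the selection of the line under the touch position, is the sketch below. It assumes each character is given as `(char, x, y)` with the coordinates of its center, and it treats characters whose y values differ by at most `row_tolerance` as lying on the same straight line. The data layout, the tolerance and the function names are assumptions made for illustration; columns could be handled the same way with x and y swapped.

```python
def group_into_rows(chars, row_tolerance):
    """Group (char, x, y) items into rows, each returned as (y, text)."""
    rows = []
    for ch, x, y in sorted(chars, key=lambda c: c[2]):
        # A character joins the current row if its y is close to the row's y.
        if rows and abs(rows[-1]["y"] - y) <= row_tolerance:
            rows[-1]["items"].append((x, ch))
        else:
            rows.append({"y": y, "items": [(x, ch)]})
    # Within each row, read the characters from left to right.
    return [(row["y"], "".join(ch for _, ch in sorted(row["items"])))
            for row in rows]


def row_text_at(touch_y, rows, row_tolerance):
    """Return the text of the row under the touch position, or None."""
    for y, text in rows:
        if abs(y - touch_y) <= row_tolerance:
            return text
    return None
```

The text returned by `row_text_at` is what would then be handed to a text-to-speech engine for playing.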

In addition, in order to achieve the above object, the present invention further provides a text pickup method, including:

and shooting an object in front of a camera of the terminal in a preset character pickup mode.

And converting the characters in the shot dynamic image into voice for playing.

Optionally, the method further comprises:

a trigger condition of a text pick-up mode is detected.

And entering a character picking mode when the trigger condition is detected and determined to be effective.

Optionally, the shooting an object in front of the camera of the terminal where the object is located includes:

And adjusting the focal length according to preset conditions.

Optionally, the method further comprises:

and when the surface of the object opposite to the camera does not contain the character information, sending out reminding information.

Optionally, the preset conditions include: the size of the text.

Adjusting the focal length according to the preset condition includes:

And comparing the detected character size with a preset character size.

Optionally, the method further comprises: and before the focal length is adjusted according to the preset condition, determining the preset character size according to the fingerprint size of the user.

Optionally, the determining the preset character size according to the fingerprint size of the user includes:

collecting fingerprint information when a user touches a terminal screen; the fingerprint information includes a fingerprint size.

The fingerprint height and width are extracted from the fingerprint size.

Optionally, the method further comprises:

a touch operation on the photographed moving image is detected and a touch position is determined.

And converting the characters corresponding to the touch position into voice for playing.

Optionally, the method further comprises:

And controlling a preset motor in the corresponding direction to vibrate.

Optionally, converting the text corresponding to the touch position into voice for playing includes:

Optionally, the method further comprises:

The invention provides a character pickup device and method. The device comprises a shooting module, a first determining module and a playing module. In a preset character pickup mode, the shooting module shoots an object in front of the camera of the terminal on which it resides. The playing module converts the characters in the captured dynamic image into voice for playing. With the scheme of the embodiment of the invention, a user can learn text content through the terminal, which solves the problem that blind or amblyopic people cannot read because of impaired vision.

Drawings

FIG. 1 is a schematic diagram of a hardware structure of an alternative mobile terminal for implementing various embodiments of the present invention;

FIG. 2 is a diagram of a wireless communication system for the mobile terminal shown in FIG. 1;

FIG. 3 is a block diagram of a text pick-up device according to an embodiment of the present invention;

FIG. 4 is a flowchart of a text pick-up method according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating a text pick-up method according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating a user clicking when an image is too small in the text pick-up method according to the embodiment of the present invention;

FIG. 7 is a schematic diagram of a user clicking after focusing in the text pick-up method according to the embodiment of the present invention;

FIG. 8 is a schematic view illustrating an embodiment of an audible alert motor in the text pick-up method according to the embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

An alternative mobile terminal for implementing various embodiments of the present invention will now be described with reference to the accompanying drawings. In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.

The mobile terminal may be implemented in various forms. For example, the terminal described in the present invention may include a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. In the following, it is assumed that the terminal is a mobile terminal. However, it will be understood by those skilled in the art that the configuration according to the embodiment of the present invention can be applied to a fixed type terminal in addition to elements particularly used for moving purposes.

Fig. 1 is a schematic hardware configuration of a mobile terminal implementing various embodiments of the present invention.

The mobile terminal 100 may include a wireless communication unit 110, an A/V (audio/video) input unit 120, a user input unit 130, a sensing unit 140, an output unit 150, a memory 160, an interface unit 170, a controller 180, and a power supply unit 190, etc. Fig. 1 illustrates a mobile terminal having various components, but it is to be understood that not all illustrated components are required to be implemented. More or fewer components may alternatively be implemented. Elements of the mobile terminal will be described in detail below.

The wireless communication unit 110 typically includes one or more components that allow radio communication between the mobile terminal 100 and a wireless communication system or network. For example, the wireless communication unit may include at least one of a broadcast receiving module 111, a mobile communication module 112, a wireless internet module 113, a short-range communication module 114, and a location information module 115.

The broadcast receiving module 111 receives a broadcast signal and/or broadcast associated information from an external broadcast management server via a broadcast channel. The broadcast channel may include a satellite channel and/or a terrestrial channel. The broadcast management server may be a server that generates and transmits a broadcast signal and/or broadcast associated information, or a server that receives a previously generated broadcast signal and/or broadcast associated information and transmits it to a terminal. The broadcast signal may include a TV broadcast signal, a radio broadcast signal, a data broadcast signal, and the like. The broadcast signal may further include a broadcast signal combined with a TV or radio broadcast signal. The broadcast associated information may also be provided via a mobile communication network, in which case it may be received by the mobile communication module 112. The broadcast associated information may exist in various forms, for example, in the form of an Electronic Program Guide (EPG) of Digital Multimedia Broadcasting (DMB), an Electronic Service Guide (ESG) of digital video broadcasting-handheld (DVB-H), and the like. The broadcast receiving module 111 may receive signals broadcast by various types of broadcasting systems. In particular, the broadcast receiving module 111 may receive digital broadcasts by using a digital broadcasting system such as digital multimedia broadcasting-terrestrial (DMB-T), digital multimedia broadcasting-satellite (DMB-S), digital video broadcasting-handheld (DVB-H), media forward link only (MediaFLO®), integrated services digital broadcasting-terrestrial (ISDB-T), and the like. The broadcast receiving module 111 may be constructed to be suitable for various broadcasting systems that provide broadcast signals, in addition to the above-mentioned digital broadcasting systems. The broadcast signal and/or broadcast associated information received via the broadcast receiving module 111 may be stored in the memory 160 (or other type of storage medium).

The mobile communication module 112 transmits and/or receives radio signals to and/or from at least one of a base station (e.g., access point, node B, etc.), an external terminal, and a server. Such radio signals may include voice call signals, video call signals, or various types of data transmitted and/or received according to text and/or multimedia messages.

The wireless internet module 113 supports wireless internet access of the mobile terminal. The module may be internally or externally coupled to the terminal. The wireless internet access technology to which the module relates may include WLAN (wireless LAN) (Wi-Fi), Wibro (wireless broadband), Wimax (worldwide interoperability for microwave access), HSDPA (high speed downlink packet access), and the like.

The short-range communication module 114 is a module for supporting short-range communication. Some examples of short-range communication technologies include Bluetooth™, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wideband (UWB), ZigBee™, and so on.

The location information module 115 is a module for checking or acquiring location information of the mobile terminal. A typical example of the location information module is a GPS (global positioning system). According to the current technology, the GPS module 115 calculates distance information and accurate time information from three or more satellites and applies triangulation to the calculated information, thereby accurately calculating three-dimensional current location information according to longitude, latitude, and altitude. Currently, a method for calculating position and time information uses three satellites and corrects an error of the calculated position and time information by using another satellite. In addition, the GPS module 115 can calculate speed information by continuously calculating current position information in real time.

The A/V input unit 120 is used to receive an audio or video signal. The A/V input unit 120 may include a camera 121 and a microphone 122. The camera 121 processes image data of still pictures or video obtained by an image capturing apparatus in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 151. The image frames processed by the camera 121 may be stored in the memory 160 (or other storage medium) or transmitted via the wireless communication unit 110, and two or more cameras 121 may be provided according to the construction of the mobile terminal. The microphone 122 may receive sounds (audio data) in a phone call mode, a recording mode, a voice recognition mode, or the like, and can process such sounds into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format transmittable to a mobile communication base station via the mobile communication module 112. The microphone 122 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the course of receiving and transmitting audio signals.

The user input unit 130 may generate key input data according to a command input by a user to control various operations of the mobile terminal. The user input unit 130 allows a user to input various types of information, and may include a keyboard, dome sheet, touch pad (e.g., a touch-sensitive member that detects changes in resistance, pressure, capacitance, and the like due to being touched), scroll wheel, joystick, and the like. In particular, when the touch pad is superimposed on the display unit 151 in the form of a layer, a touch screen may be formed.

The sensing unit 140 detects a current state of the mobile terminal 100 (e.g., an open or closed state of the mobile terminal 100), a position of the mobile terminal 100, presence or absence of contact (i.e., touch input) by a user with the mobile terminal 100, an orientation of the mobile terminal 100, acceleration or deceleration movement and direction of the mobile terminal 100, and the like, and generates a command or signal for controlling an operation of the mobile terminal 100. For example, when the mobile terminal 100 is implemented as a slide-type mobile phone, the sensing unit 140 may sense whether the slide-type phone is opened or closed. In addition, the sensing unit 140 can detect whether the power supply unit 190 supplies power or whether the interface unit 170 is coupled with an external device. The sensing unit 140 may include a proximity sensor 141, as will be described below in connection with a touch screen.

The interface unit 170 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The identification module may store various information for authenticating a user using the mobile terminal 100 and may include a User Identity Module (UIM), a Subscriber Identity Module (SIM), a Universal Subscriber Identity Module (USIM), and the like. In addition, a device having an identification module (hereinafter, referred to as an "identification device") may take the form of a smart card, and thus, the identification device may be connected with the mobile terminal 100 via a port or other connection means. The interface unit 170 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal and the external device.

In addition, when the mobile terminal 100 is connected with an external cradle, the interface unit 170 may serve as a path through which power is supplied from the cradle to the mobile terminal 100 or may serve as a path through which various command signals input from the cradle are transmitted to the mobile terminal. Various command signals or power input from the cradle may be used as signals for recognizing whether the mobile terminal is accurately mounted on the cradle. The output unit 150 is configured to provide output signals (e.g., audio signals, video signals, alarm signals, vibration signals, etc.) in a visual, audio, and/or tactile manner. The output unit 150 may include a display unit 151, an audio output module 152, an alarm unit 153, and the like.

The display unit 151 may display information processed in the mobile terminal 100. For example, when the mobile terminal 100 is in a phone call mode, the display unit 151 may display a User Interface (UI) or a Graphical User Interface (GUI) related to a call or other communication (e.g., text messaging, multimedia file downloading, etc.). When the mobile terminal 100 is in a video call mode or an image capturing mode, the display unit 151 may display a captured image and/or a received image, a UI or GUI showing a video or an image and related functions, and the like.

Meanwhile, when the display unit 151 and the touch pad are overlapped with each other in the form of a layer to form a touch screen, the display unit 151 may serve as an input device and an output device. The display unit 151 may include at least one of a Liquid Crystal Display (LCD), a thin film transistor LCD (TFT-LCD), an Organic Light Emitting Diode (OLED) display, a flexible display, a three-dimensional (3D) display, and the like. Some of these displays may be configured to be transparent to allow a user to view from the outside, which may be referred to as transparent displays, and a typical transparent display may be, for example, a TOLED (transparent organic light emitting diode) display or the like. Depending on the particular desired implementation, the mobile terminal 100 may include two or more display units (or other display devices), for example, the mobile terminal may include an external display unit (not shown) and an internal display unit (not shown). The touch screen may be used to detect a touch input pressure as well as a touch input position and a touch input area.

The audio output module 152 may convert audio data received by the wireless communication unit 110 or stored in the memory 160 into an audio signal and output as sound when the mobile terminal is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output module 152 may provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output module 152 may include a speaker, a buzzer, and the like.

The alarm unit 153 may provide an output to notify the mobile terminal 100 of the occurrence of an event. Typical events may include call reception, message reception, key signal input, touch input, and the like. In addition to audio or video output, the alarm unit 153 may provide output in different ways to notify the occurrence of an event. For example, the alarm unit 153 may provide an output in the form of vibration, and when a call, a message, or some other incoming communication is received, the alarm unit 153 may provide a tactile output (i.e., vibration) to inform the user thereof. By providing such a tactile output, the user can recognize the occurrence of various events even when the user's mobile phone is in the user's pocket. The alarm unit 153 may also provide an output notifying the occurrence of an event via the display unit 151 or the audio output module 152.

The memory 160 may store software programs and the like for processing and controlling operations performed by the controller 180, or may temporarily store data (e.g., a phonebook, messages, still images, videos, and the like) that has been or will be output. Also, the memory 160 may store data regarding various ways of vibration and audio signals output when a touch is applied to the touch screen.

The memory 160 may include at least one type of storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. Also, the mobile terminal 100 may cooperate with a network storage device that performs a storage function of the memory 160 through a network connection.

The controller 180 generally controls the overall operation of the mobile terminal. For example, the controller 180 performs control and processing related to voice calls, data communications, video calls, and the like. In addition, the controller 180 may include a multimedia module 181 for reproducing (or playing back) multimedia data, and the multimedia module 181 may be constructed within the controller 180 or may be constructed separately from the controller 180. The controller 180 may perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as a character or an image.

The power supply unit 190 receives external power or internal power and provides appropriate power required to operate various elements and components under the control of the controller 180.

The various embodiments described herein may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, an electronic unit designed to perform the functions described herein, and in some cases, such embodiments may be implemented in the controller 180. For a software implementation, the implementation such as a process or a function may be implemented with a separate software module that allows performing at least one function or operation. The software codes may be implemented by software applications (or programs) written in any suitable programming language, which may be stored in the memory 160 and executed by the controller 180.

Up to this point, mobile terminals have been described in terms of their functionality. Hereinafter, a slide-type mobile terminal among various types of mobile terminals, such as a folder-type, bar-type, swing-type, slide-type mobile terminal, and the like, will be described as an example for the sake of brevity. Accordingly, the present invention can be applied to any type of mobile terminal, and is not limited to a slide type mobile terminal.

The mobile terminal 100 as shown in fig. 1 may be configured to operate with communication systems such as wired and wireless communication systems and satellite-based communication systems that transmit data via frames or packets.

A communication system in which a mobile terminal according to the present invention is operable will now be described with reference to fig. 2.

Such communication systems may use different air interfaces and/or physical layers. For example, the air interface used by the communication system includes, for example, Frequency Division Multiple Access (FDMA), Time Division Multiple Access (TDMA), Code Division Multiple Access (CDMA), and Universal Mobile Telecommunications System (UMTS) (in particular, Long Term Evolution (LTE)), global system for mobile communications (GSM), and the like. By way of non-limiting example, the following description relates to a CDMA communication system, but such teachings are equally applicable to other types of systems.

Referring to fig. 2, the CDMA wireless communication system may include a plurality of mobile terminals 100, a plurality of Base Stations (BSs) 270, Base Station Controllers (BSCs) 275, and a Mobile Switching Center (MSC) 280. The MSC 280 is configured to interface with a Public Switched Telephone Network (PSTN) 290. The MSC 280 is also configured to interface with the BSCs 275, which may be coupled to the base stations 270 via a backhaul. The backhaul may be constructed according to any of several known interfaces including, for example, E1/T1, ATM, IP, PPP, Frame Relay, HDSL, ADSL, or xDSL. It will be understood that a system as shown in fig. 2 may include multiple BSCs 275.

Each BS 270 may serve one or more sectors (or regions), each sector being covered by an omnidirectional antenna or by an antenna pointing in a particular direction radially away from the BS 270. Alternatively, each sector may be covered by two or more antennas for diversity reception. Each BS 270 may be configured to support multiple frequency allocations, with each frequency allocation having a particular frequency spectrum (e.g., 1.25 MHz, 5 MHz, etc.).

The intersection of a sector with a frequency allocation may be referred to as a CDMA channel. The BS 270 may also be referred to as a Base Transceiver Subsystem (BTS) or other equivalent terminology. In such a case, the term "base station" may be used to refer generically to a single BSC 275 and at least one BS 270. The base stations may also be referred to as "cells". Alternatively, the individual sectors of a particular BS 270 may be referred to as a plurality of cell sites.

As shown in fig. 2, a Broadcast Transmitter (BT) 295 transmits a broadcast signal to the mobile terminals 100 operating within the system. A broadcast receiving module 111 as shown in fig. 1 is provided at the mobile terminal 100 to receive a broadcast signal transmitted by the BT 295. In fig. 2, several Global Positioning System (GPS) satellites 300 are shown. The satellites 300 assist in locating at least one of the plurality of mobile terminals 100.

In fig. 2, a plurality of satellites 300 are depicted, but it is understood that useful positioning information may be obtained with any number of satellites. The GPS module 115 as shown in fig. 1 is generally configured to cooperate with satellites 300 to obtain desired positioning information. Other techniques that can track the location of the mobile terminal may be used instead of or in addition to GPS tracking techniques. In addition, at least one GPS satellite 300 may selectively or additionally process satellite DMB transmission.

As a typical operation of the wireless communication system, the BS270 receives reverse link signals from various mobile terminals 100. The mobile terminal 100 is generally engaged in conversations, messaging, and other types of communications. Each reverse link signal received by a particular base station 270 is processed within the particular BS 270. The obtained data is forwarded to the associated BSC 275. The BSC provides call resource allocation and mobility management functions including coordination of soft handoff procedures between BSs 270. The BSCs 275 also route the received data to the MSC280, which provides additional routing services for interfacing with the PSTN 290. Similarly, the PSTN290 interfaces with the MSC280, the MSC interfaces with the BSCs 275, and the BSCs 275 accordingly control the BS270 to transmit forward link signals to the mobile terminal 100.

Based on the above optional mobile terminal hardware structure and communication system, various embodiments of the method of the present invention are proposed.

As shown in fig. 3, a first embodiment of the present invention proposes a character pickup apparatus 1, which includes: a shooting module 01 and a playing module 02.

The shooting module 01 is configured to shoot an object in front of a camera of the terminal where the shooting module is located in a preset character pickup mode.

The playing module 02 is configured to convert the characters in the shot dynamic images into voice for playing.

Optionally, the apparatus further comprises: a detection module 03 and a mode entry module 04.

The detecting module 03 is configured to detect a trigger condition of the text pick-up mode.

The mode entry module 04 is configured to enter the text pickup mode when the trigger condition is detected and determined to be valid.

Optionally, the shooting module 01 shooting an object in front of a camera of a terminal where the module is located includes:

Adjusting the focal length according to preset conditions.

Optionally, the apparatus further comprises a reminder module 05.

Optionally, the preset conditions include: the size of the text.

The photographing module 01 adjusting the focal length according to the preset condition includes:

Comparing the detected character size with a preset character size.

Optionally, the apparatus further comprises: a first determination module 06.

The first determining module 06 is configured to determine a preset text size according to a fingerprint size of a user before adjusting the focal length according to a preset condition.

Optionally, the determining, by the first determining module 06, the preset character size according to the fingerprint size of the user includes:

The fingerprint height and width are extracted from the fingerprint size.

Optionally, the apparatus further comprises: a second determination module 07.

The second determining module 07 is configured to detect a touch operation on the captured dynamic image and determine a touch position.

The playing module 02 is further configured to convert the text corresponding to the touch position into voice for playing.

Optionally, the apparatus further comprises: a text position determination module 08.

The text position determination module 08 is configured to:

After the touch position is determined, comparing the coordinates of the touch position with the coordinates of each character in the dynamic image, and determining that the touch position corresponds to a character when the coordinates of the touch position are consistent with the coordinates of any character in the dynamic image; and when the coordinates of the touch position are inconsistent with the coordinates of every character in the dynamic image, determining that the touch position does not correspond to a character.

Optionally, the text position determining module 08 is further configured to:

Controlling a preset motor in the corresponding direction to vibrate.

Optionally, the converting, by the playing module 02, the text corresponding to the touch position into the voice for playing includes:

Optionally, the apparatus further comprises: a setting module 09.

The setting module 09 is configured to:

In addition, to achieve the above object, the present invention further provides a text picking method, as shown in fig. 4 and 5, the method includes S101-S102:

S101, shooting an object in front of a camera of the terminal in a preset character picking mode.

In the embodiment of the invention, in order to help the blind or the amblyopic person read the characters on various objects such as paper, gravestone, license plate and the like so as to be convenient for the blind or the amblyopic person to know the character content, the scheme of the embodiment of the invention can shoot the object in front of the terminal through the terminal, capture the character information in the shot dynamic image and play the character information in a voice mode, thereby solving the problem that the blind or the amblyopic person cannot read due to the vision problem.

In the embodiment of the present invention, in order to distinguish it from a general photographing or video shooting action, the scheme of the embodiment of the present invention needs to be completed in a preset mode, such as the above-mentioned text pickup mode. The text pickup mode is used to search, through a camera of the terminal, for an object that is located in front of the camera, faces the camera, and contains text information, to shoot the object, and to convert the text information in the shot dynamic image into voice information for playing. The information played as voice is not limited to text information, and may also be digital information, symbol information, or the like. The dynamic image may be a video image captured by the camera, or a real-time dynamic image acquired by the camera during shooting.

In the embodiment of the present invention, the text pick-up mode may be entered through the following scheme.

Optionally, the method further comprises S201-S202:

S201, detecting a trigger condition of the character picking mode, wherein the trigger condition comprises a finger operation and/or a voice command.

In the embodiment of the invention, the terminal can detect the trigger condition of the text pickup mode in real time or periodically. In addition, in order to save terminal resources, the trigger condition may also be obtained by message notification. For example, when a certain finger operation or voice command is detected by a preset pressure sensor, fingerprint recognition device, scanning device, voice recognition device, key (including hardware keys and software keys), or the like, a notification message is sent out, so that the terminal can confirm whether the finger operation or voice command is the trigger condition of the text pickup mode. It should be noted that the trigger condition may include, but is not limited to, a finger operation and/or a voice command. In various embodiments, the trigger condition may be set to any operation or command that can be implemented. For example, the trigger condition may also be a mid-air gesture, which is detected by a preset proximity sensor in the terminal.

S202, when the trigger condition is detected and determined to be effective, entering a character picking mode.

In the embodiment of the present invention, after the trigger condition of the character picking mode is detected in step S201, the validity of the trigger condition needs to be determined. For example, when a pressing operation on a preset trigger key of the character picking mode is detected, the duration of the pressing operation needs to be detected, and when the duration of the pressing operation is less than or equal to a preset time threshold, it may be determined that the pressing operation is invalid, that is, the trigger condition of the character picking mode is invalid. For another example, when a preset proximity sensor detects a mid-air gesture that triggers the text pickup mode, if the holding time of the gesture is less than or equal to a preset time threshold, it may also be determined that the gesture is invalid, that is, the trigger condition of the text pickup mode is invalid. Through the scheme of the embodiment of the invention, misoperation can be effectively prevented.
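The validity check described above can be illustrated with a minimal sketch. The names `TriggerEvent`, `is_valid_trigger`, `maybe_enter_pickup_mode`, and the threshold value of 1.0 second are assumptions made for illustration only; the description specifies only that a duration less than or equal to a preset time threshold is treated as invalid.

```python
from dataclasses import dataclass

# Hypothetical value; the description only says "a preset time threshold".
MIN_HOLD_SECONDS = 1.0


@dataclass
class TriggerEvent:
    kind: str          # illustrative label, e.g. "key_press" or "gesture"
    duration_s: float  # how long the press or gesture was held, in seconds


def is_valid_trigger(event: TriggerEvent, threshold_s: float = MIN_HOLD_SECONDS) -> bool:
    """Return True only if the trigger was held longer than the threshold.

    A duration less than or equal to the threshold is treated as invalid,
    following the "less than or equal to" wording of the description.
    """
    return event.duration_s > threshold_s


def maybe_enter_pickup_mode(event: TriggerEvent) -> str:
    """Return the mode the terminal would be in after handling the event."""
    return "text_pickup" if is_valid_trigger(event) else "normal"
```

In this sketch a press held for exactly the threshold is rejected, which is one way to filter out accidental touches.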

In the embodiment of the invention, when the detected triggering condition is determined to be effective, the terminal can be triggered to enter a preset character picking mode. In the character pickup mode, a user can shoot an object in front of the terminal, so that the terminal converts characters in a shot dynamic image into voice, and the voice is conveniently played to the terminal user.

In the embodiment of the invention, the object in front of the terminal can be shot by the following scheme.

Optionally, the shooting of the object in front of the camera of the terminal comprises S301-S303:

S301, detecting an object in front of the camera, wherein the object includes text information on the side facing the camera.

In the embodiment of the invention, the preset character picking mode is mainly used for extracting characters in the dynamic image so as to convert the characters into voice for playing. Therefore, in the text pickup mode, the terminal detects, when shooting, an object that includes text information in the shooting scene of the terminal. In the embodiment of the invention, the detection and identification of the characters can be completed by a preset image recognition system.

Optionally, the method further comprises: and when the surface of the object opposite to the camera does not contain the character information, sending out reminding information. The reminding information comprises: vibration of the motor at a preset position.

In the embodiment of the invention, before the terminal shoots, when the terminal does not detect that an object containing the text information exists in the current scene of the terminal, the terminal can send out the preset reminding information in order to remind a user of changing the shooting scene, particularly remind a blind person or a person with weak sight. It should be noted that the reminding message may include one or more of the following: ringing, music, voice, vibration, flashing lights. For example, a motor at a preset position of the terminal may be controlled to generate vibration. Because the terminal can comprise one or more motors which are respectively arranged at different positions to realize different functions, when the object at the current position of the terminal does not comprise the text information, the motor at a certain preset position is only enabled to generate vibration, thereby achieving the purpose of reminding the user. The preset position may be any position on the terminal as long as it is convenient for the user to sense the vibration of the motor.

S302, adjusting the focal length according to preset conditions.

In the embodiment of the invention, after capturing an object containing text information in a shooting scene, the terminal needs to focus the terminal camera according to a preset condition so as to shoot a dynamic image meeting the preset condition.

Optionally, the preset condition includes: the size of the text.

In the embodiment of the present invention, focusing can adjust the size of the characters in the text portion to a suitable value, which prevents click errors caused by characters that are too small in the dynamic image. This matters especially for blind and visually impaired users. When such a user does not want to listen directly to the voice converted from all the text in the dynamic image (for example, when the user wants to exercise finger touch ability), the user can select text by clicking with a finger and then listen to its content. Because the user cannot see the text in the dynamic image, or cannot see it clearly, text that is too small makes the user prone to repeated click errors, as shown in fig. 6, which results in a poor experience. Therefore, focusing is required before shooting, so that the shot dynamic image meets the required character size and is convenient for the user to click.

In the embodiment of the present invention, as can be seen from the above, a standard character size needs to be determined before shooting, so that the terminal can directly use the preset value as the basis for focusing. The preset character size is intended to avoid click errors caused by characters that are too small; conversely, characters that are too large leave room for too few characters in the dynamic image. Therefore, in the embodiment of the present invention, the character size standard of the dynamic image may be determined according to the size of the user's finger or fingerprint. Specifically, it can be realized by the following scheme.

Optionally, determining the preset text size according to the fingerprint size of the user includes S401-S403:

S401, collecting fingerprint information when a user touches a terminal screen; the fingerprint information includes a fingerprint size.

In the embodiment of the invention, the terminal can acquire and store the fingerprint information of the user when the user touches the terminal screen according to the historical use condition of the user, and can also acquire the fingerprint information of the user in a preset fingerprint information acquisition mode and acquire the size information of the fingerprint.

S402, extracting the height and width of the fingerprint from the fingerprint size.

In an embodiment of the present invention, the fingerprint size of the user includes the height and width of the fingerprint. In the scheme of the embodiment of the invention, the fingerprint height refers to the maximum distance between the longitudinal contour lines in the acquired fingerprint contour; the fingerprint width refers to the maximum distance between the transverse contour lines in the acquired fingerprint profile. Since the fingerprint profiles obtained each time fingerprint identification is performed may not be identical, an average value of the height and width of a fingerprint may be obtained as standard values of the height and width of the fingerprint by averaging over a plurality of times. In addition, in order to obtain a sufficiently large character size in photographing, a maximum value can be selected from among a plurality of acquisitions as standard values of the height and width of the fingerprint.

S403, determining the height and width of the fingerprint as the height and width of characters in the preset character size.

In the embodiment of the invention, after the standard values of the height and the width of the fingerprint are obtained, the standard height and width of the fingerprint can be used as the standard for determining the character size. For example, the fingerprint height and width are directly used as the character height and width in the preset character size, or the fingerprint height and width are enlarged by a preset ratio and then used as the character height and width in the preset character size. For example, the preset ratio may be 1%, 5%, or the like. The preset ratio should not be set too large, so as to avoid characters so large that the dynamic image contains too little text. In addition, when the character size is determined, the character height and width do not both need to be determined; only one of them may be determined according to the touch habits of the user. For example, if the user is accustomed to touching with the finger held horizontally, only the width of the characters may be determined; if the user is accustomed to touching with the finger held vertically, only the height of the characters may be determined.
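Steps S401 to S403 can be sketched as follows. This is only an illustrative sketch: the function names, the `(height, width)` tuple layout, and the sample values are assumptions, while the averaging, the maximum-value alternative, and the small enlargement ratio follow the description.

```python
from statistics import mean
from typing import List, Tuple

Size = Tuple[float, float]  # (height, width) in any consistent unit, e.g. pixels


def standard_fingerprint_size(samples: List[Size], use_max: bool = False) -> Size:
    """Combine several fingerprint measurements into one standard (height, width).

    By default the average over all acquisitions is used; with use_max=True
    the largest height and width are used to obtain larger characters.
    """
    if not samples:
        raise ValueError("at least one fingerprint sample is required")
    heights = [h for h, _ in samples]
    widths = [w for _, w in samples]
    if use_max:
        return max(heights), max(widths)
    return mean(heights), mean(widths)


def preset_text_size(fingerprint: Size, enlarge_ratio: float = 0.0) -> Size:
    """Enlarge the standard fingerprint size by enlarge_ratio (e.g. 0.05 for 5%)."""
    height, width = fingerprint
    factor = 1.0 + enlarge_ratio
    return height * factor, width * factor
```

For instance, under these assumptions, three acquisitions of (100, 80), (110, 90), and (90, 70) average to a standard size of (100, 80), and a 5% enlargement then yields a preset character size of roughly (105, 84).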

Through the scheme, the standard of the size of the characters during shooting can be obtained, and the dynamic images of the characters suitable for the user can be obtained by focusing the camera according to the standard.

In the embodiment of the present invention, when focusing is performed according to the size of characters, the focusing operation can be specifically completed by the following scheme.

Optionally, adjusting the focal length according to the preset condition includes S401-S404:

S401, detecting the size of characters in the dynamic image under the current focal length.

In the embodiment of the present invention, before focusing according to the preset character size, the character size in the dynamic image acquired by the camera at the current focal length may be detected to determine whether the character size meets the preset standard character size, and the dynamic image may be adjusted according to the current character size. In the embodiment of the present invention, the detection of the size of the text in the dynamic image at the current focal length can also be implemented by performing image recognition by using a preset image recognition device.

S402, comparing the detected character size with a preset character size.

In the embodiment of the present invention, after the size of the text in the dynamic image at the current focal length is detected, the specific information of the size of the text in the dynamic image at the current focal length is obtained by comparing the size of the text with a preset size of the text, and the following processing is performed for different comparison results.

S403, when the size of the detected characters is consistent with the preset character size, keeping the current focal length.

In the embodiment of the present invention, when the detected text size is consistent with the preset text size, that is, the detected text size is completely the same as the preset text size or the difference is smaller than or equal to the preset difference threshold, the current focal length may be used as the shooting focal length.

S404, when the detected character size is inconsistent with the preset character size, adjusting the focal length of the camera to be a first focal length, and enabling the character size in the dynamic image to be consistent with the preset character size.

In the embodiment of the present invention, when the detected text size is not consistent with the preset text size, that is, the difference between the detected text size and the preset text size is greater than the preset difference threshold, the current focal length may be adjusted so that the text size in the dynamic image is consistent with the preset text size, and the adjusted focal length, that is, the first focal length in the embodiment of the present invention, is determined as the shooting focal length.

S303, taking the central area of the character part in the object including the character information as a shooting focus and shooting.
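The comparison and adjustment of the focal length can be illustrated with a minimal sketch. It assumes, purely for illustration, that the character size in the image scales approximately in proportion to the focal length (a reasonable approximation when the object is much farther away than the focal length); the description does not specify how the first focal length is computed, and the function names are hypothetical.

```python
def sizes_consistent(detected: float, preset: float, tolerance: float) -> bool:
    """Sizes are consistent when they differ by no more than the preset threshold."""
    return abs(detected - preset) <= tolerance


def choose_focal_length(current_focal: float, detected: float,
                        preset: float, tolerance: float) -> float:
    """Return the shooting focal length.

    Keeps the current focal length when the detected character size is
    consistent with the preset size; otherwise returns a "first focal length"
    under the assumed proportional model size ~ focal length.
    """
    if detected <= 0:
        raise ValueError("detected character size must be positive")
    if sizes_consistent(detected, preset, tolerance):
        return current_focal
    return current_focal * preset / detected
```

Under this assumed model, characters detected at half the preset size would lead to a doubled focal length, while a difference within the threshold leaves the focal length unchanged.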

In the embodiment of the present invention, after the focal length of the camera is determined, in order to make the captured dynamic image mainly contain text portions, the central area of the text portions in the object including text information may be used as the shooting focus.
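One possible way to obtain the central area of the text portion is sketched below. It assumes that the image recognition step yields one bounding box per character, in the hypothetical `(left, top, right, bottom)` layout, and takes the center of the rectangle enclosing all boxes as the shooting focus; the description itself does not prescribe a particular computation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed layout


def text_region_center(boxes: List[Box]) -> Tuple[float, float]:
    """Return the center of the smallest rectangle enclosing all character boxes."""
    if not boxes:
        raise ValueError("no text was detected")
    left = min(b[0] for b in boxes)
    top = min(b[1] for b in boxes)
    right = max(b[2] for b in boxes)
    bottom = max(b[3] for b in boxes)
    return (left + right) / 2.0, (top + bottom) / 2.0
```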

In the embodiment of the invention, the proper shooting focal length and focus can be obtained through the adjustment, and the character image suitable for the user can be obtained by shooting according to the focal length and focus.

Optionally, detecting a touch operation on the shot dynamic image and determining a touch position; and converting the characters corresponding to the touch position into voice for playing.

In the embodiment of the invention, after the dynamic image is shot by the above scheme, the user can obtain the text information contained in the dynamic image.

The terminal may extract the text information in the moving image through the image recognition device, arrange the extracted text information according to the position in the moving image, and finally obtain the electronic form of the text information in the moving image. After the electronic form of the text information is obtained, the text information in the electronic form can be directly converted into voice information to be played, and corresponding text can also be converted into voice to be played after the touch operation of a user on a dynamic image on a terminal screen is detected. Specifically, after the moving image is shot, the moving image is displayed on an interface of the terminal, a user may perform operations such as touching or clicking on the moving image on the interface of the terminal, the terminal detects the touching or clicking operation, and determines a touched or clicked position, so as to determine a corresponding character according to the position, as shown in fig. 7.

In the embodiment of the present invention, any detection method, algorithm, and apparatus that can be implemented may be used to complete the above-described detection scheme, and the specific detection method, algorithm, and apparatus are not limited.

In the embodiment of the invention, since a blind or visually impaired person cannot see the specific positions in the dynamic image on the screen, or cannot see them clearly, the touched position may well contain no characters. In this case, whether there is text at the touch position can be determined by the following scheme.

Optionally, the method further comprises:

after the touch position is determined, comparing the coordinates of the touch position with the coordinates of each character in the dynamic image, and determining that the touch position corresponds to the character when the coordinates of the touch position are consistent with the coordinates of any character in the dynamic image; and when the coordinates of the touch position are inconsistent with the coordinates of each character in the dynamic image, determining that the touch position does not correspond to the character.

In the embodiment of the invention, the terminal can determine the coordinates of each character in the dynamic image displayed on the screen according to the screen coordinate system. Similarly, the terminal can determine the specific coordinates of the touch position of the user. The terminal can therefore compare the coordinates of the touch position with the coordinates of each character: when the two are consistent, the touch position corresponds to the character, that is, the touch position falls on the character; when the two are inconsistent, the touch position does not correspond to the character, that is, the touch position does not fall on the character. It should be noted that, in the embodiment of the present invention, consistent means that the coordinates are identical or differ by no more than the preset difference threshold, and inconsistent means that the coordinates differ by more than the preset difference threshold.
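The coordinate comparison can be sketched as a simple hit test. The `Glyph` record, the function name, and the per-axis use of the threshold are assumptions for illustration; the description only requires that coordinates count as consistent when they are identical or differ by no more than a preset threshold.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Glyph:
    text: str
    x: float  # screen coordinates of the character (assumed to be its center)
    y: float


def glyph_at(touch: Tuple[float, float], glyphs: List[Glyph],
             tolerance: float) -> Optional[Glyph]:
    """Return the first character whose coordinates are consistent with the touch.

    Consistent here means each coordinate differs by at most `tolerance`.
    Returns None when the touch position does not correspond to any character.
    """
    tx, ty = touch
    for glyph in glyphs:
        if abs(glyph.x - tx) <= tolerance and abs(glyph.y - ty) <= tolerance:
            return glyph
    return None
```

A result of `None` corresponds to the case in which the touch position does not fall on any character.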

Optionally, the method further comprises S501-S503:

S501, after the touch position is determined, when no corresponding character exists at the touch position, detecting the position of a first character closest to the current touch position.

In the embodiment of the invention, when no corresponding character exists at the touch position, the terminal needs to give a prompt so that the user can adjust the touch position in time. In the embodiment of the present invention, the terminal may first detect the character closest to the current touch position and determine the position of that character on the terminal screen, so as to guide the user to move the finger to the corresponding position, as shown in fig. 8. The specific guidance can be realized by the following scheme.

S502, determining the relative direction between the position of the first character and the current touch position.

In the embodiment of the present invention, after the position information of the character closest to the current touch position (the first character in the embodiment of the present invention) is determined, for example, the coordinates of the first character on the terminal screen, the relative direction between the current touch position and the position of the first character, for example, the ten o'clock direction, may be determined.

S503, controlling a preset motor in the corresponding direction to vibrate.

In the embodiment of the present invention, a plurality of direction indication motors may be arranged on the terminal in advance, and after the relative direction between the first character and the current touch position is determined in step S502, the preset motor in the corresponding direction may be controlled to vibrate, so as to indicate to the user the direction in which to move next. In an embodiment of the present invention, the motor to be vibrated may be determined by extending from the center of the terminal screen along the determined direction relative to the current touch position, as shown in fig. 8.

It should be noted that other guidance schemes may be adopted in other embodiments, and guidance is not limited to the above scheme. For example, the user may be given directions by means of voice prompts, such as "please move left" or "please move up". In the embodiment of the invention, with the terminal screen facing the user, the left side is the direction indicated by the negative direction of the abscissa; the right side is the direction indicated by the positive direction of the abscissa; the upper side is the direction indicated by the positive direction of the ordinate; and the lower side is the direction indicated by the negative direction of the ordinate.
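Steps S501 to S503 can be illustrated with the following sketch. It assumes eight direction indication motors named after compass-like directions and a coordinate system in which the abscissa grows to the right and the ordinate grows upward; the number of motors, their names, and the function names are hypothetical, and the actual vibration call is left out because it depends on the terminal platform.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), with y increasing upward (assumed)

# Hypothetical motor layout, one motor per 45-degree sector,
# listed counter-clockwise starting from the positive abscissa.
MOTORS = ["right", "upper_right", "up", "upper_left",
          "left", "lower_left", "down", "lower_right"]


def nearest_character(touch: Point, characters: List[Point]) -> Point:
    """S501: return the character position closest to the touch position."""
    if not characters:
        raise ValueError("no characters in the dynamic image")
    return min(characters,
               key=lambda c: math.hypot(c[0] - touch[0], c[1] - touch[1]))


def motor_for_direction(touch: Point, target: Point) -> str:
    """S502 and S503: map the direction from touch to target onto one motor."""
    dx, dy = target[0] - touch[0], target[1] - touch[1]
    if dx == 0 and dy == 0:
        raise ValueError("the touch position is already on the character")
    angle = math.degrees(math.atan2(dy, dx))   # range (-180, 180]
    return MOTORS[round(angle / 45.0) % 8]     # nearest 45-degree sector
```

With a touch at the origin and the nearest character up and to the left, this sketch selects the "upper_left" motor, which roughly corresponds to the ten o'clock direction mentioned above.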

S102, converting the characters corresponding to the touch position into voice for playing.

In the embodiment of the invention, after the characters at the touch position of the user are detected through the scheme or the user is guided to touch the characters, the characters corresponding to the touch position can be converted into voice information to be played. It should be noted that, since the conversion of the text information into the voice information is already a relatively mature technology, it is not described herein any more, and no specific limitations are imposed on the selected conversion method, algorithm, software, device, and the like.

In addition, the conversion process from the text to the speech in the dynamic image may be directly performed when the dynamic image of the text is obtained, that is, directly performed in the shooting process, or may be performed after the text touched by the user is determined, and the specific manner may be defined by itself according to the application scene of the user, which is not limited herein.

In the embodiment of the present invention, when the text information in the dynamic image is converted into voice information as soon as the dynamic image is acquired, the text in the dynamic image may be played directly in a preset sequence, for example, from top to bottom and/or from left to right, or the voice may be played when the user touches the corresponding text according to the above scheme. To let the user choose freely between the two playing manners, corresponding playing modes can be preset, for example, a selective playing mode and an automatic playing mode. In the selective playing mode, the touch operation of the user needs to be detected, so that the characters at the touch position are played. In the automatic playing mode, the characters in the dynamic image are automatically played as voice according to the preset sequence.
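The two playing modes can be sketched as a function that decides which characters are handed to the text-to-speech step. The mode names, the `(text, row, column)` layout, and the function name are assumptions for illustration, and the speech synthesis itself is omitted because the description does not limit the conversion method.

```python
from typing import List, Optional, Tuple

Char = Tuple[str, int, int]  # (text, row index, column index), assumed layout


def texts_to_play(mode: str, chars: List[Char],
                  touched: Optional[int] = None) -> List[str]:
    """Return the characters to convert into voice, in playing order.

    "automatic": all characters, top to bottom and then left to right.
    "selective": only the touched character (index into chars), if any.
    """
    if mode == "automatic":
        ordered = sorted(chars, key=lambda c: (c[1], c[2]))
        return [c[0] for c in ordered]
    if mode == "selective":
        return [] if touched is None else [chars[touched][0]]
    raise ValueError(f"unknown play mode: {mode}")
```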

In addition, in the above-mentioned selective playing mode, in order to make the user quickly understand the text content in the moving image and improve the playing efficiency, the following playing method may be adopted.

In the embodiment of the invention, when the text corresponding to the position touched by the user is detected to be in a row or a column of text, the content of that row or column can be played directly to the user. In addition, if the row of text has one or more adjacent rows, a reminder, such as a voice reminder, may be issued to ask the user whether the next or previous row of text should also be played. Similarly, if the column of text has one or more adjacent columns, a reminder can be issued to ask the user whether to continue playing the next or previous column of text. The user can respond to the reminder by voice confirmation or by a preset confirmation operation. According to the response, the terminal plays the next row or column of text content, or stops playing.

In the embodiment of the present invention, before the terminal identifies the text of a row or a column, the terminal is required to define a concept of a row or a column in advance, so that the terminal can confirm whether the text of a row or a column exists according to the predefined definition. The specific implementation can be realized through the following scheme.

Optionally, the method further comprises:

In the embodiment of the invention, the terminal can detect the distance between each character in the dynamic image and its adjacent characters, can determine the coordinates of each character, and can determine which characters lie on the same straight line according to the coordinate values. Therefore, based on these terminal functions and the concepts of rows and columns, a row of text can be determined as a plurality of adjacent characters that keep a preset first interval from one another and lie on the same straight line in the transverse direction, and a column of text can be determined as a plurality of adjacent characters that keep a preset second interval from one another and lie on the same straight line in the longitudinal direction.

In the embodiment of the present invention, specific values of the first interval and the second interval in the above scheme are not limited. The first interval and the second interval may take different values in different application scenarios.
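Row detection along these lines can be sketched as follows. The sketch assumes each character is given as `(text, x, y)`, treats characters whose ordinates differ by no more than a tolerance as lying on the same transverse line, and starts a new row whenever the spacing to the previous character deviates from the first interval by more than that tolerance. The function name, the tolerance parameter, and the data layout are hypothetical; column detection would be the same procedure with the two axes swapped and the second interval used.

```python
from typing import List, Tuple

Char = Tuple[str, float, float]  # (text, x, y), assumed layout


def group_rows(chars: List[Char], first_interval: float,
               tolerance: float = 1.0) -> List[str]:
    """Group characters into rows of text and return each row as a string."""
    # Step 1: collect characters that lie on the same transverse line.
    lines: List[List[Char]] = []
    for ch in sorted(chars, key=lambda c: (c[2], c[1])):
        if lines and abs(lines[-1][0][2] - ch[2]) <= tolerance:
            lines[-1].append(ch)
        else:
            lines.append([ch])

    # Step 2: within each line, split where the spacing is not the first interval.
    rows: List[str] = []
    for line in lines:
        line.sort(key=lambda c: c[1])
        current = [line[0]]
        for prev, ch in zip(line, line[1:]):
            if abs((ch[1] - prev[1]) - first_interval) <= tolerance:
                current.append(ch)
            else:
                rows.append("".join(c[0] for c in current))
                current = [ch]
        rows.append("".join(c[0] for c in current))
    return rows
```

In this sketch a character that shares a line with others but sits farther away than the first interval forms its own row, so that two separate text blocks on the same line are not read out as one.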

While all the essential features of the embodiments of the present invention have been described, it should be noted that the above description is one or more specific implementations of the embodiments of the present invention, and other implementations may be adopted in other embodiments, and any combination of the essential features of the embodiments of the present invention, which are the same as or similar to the embodiments of the present invention, is within the protection scope of the embodiments of the present invention.

The invention provides a character pick-up device and method, wherein the device comprises a shooting module and a playing module. The shooting module shoots an object in front of a camera of the terminal where the shooting module is located in a preset character picking mode. The playing module converts the characters in the shot dynamic image into voice for playing. Through the scheme of the embodiment of the invention, a user can learn the text content through the terminal, which solves the problem that blind people or people with amblyopia cannot read because of their vision.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A character pick-up device, the device comprising: a shooting module, an extraction and arrangement module, a second determining module and a playing module;

the shooting module is used for shooting an object in front of a camera of the terminal where the shooting module is located in a preset character picking mode;

the extraction and arrangement module is used for extracting the character information in the shot dynamic image and arranging the extracted character information according to the position in the dynamic image;

the second determining module is used for detecting touch operation on the shot dynamic image and determining a touch position;

and the playing module is used for converting the characters corresponding to the touch position into voice for playing.

2. The character pick-up device of claim 1, wherein the device further comprises: a detection module and a mode entering module;

the detection module is used for detecting the triggering condition of the character picking mode;

and the mode entering module is used for entering the character picking mode when the triggering condition is detected and determined to be effective.

3. The character pick-up device of claim 1, wherein the shooting module shooting an object in front of the camera of the terminal where the shooting module is located comprises:

detecting an object in front of the camera; wherein a side of the object opposite to the camera includes text information;

adjusting the focal length according to preset conditions;

taking the central area of the character part in the object comprising the character information as a shooting focus and shooting;

wherein the shooting module adjusting the focal length according to the preset conditions comprises:

detecting the size of characters in the dynamic image under the current focal length; comparing the detected character size with a preset character size; when the detected character size is consistent with the preset character size, keeping the current focal length; and when the detected character size is inconsistent with the preset character size, adjusting the focal length of the camera to be a first focal length to enable the character size in the dynamic image to be consistent with the preset character size.

4. The character pick-up device of claim 1, wherein the extraction and arrangement module is further configured to:

keeping a preset first interval between adjacent characters in the longitudinal direction, and taking a plurality of characters on the same straight line in the transverse direction as a line of characters;

5. The character pick-up device of claim 1, wherein the device further comprises: a text position determination module;

the text position determination module is configured to:

after the touch position is determined, comparing the coordinates of the touch position with the coordinates of each character in the dynamic image, and determining that the touch position corresponds to a character when the coordinates of the touch position are consistent with the coordinates of that character in the dynamic image; when the coordinates of the touch position are inconsistent with the coordinates of every character in the dynamic image, determining that the touch position does not correspond to any character;

when the touch position does not have the corresponding character, detecting the position of a first character closest to the current touch position; determining the relative direction of the position of the first character and the current touch position; and controlling a preset motor in the corresponding direction to vibrate.

6. A character pick-up method, the method comprising:

shooting an object in front of a camera of a terminal in a preset character pickup mode;

extracting character information in the shot dynamic image, and arranging the extracted character information according to the position in the dynamic image;

detecting a touch operation on the arranged shot dynamic images and determining a touch position;

7. The character pick-up method according to claim 6, wherein the method further comprises:

detecting a trigger condition of the character picking mode;

and entering the character picking mode when the triggering condition is detected and determined to be effective.

8. The character pick-up method according to claim 6, wherein the shooting of the object in front of the camera of the terminal comprises:

adjusting the focal length according to preset conditions;

wherein the adjusting the focal length according to the preset condition comprises:

9. The character pick-up method according to claim 6, wherein the method further comprises:

10. The character pick-up method according to claim 6, wherein the method further comprises: