CN112115696A - Data processing method and device and recording equipment - Google Patents

Publication number
CN112115696A
Authority
CN
China
Prior art keywords
information
text
text information
target image
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010988800.4A
Other languages
Chinese (zh)
Inventor
崔文华
路呈璋
李健涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/189 Automatic justification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/16 Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Abstract

The embodiment of the invention provides a data processing method, a data processing device and recording equipment, wherein the method comprises the following steps: the recording equipment acquires a target image; performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image; displaying the text information according to the typesetting information of the text information in the target image; the layout of the displayed text information can be the same as or similar to the layout of the text information in the target image, so that the user can read and understand the information in the image conveniently; thereby improving the user experience.

Description

Data processing method and device and recording equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method, an apparatus, and a recording device.
Background
In recent years, recording equipment has developed rapidly and has moved from being a product for professional fields into general public use. Journalists, students, teachers and other groups commonly need recording equipment to make recordings. In addition, recording television programs, movies, music and the like also requires recording equipment.
As recording equipment has become widely used, users' requirements for its functions have gradually increased; the functions of existing recording equipment cannot meet these requirements, which results in a poor user experience.
Disclosure of Invention
The embodiment of the invention provides a data processing method, which is convenient for a user to read and understand information in an image.
Correspondingly, the embodiment of the invention also provides a data processing device and a recording device, which are used for ensuring the realization and the application of the method.
In order to solve the above problem, an embodiment of the present invention discloses a data processing method, which specifically includes: the recording equipment acquires a target image; performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image; and displaying the text information according to the typesetting information of the text information in the target image.
Optionally, the determining layout information of the text information in the target image includes: respectively recording corresponding line information of each text in the target image and position information of a line where each text is located; and generating typesetting information of the text information in the target image according to the line information and the position information of each text in the text information.
Optionally, the displaying the text information according to the layout information of the text information in the target image includes: dividing the text information into paragraphs according to the line information and the position information of each text in the text information; and displaying the text information paragraph by paragraph.
Optionally, the displaying the text information according to the layout information of the text information in the target image includes: and controlling each text in the text information to be displayed according to the same typesetting as the text in the image according to the line information and the position information of each text in the text information.
Optionally, the method further comprises: and translating the text information to obtain a corresponding translation result and displaying the translation result.
Optionally, the translation result includes: picture translation results and/or text translation results.
Optionally, the method further comprises: receiving a transmission instruction, wherein the transmission instruction comprises at least one of the following: sharing instructions, forwarding instructions and unloading instructions; transmitting the data corresponding to the transmission instruction to other equipment; the data corresponding to the transmission instruction comprises at least one of the following data: target image, text information, and translation result.
Optionally, the method further comprises: acquiring target audio data, wherein the target audio data is associated with the target image, and the target image is acquired by a recording device in the process of recording the target audio data; and performing voice recognition on the target audio data according to the text information, and determining a corresponding voice recognition result.
The embodiment of the invention also discloses a data processing device, which is applied to the recording equipment and specifically comprises: an image acquisition module, used for acquiring a target image; a text recognition module, used for performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image; and a display module, used for displaying the text information according to the typesetting information of the text information in the target image.
Optionally, the text information includes a plurality of texts, and the text recognition module is configured to record line information corresponding to each text in the target image and position information of a line where each text is located, respectively; and generating typesetting information of the text information in the target image according to the line information and the position information of each text in the text information.
Optionally, the display module includes: a first layout display submodule, used for dividing the text information into paragraphs according to the line information and the position information of each text in the text information, and displaying the text information paragraph by paragraph.
Optionally, the display module includes: and the second typesetting and displaying submodule is used for controlling the texts in the text information to be typesetted and displayed in the same way as the texts in the images according to the line information and the position information of the texts in the text information.
Optionally, the apparatus further comprises: and the translation module is used for translating the text information to obtain a corresponding translation result and displaying the translation result.
Optionally, the translation result includes: picture translation results and/or text translation results.
Optionally, the apparatus further comprises: a data transmission module, configured to receive a transmission instruction, where the transmission instruction includes at least one of: sharing instructions, forwarding instructions and unloading instructions; transmitting the data corresponding to the transmission instruction to other equipment; the data corresponding to the transmission instruction comprises at least one of the following data: target image, text information, and translation result.
Optionally, the apparatus further comprises: the voice recognition module is used for acquiring target audio data, the target audio data is associated with the target image, and the target image is acquired by the recording equipment in the process of recording the target audio data; and performing voice recognition on the target audio data according to the text information, and determining a corresponding voice recognition result.
The embodiment of the invention also discloses a readable storage medium, and when the instructions in the storage medium are executed by the processor of the recording device, the recording device can execute the data processing method according to any one of the embodiments of the invention.
The embodiment of the invention also discloses a sound recording device, which comprises a memory and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs are configured to be executed by one or more processors and comprise instructions for: acquiring a target image; performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image; and displaying the text information according to the typesetting information of the text information in the target image.
Optionally, the determining layout information of the text information in the target image includes: respectively recording corresponding line information of each text in the target image and position information of a line where each text is located; and generating typesetting information of the text information in the target image according to the line information and the position information of each text in the text information.
Optionally, the displaying the text information according to the layout information of the text information in the target image includes: dividing the text information into paragraphs according to the line information and the position information of each text in the text information; and displaying the text information paragraph by paragraph.
Optionally, the displaying the text information according to the layout information of the text information in the target image includes: and controlling each text in the text information to be displayed according to the same typesetting as the text in the image according to the line information and the position information of each text in the text information.
Optionally, further comprising instructions for: and translating the text information to obtain a corresponding translation result and displaying the translation result.
Optionally, the translation result includes: picture translation results and/or text translation results.
Optionally, further comprising instructions for: receiving a transmission instruction, wherein the transmission instruction comprises at least one of the following: sharing instructions, forwarding instructions and unloading instructions; transmitting the data corresponding to the transmission instruction to other equipment; the data corresponding to the transmission instruction comprises at least one of the following data: target image, text information, and translation result.
Optionally, further comprising instructions for: acquiring target audio data, wherein the target audio data is associated with the target image, and the target image is acquired by a recording device in the process of recording the target audio data; and performing voice recognition on the target audio data according to the text information, and determining a corresponding voice recognition result.
The embodiment of the invention has the following advantages:
in the embodiment of the invention, the text recognition function of the image can be added to the recording equipment; when a user uses the recording equipment to perform text recognition on the target image, the recording equipment can acquire the target image; then, performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image; displaying the text information according to the typesetting information of the text information in the target image; the layout of the displayed text information can be the same as or similar to the layout of the text information in the target image, so that the user can read and understand the information in the image conveniently; thereby improving the user experience.
Drawings
FIG. 1 is a flow chart of the steps of one data processing method embodiment of the present invention;
FIG. 2 is a flow chart of the steps of an alternative embodiment of a data processing method of the present invention;
FIG. 3 is a flow chart of steps in another data processing method embodiment of the present invention;
FIG. 4 is a flow chart of steps in yet another data processing method embodiment of the present invention;
FIG. 5 is a flow chart of the steps of yet another data processing method embodiment of the present invention;
FIG. 6 is a block diagram of an embodiment of a data processing apparatus according to the present invention;
FIG. 7 is a block diagram of an alternate embodiment of a data processing apparatus of the present invention;
FIG. 8 is a block diagram illustrating a configuration of a sound recording apparatus for data processing according to an exemplary embodiment.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The data processing method provided by the embodiment of the invention is applied to recording equipment, where the recording equipment may be any equipment with a recording function, for example a recording pen, or translation equipment such as a translation pen or a translator; the embodiments of the present invention are not limited in this regard.
In the embodiment of the invention, an image acquisition module can be arranged in the sound recording equipment so as to add an image acquisition function to it, thereby enabling the user to use the recording device for image acquisition. A display module is also provided in the recording equipment; the display module may include a display screen and can be used to display information.
In many cases, after acquiring an image, a user needs to perform text recognition on the image to acquire text information in the image. Therefore, the embodiment of the invention also adds the text recognition function of the image in the recording equipment so as to meet the use requirements of users and improve the user experience.
On the basis of the above, one of the core concepts of the embodiment of the invention is that the typesetting of the text is recognized while the text in the image is recognized; then, displaying the text information obtained by identification based on the typesetting of the text in the image; the layout of the displayed text information can be the same as or similar to the layout of the text information in the target image; and further, the reading and understanding of the information in the image by the user can be facilitated.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a data processing method according to the present invention is shown, which may specifically include the following steps:
Step 102, the recording equipment acquires a target image.
In the embodiment of the invention, a user can execute the image acquisition operation in the sound recording device, correspondingly, the sound recording device can receive the image acquisition instruction and then call the image acquisition module arranged in the sound recording device to acquire the image so as to acquire the target image. Of course, the user may also select an image to be subjected to text recognition from the images stored in the sound recording device as the target image.
The target image stored in the recording device may have been acquired and stored in advance by the recording device calling its built-in image acquisition module, or may have been sent to the recording device by another device and then stored by the recording device; the present invention is not limited in this regard. Another device here means any device other than the sound recording device.
Step 104, performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image.
After the target image is obtained, text recognition can be carried out on the target image, and text information corresponding to the target image is determined; and determining the typesetting information of the text information in the target image while performing text recognition on the target image. Wherein, the layout information can be used for describing the layout of the text in the target image.
Step 106, displaying the text information according to the typesetting information of the text information in the target image.
In the embodiment of the invention, when the text information is shown on the display screen, it can be displayed, according to its typesetting information in the target image, in a layout that is the same as or similar to its layout in the image.
The layout information may be used to represent the layout of the text information in the target image, such as the paragraph where the text information is located, the line spacing between each line of text information, the spacing between each text in the same line of text information, and the like.
In the embodiment of the present invention, the sound recording device may execute steps 102 to 106; or after the sound recording device executes the step 102, the target image is sent to a server, and the server executes the step 104; then returning the text information and the typesetting information of the target image to the recording equipment, and executing the step 106 by the recording equipment; the embodiments of the present invention are not limited in this regard.
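The division of work described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the embodiment: `process_image`, `Recognizer` and `fake_recognizer` are hypothetical names, and the recognizer is passed in as a callable so that step 104 can run either on the device or on a server.

```python
from typing import Callable, List, Tuple

# Hypothetical result of step 104: one (text, indent_in_columns) pair per line.
RecognizedLines = List[Tuple[str, int]]
Recognizer = Callable[[bytes], RecognizedLines]


def process_image(image: bytes, recognize: Recognizer) -> str:
    """Run steps 104 and 106 on an image already acquired in step 102."""
    lines = recognize(image)  # step 104: may run locally or on a server
    # Step 106: display each line using its recorded indentation.
    return "\n".join(" " * indent + text for text, indent in lines)


def fake_recognizer(image: bytes) -> RecognizedLines:
    """Stand-in for OCR (assumption); a real device would call an OCR engine or a server."""
    return [("Meeting notes", 4), ("First item", 0)]


print(process_image(b"raw image bytes", fake_recognizer))
```

Swapping `fake_recognizer` for a function that sends the image to a server would correspond to the variant in which the server executes step 104.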
In conclusion, in the embodiment of the invention, the text recognition function of the image can be added to the recording device; when a user uses the recording equipment to perform text recognition on the target image, the recording equipment can acquire the target image; then, performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image; displaying the text information according to the typesetting information of the text information in the target image; the layout of the displayed text information can be the same as or similar to the layout of the text information in the target image, so that the user can read and understand the information in the image conveniently; thereby improving the user experience.
In the embodiment of the present invention, the text information may be displayed in a layout similar to that of the text information in the image, according to the layout information of the text information in the target image. The specific steps are as follows:
Referring to fig. 2, a flowchart illustrating steps of an alternative embodiment of the data processing method of the present invention is shown, which may specifically include the following steps:
Step 202, the recording device obtains a target image.
In the embodiment of the invention, the target image may be acquired by calling the image acquisition module arranged in the recording equipment while audio data is being recorded; or it may be acquired by calling that image acquisition module independently of any recording process, which is not limited in this embodiment of the present invention.
Step 204, performing text recognition on the target image, and determining corresponding text information.
The text information corresponding to the target image may be obtained by recognizing each text in the target image using OCR (Optical Character Recognition) technology.
Wherein the text information comprises a plurality of texts, and the texts can comprise words and characters.
Step 206, respectively recording the line information of each text in the text information in the target image and the position information of each text within its line.
Step 208, generating typesetting information of the text information in the target image according to the line information and the position information of each text in the text information.
Steps 206 to 208 may be executed while step 204 is executed.
In the embodiment of the invention, the line information of each text in the target image and the position information of each text within its line can be recorded while each text is recognized. The line information may include a line number, such as line 1, line 2, etc.; it may also include line coordinates, which may include the pixel coordinates of the four vertices of the line. The position information may be represented in various ways. For example, it may include the pixel coordinates of the leftmost, rightmost, topmost and bottommost points of the text. For another example, it may include the coordinates of the center position of the text. For yet another example, it may include the sequence number of the text within its line, such as 1st, 2nd, and so on. The embodiment of the present invention is not limited in this regard.
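One possible in-memory form of the line information and position information is sketched below. It is an illustrative assumption, not a structure given in the embodiment: `TextItem`, `LineLayout` and `build_layout` are hypothetical names, and the line's four vertices are taken here as the bounding rectangle of the texts in that line.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


@dataclass
class TextItem:
    """One recognized text (a word or a character) and where it was found."""
    content: str
    line_no: int  # line information: 1-based line number
    left: int     # position information: extreme points of the text, in pixels
    top: int
    right: int
    bottom: int

    @property
    def center(self) -> Tuple[float, float]:
        """Alternative position representation: center of the text."""
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


@dataclass
class LineLayout:
    line_no: int
    vertices: List[Point]  # four corners, clockwise from top-left
    texts: List[TextItem]  # ordered left to right (index + 1 = sequence number)


def build_layout(items: List[TextItem]) -> List[LineLayout]:
    """Group texts by line number and derive each line's four vertices."""
    by_line: Dict[int, List[TextItem]] = {}
    for item in items:
        by_line.setdefault(item.line_no, []).append(item)

    layout = []
    for line_no in sorted(by_line):
        texts = sorted(by_line[line_no], key=lambda t: t.left)
        left = min(t.left for t in texts)
        top = min(t.top for t in texts)
        right = max(t.right for t in texts)
        bottom = max(t.bottom for t in texts)
        corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
        layout.append(LineLayout(line_no, corners, texts))
    return layout
```

For example, `build_layout([TextItem("world", 1, 60, 10, 100, 30), TextItem("hello", 1, 10, 12, 50, 28)])` yields a single line whose texts are ordered `hello`, `world`.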
In the embodiment of the present invention, one manner of displaying the text information according to the layout information of the text information in the target image may refer to steps 210 to 212:
Step 210, dividing the text information into paragraphs according to the line information and the position information of each text in the text information.
Step 212, displaying the text information paragraph by paragraph.
In the embodiment of the present invention, all text information recognized from the target image may be divided into a plurality of paragraphs according to the line information and the position information of each text in the text information. For each text in the text information, the line where the text is located is determined according to the line information of the text; then, whether the text is the first text of that line can be judged according to the position information of the text. If the text is the first text of its line, whether at least two spaces precede the text is judged according to the position information of the text. When at least two spaces precede the text, the line where the text is located can be determined to be a paragraph start line, and the text is the starting text of that paragraph start line. The line immediately above that line can then be taken as the ending line of the previous paragraph. The text information may then be divided into a plurality of paragraphs for display; the number of paragraphs displayed is consistent with the number of paragraphs in the target image.
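The paragraph rule described above can be sketched as follows. This is a minimal Python sketch under assumptions that the embodiment does not state: the number of leading spaces is estimated by dividing the pixel indent of a line's first text by an average character width, and `Line`, `split_paragraphs`, `margin_left` and `char_width` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    text: str          # recognized text of the whole line
    first_left: int    # x pixel of the first text in the line
    char_width: float  # assumed average character width in pixels


def split_paragraphs(lines: List[Line], margin_left: int,
                     min_spaces: int = 2) -> List[str]:
    """Start a new paragraph at each line indented by at least `min_spaces`."""
    paragraphs: List[List[str]] = []
    for line in lines:
        # Estimate how many character widths precede the first text.
        indent_spaces = (line.first_left - margin_left) / line.char_width
        starts_paragraph = indent_spaces >= min_spaces
        if starts_paragraph or not paragraphs:
            # The previous line (if any) thereby ends the previous paragraph.
            paragraphs.append([line.text])
        else:
            paragraphs[-1].append(line.text)
    # Joining with a space suits space-delimited scripts; Chinese text
    # would typically be joined without a separator.
    return [" ".join(p) for p in paragraphs]
```

A first line that is not indented is still treated as opening a paragraph here, which is a design choice of this sketch so that no text is dropped.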
In summary, in the embodiment of the present invention, the text information may be divided into paragraphs according to the line information and the position information of each text in the text information; displaying the text information in a subsection way; the number of the paragraphs displayed by the text information is consistent with the number of the paragraphs in the target image, so that the user can clearly distinguish each paragraph, and the user can read and understand the information in the image.
In the embodiment of the present invention, the text information may also be displayed in the same layout as the text information in the image, according to the layout information of the text information in the target image. The specific steps are as follows:
Referring to fig. 3, a flowchart illustrating steps of another embodiment of a data processing method according to the present invention is shown, which may specifically include the following steps:
Step 302, the recording device obtains a target image.
Step 304, performing text recognition on the target image, and determining corresponding text information.
Step 306, respectively recording the line information of each text in the text information in the target image and the position information of each text within its line.
Step 308, generating typesetting information of the text information in the target image according to the line information and the position information of each text in the text information.
Steps 302 to 308 are similar to steps 202 to 208 described above, and are not described herein again.
Step 310, controlling each text in the text information to be displayed in the same layout as that text has in the image, according to the line information and the position information of each text in the text information.
According to the line information and the position information of each text in the text information, each text is displayed in the same layout as that text has in the image, so that the text information as a whole is displayed in the same layout as in the target image.
In summary, in the embodiment of the present invention, each text in the text information may be controlled to be displayed in the same layout as the text in the image according to the line information and the position information of each text in the text information; the layout of the displayed text information is thus consistent with its layout in the target image, which further facilitates the user's reading and understanding of the information in the image.
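As a rough illustration of same-layout display, the sketch below places each text on a character grid at the column implied by its pixel offset. It is a simplification under stated assumptions: a fixed `char_width` in pixels maps pixels to columns, each record is a hypothetical `(content, line_no, left_px)` tuple, and a real display would position text by pixel coordinates instead.

```python
from typing import Dict, List, Tuple

# Hypothetical minimal per-text record: (content, line_no, left_px).
Placed = Tuple[str, int, int]


def render_same_layout(items: List[Placed], char_width: int = 10) -> str:
    """Rebuild each line, padding so texts keep their horizontal offsets."""
    rows: Dict[int, str] = {}
    for content, line_no, left in sorted(items, key=lambda i: (i[1], i[2])):
        row = rows.get(line_no, "")
        # Target column from the pixel offset; never overlap earlier text.
        min_col = len(row) + 1 if row else 0
        col = max(left // char_width, min_col)
        rows[line_no] = row.ljust(col) + content
    if not rows:
        return ""
    # Emit every line number up to the last one so blank lines are kept.
    return "\n".join(rows.get(n, "") for n in range(1, max(rows) + 1))


print(render_same_layout([("Title", 1, 40), ("Body", 2, 0), ("text", 2, 60)]))
```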
In the embodiment of the invention, when the language of the text information in the target image is not a language the user understands, the text information in the image can be translated to generate a translation result in a language the user understands, making the content easier for the user to understand.
Referring to fig. 4, a flowchart illustrating steps of yet another embodiment of a data processing method according to the present invention is shown, which may specifically include the following steps:
Step 402, the recording device obtains a target image.
Step 404, performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image.
Step 406, displaying the text information according to the typesetting information of the text information in the target image.
For steps 402 to 406, reference may be made to the above embodiments; details are not described herein again.
Step 408, translating the text information to obtain a corresponding translation result.
Step 410, displaying the translation result.
After the text information is recognized, the text information can be translated and a corresponding translation result determined. A target language can be determined first, and the text information is then translated to obtain a translation result in the target language. The target language may be a language that the user understands.
In an optional embodiment of the invention, only the translation result can be displayed, and no text information is displayed; the translation result and the text information can also be displayed at the same time; the embodiments of the present invention are not limited in this regard.
In an optional embodiment of the present invention, the translation result may be a picture translation result. The text information can be translated, and after the corresponding translation text information is determined, a picture translation result is generated based on the translation text information. The target image and the translation text information can be synthesized to generate a picture translation result. For example, the translated text information may be overlaid on the text information corresponding to the translated text information in the target image; for another example, the translation text information may be added to the target image at a position associated with the text information corresponding to the translation text information; thereby facilitating the comparison and check of the user. For example, when the target image is an image of a presentation, translation text information corresponding to each line of text may be added at a position between the line of text and the next/previous line of text. For example, when the target image is an image of a menu, the translated dish name may be overlaid on the dish name corresponding to the translated dish name in the target image. Of course, the translated text information may also be adopted to generate a new picture as the picture translation result according to the typesetting mode of the text in the target image.
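The two placement options described above (covering the original text, or adding the translation next to it) can be sketched as a position-planning step. This Python sketch is illustrative only: `LineBox` and `plan_translation_positions` are hypothetical names, only the "below the line" variant of the associated position is shown, and actually drawing the text onto the image is left to whatever imaging library the device uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LineBox:
    left: int    # x pixel where the original line starts
    top: int     # y pixel of the top of the original line
    bottom: int  # y pixel of the bottom of the original line


def plan_translation_positions(
    boxes: List[LineBox], translations: List[str], mode: str = "cover"
) -> List[Tuple[int, int, str]]:
    """Return (x, y, text) draw positions for each translated line.

    mode="cover": draw over the original line (e.g. a menu dish name).
    mode="below": draw between the line and the next one (e.g. a slide).
    """
    if mode not in ("cover", "below"):
        raise ValueError(f"unknown mode: {mode}")
    if len(boxes) != len(translations):
        raise ValueError("one translation is required per line")
    placements = []
    for box, translated in zip(boxes, translations):
        y = box.top if mode == "cover" else box.bottom
        placements.append((box.left, y, translated))
    return placements
```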
In the embodiment of the present invention, the translation result may also be a text translation result; i.e. the translated text information can be directly used as the text translation result. When the translation result is a text translation result, if the translation result and the text information are displayed at the same time, the translation result and the text information may be displayed in a contrasting manner.
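As one possible reading of displaying the translation result and the text information "in a contrasting manner", the following Python sketch lines up each source line next to its translation. The function name `side_by_side` and the column layout are assumptions made for illustration; the embodiment does not fix a particular contrast format.

```python
from typing import List


def side_by_side(source: List[str], translated: List[str], gap: int = 4) -> str:
    """Format source lines and their translations in two aligned columns."""
    # The left column is as wide as the longest source line.
    width = max((len(s) for s in source), default=0)
    rows = []
    for i in range(max(len(source), len(translated))):
        left = source[i] if i < len(source) else ""
        right = translated[i] if i < len(translated) else ""
        # Pad the left column, then drop trailing spaces on rows without a translation.
        rows.append((left.ljust(width + gap) + right).rstrip())
    return "\n".join(rows)
```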
In an optional embodiment of the present invention, the method further comprises: receiving a transmission instruction, wherein the transmission instruction comprises at least one of the following: sharing instructions, forwarding instructions and unloading instructions; transmitting the data corresponding to the transmission instruction to other equipment; the data corresponding to the transmission instruction comprises at least one of the following data: target image, text information, and translation result. The user can transmit one or more of the target image, the text information of the target image and the translation result of the target image to other equipment; the target image, the text information of the target image and the translation result of the target image are convenient for the user to use in other equipment.
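The selection of data named by a transmission instruction can be sketched as follows. This is a minimal illustration in Python: the `DataKind` enumeration and the `build_payload` function are hypothetical, and the actual transport to the other device (for example over the communication component) is deliberately left out.

```python
from enum import Enum
from typing import Dict, Iterable


class DataKind(Enum):
    """The three kinds of data a transmission instruction may refer to."""
    TARGET_IMAGE = "target_image"
    TEXT_INFORMATION = "text_information"
    TRANSLATION_RESULT = "translation_result"


def build_payload(requested: Iterable[DataKind],
                  available: Dict[DataKind, bytes]) -> Dict[DataKind, bytes]:
    """Collect the data that corresponds to a transmission instruction."""
    requested = set(requested)
    if not requested:
        raise ValueError("a transmission instruction must name at least one kind of data")
    # Report every requested kind that the device has not produced yet.
    missing = [kind.value for kind in DataKind if kind in requested and kind not in available]
    if missing:
        raise KeyError(f"data not available: {missing}")
    # Iterate over DataKind so the payload order is stable.
    return {kind: available[kind] for kind in DataKind if kind in requested}
```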
In an optional embodiment of the present invention, when the target image is acquired during recording of the target audio data by the recording device, the target image may be associated with the target audio data; the data can be recorded from multiple dimensions in an associated manner, so that the user can conveniently use the recorded data of multiple dimensions at the same time, and the user experience is improved.
How to collect image data and how to associate the image data with audio data during recording of audio data will be described below.
Receiving an image acquisition instruction during recording by the recording device.
In the embodiment of the invention, when a user needs to record, the recording function of the recording device can be started and the recording device is used to record. During recording, when the user needs to record data of other dimensions, such as image data (for example, printed material or projected images), the user can perform an image acquisition operation. After the user performs the image acquisition operation, the recording device receives the image acquisition instruction corresponding to that operation.
In one example of the present invention, the user may perform the image acquisition operation on the recording device itself; correspondingly, the recording device generates an image acquisition instruction according to the received image acquisition operation.
In another example of the present invention, when the recording device is connected to another device, the user may also perform the image acquisition operation in an application program on the other device that corresponds to the recording device. In this case, the other device generates the image acquisition instruction according to the user's image acquisition operation and then sends the instruction to the recording device.
Acquiring an image according to the image acquisition instruction.
The recording device can then call the image acquisition module to acquire an image according to the image acquisition instruction, obtaining image data.
In the recording process, a user can execute a plurality of image acquisition operations, and correspondingly, the recording equipment can receive a plurality of image acquisition instructions. The recording device can acquire an image once when receiving an image acquisition instruction every time to obtain a corresponding image frame.
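The behaviour described above, in which every image acquisition instruction received during recording yields one image frame, can be sketched in Python as follows. The `RecordingSession` and `ImageFrame` classes, and the choice to stamp each frame with the elapsed recording time in milliseconds, are assumptions introduced for illustration and are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageFrame:
    timestamp_ms: int   # elapsed recording time at which the frame was acquired
    data: bytes         # raw image data returned by the image acquisition module


@dataclass
class RecordingSession:
    """Hypothetical holder for the image frames acquired during one recording."""
    recording: bool = False
    frames: List[ImageFrame] = field(default_factory=list)

    def start(self) -> None:
        self.recording = True

    def stop(self) -> None:
        self.recording = False

    def on_image_acquisition_instruction(self, elapsed_ms: int, data: bytes) -> ImageFrame:
        """Acquire exactly one image frame per received instruction."""
        if not self.recording:
            raise RuntimeError("image acquisition instruction received while not recording")
        frame = ImageFrame(timestamp_ms=elapsed_ms, data=data)
        self.frames.append(frame)
        return frame
```

Keeping the timestamp with each frame is one way to make a later time-based association with the audio data possible.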
Associating and storing the acquired image data and the audio data obtained by recording.
In the embodiment of the invention, in order to facilitate the user to use the recorded data of multiple dimensions at the same time subsequently, after the image data is acquired, the acquired image data and the audio data obtained by recording can be associated and stored in the recording device. The image data and the audio data may be associated based on the time of the acquired image data and the time corresponding to the audio data obtained by recording, which is not limited in this embodiment of the present invention.
In one example, after each image frame is acquired, the recording device may associate that image frame with the corresponding audio frame obtained in the recording process, thereby associating the acquired image data with the audio data obtained by recording. In another example, the recording device may simply store each image frame after it is acquired and, after the recording is finished, associate each image frame of the image data with the corresponding audio frame in the audio data obtained by recording.
The manner of associating each image frame with a corresponding audio frame may be as follows: determining a target timestamp corresponding to a target image frame in the image data; determining a target audio frame with the same timestamp as the target timestamp in the audio data; and associating the target image frame with the target audio frame.
If the recording device associates the image frame with the audio frame corresponding to the image frame obtained in the recording process after each image frame is acquired, one image frame acquired each time can be used as a target image frame. If the recording device associates each image frame of the image data with the corresponding audio frame in the recorded audio data after the recording is finished, one image frame can be arbitrarily selected from the image data as a target image frame each time until all the image frames in the image data are associated with the corresponding audio frames in the audio data.
In the embodiment of the invention, for a target image frame, the target timestamp corresponding to the target image frame can be determined, and the target audio frame having the same timestamp as the target timestamp is obtained from the audio data obtained by recording; the target image frame is then associated with the target audio frame.
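The timestamp-based association described above can be sketched as follows. The sketch makes one assumption that is not stated in the embodiment: an audio frame is treated as having "the same timestamp" as an image frame when the image timestamp falls at or after that audio frame's start time and before the start of the next audio frame. The function name `associate_frames` and the millisecond units are likewise hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List


def associate_frames(image_timestamps_ms: List[int],
                     audio_frame_starts_ms: List[int]) -> Dict[int, int]:
    """Map each image timestamp to the index of its target audio frame.

    audio_frame_starts_ms must be sorted in ascending order.
    """
    mapping: Dict[int, int] = {}
    for ts in image_timestamps_ms:
        # Index of the last audio frame that starts at or before ts.
        idx = bisect_right(audio_frame_starts_ms, ts) - 1
        if idx < 0:
            raise ValueError(f"image timestamp {ts} precedes the first audio frame")
        mapping[ts] = idx
    return mapping
```

The same function covers both cases in the text: it can be called with a single timestamp right after each image frame is acquired, or once with all image timestamps after the recording is finished.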
When a user needs to perform voice recognition on target audio data, the target audio data can be subjected to voice recognition by combining with the target image data; therefore, the target audio data is subjected to voice recognition by combining the information associated with the target audio data, so that the accuracy of the voice recognition is improved.
Referring to fig. 5, a flow chart of steps of yet another data processing method embodiment of the present invention is shown.
Step 502, the recording device obtains a target image.
Step 504, performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image.
Step 506, displaying the text information according to the typesetting information of the text information in the target image.
Step 508, acquiring target audio data, wherein the target audio data is associated with the target image, and the target image is acquired by the recording device in the process of recording the target audio data.
Step 510, performing speech recognition on the target audio data according to the text information, and determining a corresponding speech recognition result.
In the embodiment of the invention, the target audio data can be subjected to voice recognition in real time in the process of recording the target audio data by the recording equipment. In the process of recording the target audio data, if the recording device acquires a target image acquired by calling an image acquisition module therein, voice recognition can be performed on the target audio data recorded after acquiring the target image according to text information recognized for the target image, and a corresponding voice recognition result is determined.
In the embodiment of the present invention, after the recording device records the target audio data, the target audio data (i.e., the non-real-time target audio data) may be subjected to speech recognition. The method comprises the steps that text information of all target images acquired by an image acquisition module in the process of recording target audio data can be called, voice recognition is carried out on the target audio data, and a corresponding voice recognition result is determined; the embodiments of the present invention are not limited in this regard.
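The difference between the real-time case and the non-real-time case lies in which text information is available for a given piece of audio. A minimal Python sketch, assuming each target image's text information is stored together with the time at which the image was acquired (the function name `texts_for_segment` and the `(timestamp, text)` pairs are hypothetical):

```python
from typing import List, Tuple


def texts_for_segment(image_texts: List[Tuple[int, str]],
                      segment_start_ms: int,
                      real_time: bool) -> List[str]:
    """Select the text information usable for recognizing one audio segment.

    image_texts holds (acquisition time in ms, recognized text) pairs.
    """
    if real_time:
        # During recording, only images acquired before this audio was
        # recorded can contribute text information.
        return [text for ts, text in image_texts if ts <= segment_start_ms]
    # After recording, the text information of all target images is available.
    return [text for _, text in image_texts]
```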
The text information can be utilized in the voice recognition process of the target audio data to improve the accuracy of the voice recognition of the target audio data.
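The embodiment does not specify how the text information enters the speech recognition process. One common technique, shown here purely as an assumed illustration and not as the method of the invention, is to rescore candidate transcriptions so that hypotheses containing words from the recognized text are preferred. The function name `pick_hypothesis`, the score scale and the `bonus` value are all hypothetical.

```python
from typing import List, Set, Tuple


def pick_hypothesis(hypotheses: List[Tuple[str, float]],
                    context_words: Set[str],
                    bonus: float = 0.5) -> str:
    """Choose the best transcription after boosting words seen in the image text.

    hypotheses: (candidate transcription, recognizer score), higher is better.
    context_words: lower-case words taken from the text information.
    """
    if not hypotheses:
        raise ValueError("no hypotheses to choose from")

    def boosted(item: Tuple[str, float]) -> float:
        text, score = item
        hits = sum(1 for word in text.lower().split() if word in context_words)
        return score + bonus * hits

    return max(hypotheses, key=boosted)[0]
```

For example, if the slide in the target image contains the words "kernel" and "density", the candidate "the kernel density estimate" can win over an acoustically similar candidate such as "the colonel density estimate" even when its raw score is slightly lower.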
Of course, in the embodiment of the present invention, a transmission instruction for the target audio data and/or the voice recognition result may also be received, and the target audio data and/or the voice recognition result may be transmitted to another device; the embodiments of the present invention are not limited in this regard.
In summary, in the embodiments of the present invention, target audio data may be obtained; then, carrying out voice recognition on the target audio data according to the text information of the target image, and determining a corresponding voice recognition result; the target image is acquired by a recording device in the process of recording target audio data, the target audio data is associated with the target image, and then the target audio data is subjected to voice recognition by combining information associated with the target audio data, so that the accuracy of the voice recognition is improved.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or a combination of acts, but those skilled in the art will recognize that the embodiments of the present invention are not limited by the described order of acts, since some steps may be performed in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the present invention.
The embodiment of the invention also provides a data processing device which is applied to the recording equipment.
Referring to fig. 6, a block diagram of a data processing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
an image acquisition module 602, configured to acquire a target image;
a text recognition module 604, configured to perform text recognition on the target image, determine corresponding text information, and determine layout information of the text information in the target image;
and a display module 606, configured to display the text information according to the layout information of the text information in the target image.
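The cooperation of the three modules listed above can be sketched as a simple pipeline. In this Python illustration each module is represented by a callable; the class name `DataProcessingApparatus`, the `Layout` type and the callable signatures are assumptions made for the sketch and do not describe the actual implementation of modules 602, 604 and 606.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical representation: one (text, line number, x position) entry per text.
Layout = List[Tuple[str, int, int]]


@dataclass
class DataProcessingApparatus:
    acquire_image: Callable[[], bytes]          # stands in for image acquisition module 602
    recognize_text: Callable[[bytes], Layout]   # stands in for text recognition module 604
    display: Callable[[Layout], str]            # stands in for display module 606

    def run(self) -> str:
        """Acquire a target image, recognize text and layout, then display the text."""
        image = self.acquire_image()
        layout = self.recognize_text(image)
        return self.display(layout)
```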
Referring to fig. 7, a block diagram of an alternative embodiment of a data processing apparatus of the present invention is shown.
In an optional embodiment of the present invention, the text information includes a plurality of texts, and the text recognition module 604 is configured to record line information corresponding to each text in the target image and position information of a line where each text is located, respectively; and generating typesetting information of the text information in the target image according to the line information and the position information of each text in the text information.
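A minimal sketch of generating typesetting information from the line information and position information of each text might look as follows. The `TextItem` fields and the `build_layout` function are hypothetical; in particular, representing position as pixel coordinates and ordering texts within a line from left to right are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TextItem:
    text: str   # one recognized text
    line: int   # line information: index of the line the text belongs to
    x: int      # position information: horizontal position in the line
    y: int      # position information: vertical position of the line


def build_layout(items: List[TextItem]) -> Dict[int, List[TextItem]]:
    """Group texts by line and order them left to right within each line."""
    by_line = defaultdict(list)
    for item in items:
        by_line[item.line].append(item)
    # Lines are returned top to bottom (ascending line index).
    return {line: sorted(group, key=lambda it: it.x)
            for line, group in sorted(by_line.items())}
```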
In an alternative embodiment of the present invention, the display module 606 includes:
a first layout displaying sub-module 6062, configured to divide the text information into paragraphs according to the line information and the position information of each text in the text information, and to display the text information paragraph by paragraph.
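The embodiment does not state which rule is used to divide paragraphs. As an assumed example only, the following Python sketch starts a new paragraph whenever a line is indented relative to the left margin of the text; the function name `split_paragraphs` and the `indent_px` threshold are hypothetical.

```python
from typing import List, Tuple


def split_paragraphs(lines: List[Tuple[str, int]], indent_px: int = 20) -> List[str]:
    """Divide recognized lines into paragraphs using first-line indentation.

    lines: (text, x position of the line start in pixels), ordered top to bottom.
    """
    if not lines:
        return []
    # The left margin is taken to be the smallest starting position of any line.
    margin = min(x for _, x in lines)
    paragraphs: List[str] = []
    for text, x in lines:
        if not paragraphs or x - margin >= indent_px:
            paragraphs.append(text)          # indented line: start a new paragraph
        else:
            paragraphs[-1] += " " + text     # continuation of the current paragraph
    return paragraphs
```

Other cues derived from the same position information, such as a larger vertical gap between two lines, could be used in the same way.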
In an alternative embodiment of the present invention, the display module 606 includes:
and a second layout displaying sub-module 6064, configured to control, according to the line information and the position information of each text in the text information, to display each text in the text information in the same layout as the text in the image.
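Displaying each text in the same layout as in the image can be illustrated with a character-grid rendering. In this Python sketch the line information is assumed to be a row index and the position information a column index on a text display; the function name `render_grid` and this grid model are assumptions, since the embodiment does not describe the display coordinates.

```python
from typing import Dict, List, Tuple


def render_grid(items: List[Tuple[str, int, int]]) -> str:
    """Render (text, row, column) entries at their recorded positions."""
    rows: Dict[int, List[Tuple[int, str]]] = {}
    for text, row, col in items:
        rows.setdefault(row, []).append((col, text))
    if not rows:
        return ""
    output = []
    for row in range(max(rows) + 1):
        line = ""
        for col, text in sorted(rows.get(row, [])):
            # Pad with spaces up to the recorded column; if earlier text already
            # extends past that column, the new text is simply appended after it.
            line = line.ljust(col) + text
        output.append(line)   # rows without text are kept as blank lines
    return "\n".join(output)
```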
In an optional embodiment of the present invention, the apparatus further comprises:
the translation module 608 is configured to translate the text information to obtain a corresponding translation result and display the translation result.
In an optional embodiment of the present invention, the translation result includes: picture translation results and/or text translation results.
In an optional embodiment of the present invention, the apparatus further comprises:
a data transmission module 610, configured to receive a transmission instruction, where the transmission instruction includes at least one of: sharing instructions, forwarding instructions and unloading instructions; transmitting the data corresponding to the transmission instruction to other equipment; the data corresponding to the transmission instruction comprises at least one of the following data: target image, text information, and translation result.
In an optional embodiment of the present invention, the apparatus further comprises:
a voice recognition module 612, configured to obtain target audio data, where the target audio data is associated with the target image, and the target image is acquired by a recording device in a process of recording the target audio data; and performing voice recognition on the target audio data according to the text information, and determining a corresponding voice recognition result.
In conclusion, in the embodiment of the invention, the text recognition function of the image can be added to the recording device; when a user uses the recording equipment to perform text recognition on the target image, the recording equipment can acquire the target image; then, performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image; displaying the text information according to the typesetting information of the text information in the target image; the layout of the displayed text information can be the same as or similar to the layout of the text information in the target image, so that the user can read and understand the information in the image conveniently; thereby improving the user experience.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
Fig. 8 is a block diagram illustrating a configuration of a sound recording apparatus 800 for data processing according to an exemplary embodiment. For example, recording device 800 may be a recording pen, a translation pen, a translator, and the like.
Referring to fig. 8, sound recording apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
Processing component 802 generally controls the overall operation of sound recording device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
Memory 804 is configured to store various types of data to support operation at audio recording device 800. Examples of such data include instructions for any application or method operating on sound recording device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power component 806 provides power to the various components of recording device 800. Power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for audio recording device 800.
The multimedia component 808 includes a screen that provides an output interface between the sound recording device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the recording apparatus 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, audio component 810 includes a Microphone (MIC) configured to receive external audio signals when recording apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
Sensor assembly 814 includes one or more sensors for providing various aspects of status assessment for recording device 800. For example, sensor assembly 814 may detect the on/off status of recording device 800 and the relative positioning of components, such as a display and keypad of recording device 800. Sensor assembly 814 may also detect a change in the position of recording device 800 or a component of recording device 800, the presence or absence of user contact with recording device 800, the orientation or acceleration/deceleration of recording device 800, and a change in the temperature of recording device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
Communication component 816 is configured to facilitate communication between recording device 800 and other devices in a wired or wireless manner. Recording device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the sound recording apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 804 comprising instructions, the instructions being executable by the processor 820 of the sound recording device 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of an audio recording device, enable the audio recording device to perform a data processing method, the method comprising: the recording equipment acquires a target image; performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image; and displaying the text information according to the typesetting information of the text information in the target image.
Optionally, the determining layout information of the text information in the target image includes: respectively recording corresponding line information of each text in the target image and position information of a line where each text is located; and generating typesetting information of the text information in the target image according to the line information and the position information of each text in the text information.
Optionally, the displaying the text information according to the layout information of the text information in the target image includes: dividing the text information into paragraphs according to the line information and the position information of each text in the text information; and displaying the text information paragraph by paragraph.
Optionally, the displaying the text information according to the layout information of the text information in the target image includes: and controlling each text in the text information to be displayed according to the same typesetting as the text in the image according to the line information and the position information of each text in the text information.
Optionally, the method further comprises: and translating the text information to obtain a corresponding translation result and displaying the translation result.
Optionally, the translation result includes: picture translation results and/or text translation results.
Optionally, the method further comprises: receiving a transmission instruction, wherein the transmission instruction comprises at least one of the following: sharing instructions, forwarding instructions and unloading instructions; transmitting the data corresponding to the transmission instruction to other equipment; the data corresponding to the transmission instruction comprises at least one of the following data: target image, text information, and translation result.
Optionally, the method further comprises: acquiring target audio data, wherein the target audio data is associated with the target image, and the target image is acquired by a recording device in the process of recording the target audio data; and performing voice recognition on the target audio data according to the text information, and determining a corresponding voice recognition result.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The data processing method, the data processing apparatus and the recording device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the method and core idea of the present invention. Meanwhile, for a person skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A data processing method, comprising:
the recording equipment acquires a target image;
performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image;
and displaying the text information according to the typesetting information of the text information in the target image.
2. The method according to claim 1, wherein the text information includes a plurality of texts, and the determining layout information of the text information in the target image includes:
respectively recording corresponding line information of each text in the text information in the target image and position information of a line where each text is located;
and generating typesetting information of the text information in the target image according to the line information and the position information of each text in the text information.
3. The method according to claim 2, wherein said presenting the text information according to the layout information of the text information in the target image comprises:
dividing the text information into paragraphs according to the line information and the position information of each text in the text information;
and displaying the text information paragraph by paragraph.
4. The method according to claim 2, wherein said presenting the text information according to the layout information of the text information in the target image comprises:
and controlling each text in the text information to be displayed according to the same typesetting as the text in the image according to the line information and the position information of each text in the text information.
5. The method of claim 1, further comprising:
and translating the text information to obtain a corresponding translation result and displaying the translation result.
6. The method of claim 5, wherein the translation results comprise: picture translation results and/or text translation results.
7. The method of claim 5, further comprising:
receiving a transmission instruction, wherein the transmission instruction comprises at least one of the following: sharing instructions, forwarding instructions and unloading instructions;
transmitting the data corresponding to the transmission instruction to other equipment; the data corresponding to the transmission instruction comprises at least one of the following data: target image, text information, and translation result.
8. A data processing apparatus, applied to a recording device, includes:
the image acquisition module is used for acquiring a target image;
the text recognition module is used for performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image;
and the display module is used for displaying the text information according to the typesetting information of the text information in the target image.
9. An audio recording apparatus comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
acquiring a target image;
performing text recognition on the target image, determining corresponding text information and determining typesetting information of the text information in the target image;
and displaying the text information according to the typesetting information of the text information in the target image.
10. A readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an audio recording device, enable the audio recording device to perform the data processing method of any one of claims 1-7.
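The layout-preserving display recited in claim 4 (placing each recognized text according to its line information and position information) can be illustrated with a minimal sketch. The claims do not specify any data representation, so everything below is an assumption made for illustration: the `TextItem` record, the `render_layout` function, and the use of a character-column grid as the "position information" are hypothetical and are not taken from the patent. The recognition step itself is not modeled; the items stand in for its output.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TextItem:
    """One recognized piece of text (hypothetical structure, not from the patent)."""
    text: str
    line: int    # assumed "line information": zero-based row in the target image
    column: int  # assumed "position information": starting column within that row


def render_layout(items: list[TextItem]) -> str:
    """Lay recognized texts out on a character grid that mirrors the image rows."""
    if not items:
        return ""

    # Group the items by the row they were found on.
    rows: dict[int, list[TextItem]] = {}
    for item in items:
        rows.setdefault(item.line, []).append(item)

    rendered_rows = []
    # Walk every row up to the last one so that blank rows are preserved too.
    for line_no in range(max(rows) + 1):
        rendered = ""
        for item in sorted(rows.get(line_no, []), key=lambda i: i.column):
            if len(rendered) < item.column:
                # Pad with spaces so the text starts at its recorded column.
                rendered = rendered.ljust(item.column)
            elif rendered:
                # The previous text already reaches this column: separate with one space.
                rendered += " "
            rendered += item.text
        rendered_rows.append(rendered)
    return "\n".join(rendered_rows)


if __name__ == "__main__":
    sample = [
        TextItem("Title", line=0, column=4),
        TextItem("left", line=2, column=0),
        TextItem("right", line=2, column=10),
    ]
    print(render_layout(sample))
```

A real device would more likely work with pixel coordinates and font metrics than with character columns; the grid is used here only because it keeps the example self-contained.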
CN202010988800.4A 2020-09-18 2020-09-18 Data processing method and device and recording equipment Pending CN112115696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010988800.4A CN112115696A (en) 2020-09-18 2020-09-18 Data processing method and device and recording equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010988800.4A CN112115696A (en) 2020-09-18 2020-09-18 Data processing method and device and recording equipment

Publications (1)

Publication Number Publication Date
CN112115696A (en) 2020-12-22

Family

ID=73799884

Family Applications (1)

Application Number Priority Date Filing Date Title
CN202010988800.4A Pending CN112115696A (en) 2020-09-18 2020-09-18 Data processing method and device and recording equipment

Country Status (1)

Country Link
CN (1) CN112115696A (en)

Similar Documents

Publication Publication Date Title
CN108259991B (en) Video processing method and device
CN109413478B (en) Video editing method and device, electronic equipment and storage medium
EP3147802B1 (en) Method and apparatus for processing information
CN106919629B (en) Method and device for realizing information screening in group chat
CN107423386B (en) Method and device for generating electronic card
CN105426094B (en) Information pasting method and device
CN106534951B (en) Video segmentation method and device
CN109063101B (en) Video cover generation method and device
CN108320208B (en) Vehicle recommendation method and device
CN107239351B (en) Pasting method and device
EP3168755A1 (en) Method and device for adding font
CN112543368A (en) Video processing method, video playing method, video processing device, video playing device and storage medium
CN108174269B (en) Visual audio playing method and device
CN112532931A (en) Video processing method and device and electronic equipment
CN110085066B (en) Method and device for displaying reading information and electronic equipment
CN111797262A (en) Poetry generation method and device, electronic equipment and storage medium
CN106331328B (en) Information prompting method and device
CN111061452A (en) Voice control method and device of user interface
CN109756783B (en) Poster generation method and device
CN107943317B (en) Input method and device
CN111629270A (en) Candidate item determination method and device and machine-readable medium
CN112087653A (en) Data processing method and device and electronic equipment
CN105260088B (en) Information classification display processing method and device
CN112035691A (en) Method, device, equipment and medium for displaying cell labeling data of slice image
CN111832455A (en) Method, device, storage medium and electronic equipment for acquiring content image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination