CN117935273A

CN117935273A - Real-time text recognition and interaction method based on augmented reality

Info

Publication number: CN117935273A
Application number: CN202311837498.2A
Authority: CN
Inventors: 章惠龙; 郭磊; 王乐
Original assignee: Beijing Longyao Vision Technology Co ltd
Current assignee: Beijing Longyao Vision Technology Co ltd
Priority date: 2023-12-28
Filing date: 2023-12-28
Publication date: 2024-04-26

Abstract

The invention discloses a real-time text recognition and interaction method and system based on augmented reality, and relates to the technical field of optical character recognition. The method comprises the following steps: the front end acquires a text image in a real environment and extracts a target text region from the text image; the back end detects a text region within the target text region by using a first preset model, performs character recognition on the text region by using a second preset model to obtain text content, and sends the text content to the front end; and the front end visually displays the text content in an AR environment. The method combines OCR technology and AR technology to provide real-time text recognition and interaction in a real environment. Through cooperation between the front end and the back end, text content can be detected and recognized quickly and accurately and displayed visually in the AR environment, and a rich set of interaction instructions allows the user to interact with the text content flexibly.

Description

Real-time text recognition and interaction method based on augmented reality

Technical Field

The invention relates to the technical field of optical character recognition, in particular to a real-time text recognition and interaction method based on augmented reality.

Background

Augmented Reality (AR) technology combines virtual information with the real world. As AR technology has continued to develop in recent years, it has been applied in an increasingly wide range of fields. However, implementing real-time text recognition and interaction in an AR environment remains a technical challenge. Traditional Optical Character Recognition (OCR) systems mainly recognize text in static images and ignore the dynamic, real-time requirements of AR environments. Combining OCR technology with AR technology to provide real-time text recognition and interaction is therefore an urgent problem in the current technical field. Most existing AR applications still focus on enhancing images and video and neglect interaction with text. As a result, users cannot acquire, recognize and understand text information in real time in an AR environment, which limits the application scope of AR technology. Developing a method that can recognize and interact with text in real time is therefore of great significance for expanding the application fields of AR technology and improving the user experience.

Disclosure of Invention

To address the above shortcomings of the prior art, the invention provides a real-time text recognition and interaction method based on augmented reality.

In one aspect, a real-time text recognition and interaction method based on augmented reality includes:

the front end acquires a text image in a real environment and extracts a target text region from the text image;

the back end detects a text region within the target text region by using a first preset model, performs character recognition on the text region by using a second preset model to obtain text content, and sends the text content to the front end;

the front end performs visual display on the text content in an AR environment.

Preferably, the front end is constructed based on A-Frame technology.

Preferably, extracting a target text region from the text image includes:

identifying text regions in the text image and highlighting them;

acquiring a region selection instruction of a user;

and determining the target text region from among the text regions according to the region selection instruction.

Preferably, the first preset model is a DBNet deep learning model, and the second preset model is a CRNN model.

Preferably, before the text content is sent to the front end, the method further includes: formatting the text content according to a preset format.

Preferably, the front end performs visual display on the text content in an AR environment, including:

acquiring a first text interaction instruction of a user;

executing the first text interaction instruction in an AR environment;

wherein the first text interaction instruction includes adding an access link and translating text.

Preferably, the front end performs visual display on the text content in an AR environment, and further includes:

acquiring a second text interaction instruction of a user;

rendering the text according to the second text interaction instruction, and updating the visual display effect of the text content in the AR environment;

wherein the second text interaction instruction comprises one or more of zooming in, zooming out, rotating, splitting and assembling.

In another aspect, a real-time text recognition and interaction system based on augmented reality comprises a front end and a back end;

the front end is used for acquiring a text image in a real environment, acquiring a target text region from the text image and sending the target text region to the back end;

the back end is used for detecting a text region within the target text region by using a first preset model, performing character recognition on the text region by using a second preset model to obtain text content, and sending the text content to the front end;

the front end is further used for visually displaying the text content in an AR environment.

The beneficial effects of the invention are as follows: the invention provides a real-time text recognition and interaction method and system based on augmented reality, which use OCR technology and AR technology to realize real-time text recognition and interaction in a real environment. Through cooperation between the front end and the back end, text content can be detected and recognized quickly and accurately and displayed visually in the AR environment, and a rich set of interaction instructions allows the user to interact with the text content flexibly.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. Like elements or portions are generally identified by like reference numerals throughout the several figures. In the drawings, elements or portions thereof are not necessarily drawn to scale.

FIG. 1 is a flowchart of a real-time text recognition and interaction method based on augmented reality according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a real-time text recognition and interaction system based on augmented reality according to an embodiment of the present invention.

Detailed Description

Embodiments of the technical scheme of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and thus are merely examples, and are not intended to limit the scope of the present invention.

It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.

Example 1

As shown in fig. 1, an embodiment of the present invention provides a real-time text recognition and interaction method based on augmented reality, which includes:

Step 1: the front end acquires a text image in a real environment and extracts a target text region from the text image.

In an embodiment of the invention, the front end is built on A-Frame technology. Other web technologies may of course be used to build the front end, and the embodiment of the present invention is not limited in this respect.

In an embodiment of the present invention, extracting a target text region from the text image includes: identifying text regions in the text image and highlighting them; acquiring a region selection instruction of a user; and determining the target text region from among the text regions according to the region selection instruction.

The front end provides AR functionality: it identifies text regions in the real world and highlights them on the user interface in real time, so that the user can quickly locate and focus on the specific text regions of interest in a given text image with a simple operation.
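The region selection step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the highlighted text regions are axis-aligned pixel boxes and that the user's region selection instruction arrives as a tap point in image coordinates; the names `TextRegion`, `select_target_region` and `crop` are illustrative and do not come from the original. When several highlighted regions contain the tap point, the sketch picks the smallest one, which is only one possible tie-breaking rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Image = List[List[int]]  # row-major grayscale image (hypothetical representation)


@dataclass(frozen=True)
class TextRegion:
    """Axis-aligned box of a highlighted text region, in pixel coordinates."""
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def area(self) -> int:
        return self.w * self.h


def select_target_region(regions: List[TextRegion],
                         tap: Tuple[float, float]) -> Optional[TextRegion]:
    """Map the user's tap (the region selection instruction) to a target region."""
    hits = [r for r in regions if r.contains(*tap)]
    if not hits:
        return None  # the tap fell outside every highlighted region
    # Prefer the smallest enclosing region when highlights overlap.
    return min(hits, key=TextRegion.area)


def crop(image: Image, region: TextRegion) -> Image:
    """Cut the target text region out of the image before sending it to the back end."""
    return [row[region.x:region.x + region.w]
            for row in image[region.y:region.y + region.h]]


if __name__ == "__main__":
    regions = [TextRegion(10, 10, 100, 30), TextRegion(20, 15, 40, 10)]
    print(select_target_region(regions, (25, 20)))  # the smaller, inner region
```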

Step 2: the back end detects a text region within the target text region by using the first preset model, performs character recognition on the text region by using the second preset model to obtain text content, and sends the text content to the front end.

In the embodiment of the invention, the first preset model is a DBNet deep learning model. DBNet (Differentiable Binarization Network) is a deep learning model designed specifically for text detection: a convolutional neural network (CNN) backbone with a feature pyramid extracts image features, from which the model predicts a probability map and a threshold map, and a differentiable binarization step combines the two to locate text regions. The second preset model is a CRNN (Convolutional Recurrent Neural Network) model, a deep learning model for image-based sequence recognition. It comprises convolutional layers, recurrent layers and a transcription layer, and can therefore process sequential data effectively. With the CRNN model, the back end can accurately recognize characters and obtain the text content. Other text detection models and character recognition models may of course be selected, and the embodiment of the present invention is not limited in this respect.
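The description names DBNet and CRNN without giving implementation details. As a sketch of two well-known pieces of those published models (not the patent's own implementation), the Python below shows the approximate binarization used by DBNet, B = 1 / (1 + e^(-k(P - T))) with P the probability map, T the threshold map and k = 50 as reported in the DBNet paper, and best-path (greedy) CTC decoding, a common lexicon-free way to turn the per-frame outputs of CRNN's transcription layer into a string. The alphabet and the use of index 0 as the CTC blank are assumptions for illustration.

```python
import math
from typing import List, Sequence

Map = List[List[float]]


def approximate_binarization(prob: Map, thresh: Map, k: float = 50.0) -> Map:
    """DBNet-style differentiable binarization, per pixel: 1 / (1 + exp(-k * (P - T)))."""
    return [[1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prow, trow)]
            for prow, trow in zip(prob, thresh)]


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str], blank: int = 0) -> str:
    """Best-path CTC decoding: take the most likely label in each frame,
    collapse consecutive repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best_path:
        # A repeated label only counts again if a blank (or another label) separates it.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical alphabet; index 0 is reserved for the CTC blank.
    alphabet = ["<blank>", "A", "R"]
    # Per-frame scores whose argmax sequence is A, A, blank, R, R.
    frames = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
              [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
    print(ctc_greedy_decode(frames, alphabet))  # AR
```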

In the embodiment of the present invention, before the text content is sent to the front end, the method further includes: formatting the text content according to a preset format.

To present the text content better in the AR environment, the back end formats the recognized text content in a preset format (for example, font, color and size). The formatted text content is then sent to the front end for display.
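A minimal sketch of the formatting step, assuming (hypothetically) that the back end sends each recognized string to the front end as a JSON message. The field names and the default font, color and size in `PresetFormat` are placeholders, since the description only lists font, color and size as examples of a preset format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PresetFormat:
    # Placeholder values; the description only names font, color and size.
    font: str = "sans-serif"
    color: str = "#FFFFFF"
    size: float = 0.05  # hypothetical text height in AR scene units


def format_text_content(text: str, region_id: int,
                        preset: Optional[PresetFormat] = None) -> str:
    """Wrap recognized text and its display style in a JSON message for the front end."""
    if preset is None:
        preset = PresetFormat()
    message = {
        "region_id": region_id,  # lets the front end anchor the text to the selected region
        "text": text.strip(),
        "style": asdict(preset),
    }
    # ensure_ascii=False keeps Chinese characters readable in the payload.
    return json.dumps(message, ensure_ascii=False)


if __name__ == "__main__":
    print(format_text_content("  增强现实  ", region_id=3))
```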

Step 3: the front end visually displays the text content in the AR environment.

In the embodiment of the invention, the front end visually displaying the text content in the AR environment includes: acquiring a first text interaction instruction of a user; and executing the first text interaction instruction in the AR environment; wherein the first text interaction instruction includes adding an access link and translating text.

In the embodiment of the invention, the front end visually displaying the text content in the AR environment further includes: acquiring a second text interaction instruction of the user; and rendering the text according to the second text interaction instruction and updating the visual display effect of the text content in the AR environment; wherein the second text interaction instruction includes one or more of zooming in, zooming out, rotating, splitting and assembling.

The user may issue a first text interaction instruction, such as an instruction to add an access link or to translate text, through a gesture, an interface control or another interaction means. The front end then performs the corresponding operation in the AR environment, such as adding the access link or translating the text, according to the user's first text interaction instruction.

The interaction modes provided by this embodiment make it easier for the user to obtain further information and promote cross-cultural communication and understanding of the text content.
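The first text interaction instructions could be dispatched along the lines of the Python sketch below. The instruction names, the example search URL used to build an access link and the injected `translator` callable are all hypothetical; the description does not specify how access links are generated or which translation service is used.

```python
from typing import Callable, Dict
from urllib.parse import quote

# Hypothetical link target; the description does not say where access links point.
SEARCH_URL = "https://www.example.com/search?q="


def add_access_link(text: str) -> Dict[str, str]:
    """Attach a URL built from the recognized text (URL-encoded)."""
    return {"text": text, "link": SEARCH_URL + quote(text)}


def translate(text: str, translator: Callable[[str], str]) -> Dict[str, str]:
    """Keep the original text and add its translation for side-by-side display."""
    return {"text": text, "translation": translator(text)}


def execute_first_instruction(instruction: str, text: str,
                              translator: Callable[[str], str]) -> Dict[str, str]:
    """Dispatch a first text interaction instruction to its handler."""
    if instruction == "add_link":
        return add_access_link(text)
    if instruction == "translate":
        return translate(text, translator)
    raise ValueError(f"unsupported first text interaction instruction: {instruction!r}")


if __name__ == "__main__":
    toy_dictionary = {"你好": "hello"}  # stand-in for a real translation service
    print(execute_first_instruction("translate", "你好",
                                    lambda s: toy_dictionary.get(s, s)))
    print(execute_first_instruction("add_link", "AR glasses",
                                    lambda s: toy_dictionary.get(s, s)))
```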

Illustratively: Scaling: the user may zoom in or out on the text through gestures or interface controls to view a particular portion of the text in more detail. Rotation: a rotation function lets the user observe the text from different angles, making the interaction more natural and intuitive. Splitting and assembling: the user may split the text and reassemble it to explore different portions of the text or related content. Touch and gesture control: touch and gesture recognition techniques let the user interact directly with the text, providing an intuitive operating experience.

The user can easily perform interactive operations such as zooming in, zooming out, rotating, splitting and assembling on the text in the AR environment, so that the text content can be displayed and explored more intuitively and flexibly.
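The second text interaction instructions can be modeled as pure updates to the state of a displayed text entity, which the front end then re-renders. The sketch below is a hypothetical model: `TextEntity`, the scale limits, rotation about a single axis, and the splitting rule (split on whitespace when present, otherwise into individual characters, as suits Chinese text) are assumptions, not details from the description.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class TextEntity:
    """Display state of one piece of recognized text in the AR scene (hypothetical)."""
    text: str
    scale: float = 1.0
    rotation_deg: float = 0.0  # rotation about the vertical axis, kept in [0, 360)


def zoom(entity: TextEntity, factor: float,
         min_scale: float = 0.1, max_scale: float = 10.0) -> TextEntity:
    """Zoom in (factor > 1) or out (0 < factor < 1), clamped to a sensible range."""
    if factor <= 0:
        raise ValueError("zoom factor must be positive")
    new_scale = min(max(entity.scale * factor, min_scale), max_scale)
    return replace(entity, scale=new_scale)


def rotate(entity: TextEntity, degrees: float) -> TextEntity:
    """Rotate by the given angle, wrapping the result into [0, 360)."""
    return replace(entity, rotation_deg=(entity.rotation_deg + degrees) % 360)


def split(entity: TextEntity) -> List[TextEntity]:
    """Split into words if the text contains whitespace, else into characters.
    Each part inherits the parent's scale and rotation."""
    if any(c.isspace() for c in entity.text):
        parts = entity.text.split()
    else:
        parts = list(entity.text)
    return [replace(entity, text=p) for p in parts]


def assemble(parts: List[TextEntity], sep: str = " ") -> TextEntity:
    """Reassemble parts into one entity, taking the transform of the first part."""
    if not parts:
        raise ValueError("nothing to assemble")
    return replace(parts[0], text=sep.join(p.text for p in parts))
```

Because each operation returns a new `TextEntity` rather than mutating the old one, a front end built this way could keep earlier states for undo and re-render only when the state actually changes.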

The embodiment of the invention provides a real-time text recognition and interaction method based on augmented reality, comprising the following steps: the front end acquires a text image in a real environment and extracts a target text region from the text image; the back end detects a text region within the target text region by using a first preset model, performs character recognition on the text region by using a second preset model to obtain text content, and sends the text content to the front end; and the front end visually displays the text content in an AR environment. The method combines OCR technology and AR technology to provide real-time text recognition and interaction in a real environment. Through cooperation between the front end and the back end, text content can be detected and recognized quickly and accurately and displayed visually in the AR environment, and a rich set of interaction instructions allows the user to interact with the text content flexibly.

Example 2

As shown in FIG. 2, an embodiment of the present invention provides a real-time text recognition and interaction system based on augmented reality, which includes a front end and a back end. The front end is used for acquiring a text image in a real environment, acquiring a target text region from the text image and sending the target text region to the back end; the back end is used for detecting a text region within the target text region by using a first preset model, performing character recognition on the text region by using a second preset model to obtain text content, and sending the text content to the front end; and the front end is further used for visually displaying the text content in an AR environment.
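The division of work between the front end and the back end can be sketched as below. This is an in-process illustration only: the detector and recognizer are injected callables standing in for the first and second preset models, and the transport between front end and back end, which the description does not specify, is reduced to a direct method call. All class and parameter names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # x, y, w, h within the target region
Image = List[List[int]]          # row-major grayscale pixels


@dataclass
class BackEnd:
    detect: Callable[[Image], List[Box]]  # stands in for the first preset model (e.g. DBNet)
    recognize: Callable[[Image], str]     # stands in for the second preset model (e.g. CRNN)

    def process(self, target_region: Image) -> List[Dict]:
        """Detect text boxes in the target region, then recognize each box."""
        results = []
        for x, y, w, h in self.detect(target_region):
            patch = [row[x:x + w] for row in target_region[y:y + h]]
            results.append({"box": (x, y, w, h), "text": self.recognize(patch)})
        return results


@dataclass
class FrontEnd:
    backend: BackEnd
    rendered: List[Dict] = field(default_factory=list)  # stand-in for AR scene entities

    def on_target_region(self, target_region: Image) -> None:
        """Send the user-selected region to the back end and display what comes back."""
        for item in self.backend.process(target_region):
            self.rendered.append(item)  # a real front end would create AR text here


if __name__ == "__main__":
    fake_backend = BackEnd(
        detect=lambda img: [(0, 0, 2, 1), (1, 1, 2, 1)],
        recognize=lambda patch: "#" * len(patch[0]),
    )
    front = FrontEnd(fake_backend)
    front.on_target_region([[0, 1, 2], [3, 4, 5]])
    print(front.rendered)
```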

It should be understood that the real-time text recognition and interaction system based on augmented reality shown in FIG. 2 and the real-time text recognition and interaction method based on augmented reality provided in the foregoing embodiment are based on the same inventive concept. For the specific working principles of each module in this embodiment, reference may be made to the foregoing embodiment, and details are not repeated here.


Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.

Claims

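1. A real-time text recognition and interaction method based on augmented reality, comprising:

the front end acquires a text image in a real environment and extracts a target text region from the text image;

the back end detects a text region within the target text region by using a first preset model, performs character recognition on the text region by using a second preset model to obtain text content, and sends the text content to the front end;

the front end visually displays the text content in an AR environment.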

2. The augmented reality-based real-time text recognition and interaction method of claim 1, wherein the front end is constructed based on A-Frame technology.

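3. The augmented reality-based real-time text recognition and interaction method of claim 1, wherein extracting a target text region from the text image comprises:

identifying text regions in the text image and highlighting them;

acquiring a region selection instruction of a user;

and determining the target text region from among the text regions according to the region selection instruction.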

4. The augmented reality-based real-time text recognition and interaction method according to claim 1, wherein the first preset model is a DBNet deep learning model and the second preset model is a CRNN model.

5. The augmented reality-based real-time text recognition and interaction method of claim 1, wherein before the text content is sent to the front end, the method further comprises: formatting the text content according to a preset format.

6. The augmented reality-based real-time text recognition and interaction method of claim 1, wherein the front-end visually presents the text content in an AR environment, comprising:

acquiring a first text interaction instruction of a user;

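executing the first text interaction instruction in the AR environment;

wherein the first text interaction instruction includes adding an access link and translating text.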

7. The augmented reality-based real-time text recognition and interaction method of claim 6, wherein the front-end visually presents the text content in an AR environment, further comprising:

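acquiring a second text interaction instruction of a user;

rendering the text according to the second text interaction instruction, and updating the visual display effect of the text content in the AR environment;

wherein the second text interaction instruction comprises one or more of zooming in, zooming out, rotating, splitting and assembling.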

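8. A real-time text recognition and interaction system based on augmented reality, characterized by comprising a front end and a back end;

the front end is used for acquiring a text image in a real environment, acquiring a target text region from the text image and sending the target text region to the back end;

the back end is used for detecting a text region within the target text region by using a first preset model, performing character recognition on the text region by using a second preset model to obtain text content, and sending the text content to the front end;

the front end is further used for visually displaying the text content in an AR environment.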