CN113889114A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN113889114A
Authority
CN
China
Prior art keywords
virtual information
display
voice data
user
display device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010627737.1A
Other languages
Chinese (zh)
Inventor
武卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010627737.1A priority Critical patent/CN113889114A/en
Publication of CN113889114A publication Critical patent/CN113889114A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481: Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/04812: Interaction techniques based on cursor appearance or behaviour, e.g. being affected by the presence of displayed objects

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the disclosure disclose a data processing method and apparatus, an electronic device, and a storage medium, wherein the method comprises the following steps: acquiring voice data in an acquisition area; processing the voice data to obtain virtual information, wherein the virtual information comprises text content corresponding to the voice data; and outputting the virtual information to at least one display device in a user area for display on the display device, so that a user viewing the acquisition area through the display device sees the virtual information within the user's line of sight. By fusing virtual display technology with speech recognition technology, the technical solution addresses the technical problem that, in certain special scenes, hearing-impaired people and other viewers cannot accurately receive voice information when watching live content such as performances.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
In some performance scenarios, hearing-impaired audience members are accommodated by showing the actors' lines on a display screen. However, this scheme requires manual operation for every line, which is inefficient; some audience members cannot see the on-screen subtitles clearly; and only the lines themselves can be presented, with no way to offer richer information on top of them.
Disclosure of Invention
The embodiment of the disclosure provides a data processing method and device, electronic equipment and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including:
acquiring voice data in an acquisition area;
processing the voice data to obtain virtual information, wherein the virtual information comprises text content corresponding to the voice data;
and outputting the virtual information to at least one display device in a user area for display on the display device, so that a user viewing the acquisition area through the display device sees the virtual information within the user's line of sight.
Further, processing the voice data to obtain virtual information includes:
preprocessing the voice data;
recognizing the preprocessed voice data by using an acoustic model to obtain corresponding candidate content;
and performing semantic processing on the candidate content by using a semantic model to obtain the text content.
Further, after processing the voice data to obtain the virtual information, the method further includes:
translating the text content into target content corresponding to a target language associated with the display device.
Further, the display device comprises an AR display device.
In a second aspect, an embodiment of the present disclosure provides a data processing method, which is performed on a display device, and includes:
acquiring virtual information; the virtual information comprises text content obtained by recognizing voice data acquired in the acquisition area;
and displaying the virtual information, so that a user viewing the acquisition area through the display device sees the virtual information within the user's line of sight.
Further, displaying the virtual information includes:
acquiring image data in the acquisition area;
acquiring a display image by superimposing the virtual information on the image data;
and displaying the display image.
Further, the display device includes a transparent display unit, and displaying the virtual information includes:
displaying the text content on the transparent display unit, so that the text content is displayed superimposed on the information the user views through the transparent display unit.
Further, the method further comprises:
receiving a target language configured by a user;
before displaying the virtual information, the method further comprises the following steps:
when the text content in the virtual information does not match the target language, translating the text content in the virtual information into target content corresponding to the target language.
In a third aspect, an embodiment of the present disclosure provides a data processing method, where the method is performed on a display device, where the display device includes a voice acquisition unit and a display unit, and includes:
acquiring voice data acquired by the voice acquisition unit;
processing the voice data to obtain virtual information, wherein the virtual information comprises text contents corresponding to the voice data;
outputting the virtual information to the display unit to display the virtual information on the display unit.
Further, processing the voice data to obtain virtual information includes:
preprocessing the voice data;
recognizing the preprocessed voice data by using an acoustic model to obtain corresponding candidate content;
and performing semantic processing on the candidate content by using a semantic model to obtain the text content.
Further, the display device includes an image acquisition unit, and outputting the virtual information to the display unit to display the virtual information on the display unit includes:
acquiring image data acquired by the image acquisition unit;
acquiring a display image by superimposing the virtual information on the image data;
and outputting the display image to the display unit for displaying.
Further, the display unit includes a transparent display unit, and outputting the virtual information to the display unit to display the virtual information on the display unit includes:
outputting the text content to the transparent display unit for display, so that the text content is displayed superimposed on the information the user views through the transparent display unit.
Further, the method further comprises:
receiving a target language configured by a user;
before outputting the virtual information to the display unit, the method further includes:
when the text content in the virtual information does not match the target language, translating the text content in the virtual information into target content corresponding to the target language.
In a fourth aspect, an embodiment of the present disclosure provides a data processing apparatus, including:
the first acquisition module is configured to acquire voice data in an acquisition area;
the first processing module is configured to process the voice data to obtain virtual information, and the virtual information comprises text content corresponding to the voice data;
the first output module is configured to output the virtual information to at least one display device in a user area for display on the display device, so that a user viewing the acquisition area through the display device sees the virtual information within the user's line of sight.
In a fifth aspect, an embodiment of the present disclosure provides a data processing apparatus, where the apparatus is located on a display device, and the apparatus includes:
a second acquisition module configured to acquire virtual information, where the virtual information includes text content obtained by recognizing voice data acquired in the acquisition area;
the display module is configured to display the virtual information, so that a user viewing the acquisition area through the display device sees the virtual information within the user's line of sight.
In a sixth aspect, an embodiment of the present disclosure provides a data processing apparatus, where the apparatus is located in a display device, the display device includes a voice acquisition unit and a display unit, and the apparatus includes:
the third acquisition module is configured to acquire the voice data acquired by the voice acquisition unit;
the second processing module is configured to process the voice data to obtain virtual information, and the virtual information comprises text contents corresponding to the voice data;
a second output module configured to output the virtual information to the display unit to display the virtual information on the display unit.
The above functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the apparatus includes a memory configured to store one or more computer instructions that enable the apparatus to perform the corresponding method, and a processor configured to execute the computer instructions stored in the memory. The apparatus may also include a communication interface for the apparatus to communicate with other devices or a communication network.
In a seventh aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any of the above aspects.
In an eighth aspect, the present disclosure provides a computer-readable storage medium for storing computer instructions for use by any one of the above apparatuses, which includes computer instructions for performing the method according to any one of the above aspects.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
through the embodiments of the disclosure, the speech of a target object in the acquisition area can be converted into text content in real time, and virtual information including the text content is sent to the display device and presented to the user, so that the user can read the text content corresponding to the voice data in the acquisition area while watching the acquisition area through the display device. The embodiments combine virtual display technology with speech recognition technology, thereby addressing the technical problem that, in certain special scenes, hearing-impaired people and other viewers cannot accurately receive voice information when watching live content such as performances.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a data processing method according to another embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a data processing method according to yet another embodiment of the present disclosure;
fig. 4 shows a schematic flow chart of an application in a stage performance scene according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device suitable for implementing a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The details of the embodiments of the present disclosure are described in detail below with reference to specific embodiments.
Fig. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure. As shown in fig. 1, the data processing method includes the steps of:
in step S101, acquiring voice data in an acquisition area;
in step S102, processing the voice data to obtain virtual information, where the virtual information includes text content corresponding to the voice data;
in step S103, the virtual information is output to at least one display device in the user area for display on the display device, so that a user viewing the acquisition area through the display device sees the virtual information within the user's line of sight.
In this embodiment, the data processing method may be implemented on a processor, and the processor may be located on a server or another processing device separate from the display device. The acquisition area and the user area may be predetermined; for example, the acquisition area may be the area where a target object outputting speech is located, and the user area may be the area where the objects receiving the speech are located. In an artistic application scenario, for instance, the acquisition area may be the stage area where the performers are, and the user area may be the area where the audience sits. A voice acquisition device, such as a microphone, can be arranged in the acquisition area and outputs the voice data collected in real time to the processor; the processor processes the voice data in real time, converts it into corresponding text content, and sends virtual information including the text content to the display devices in the user area.
In some embodiments, the virtual information may also include display information of the text content and other related information, and the display information may include, for example, information of display position, display mode, display format, and the like. The virtual information may also include synchronization information, such as the occurrence time of the voice data corresponding to the text content.
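Purely as an illustration of how such a payload might be organized, the following Python sketch models one unit of virtual information; the field names and default values are assumptions for this sketch, not part of the disclosure.

from dataclasses import dataclass, field

@dataclass
class VirtualInfo:
    """One unit of virtual information sent to a display device (hypothetical layout)."""
    text: str                      # text content recognized from the voice data
    speech_time: float = 0.0       # synchronization info: when the speech occurred, in seconds
    position: tuple = (0.5, 0.9)   # display position in normalized screen coordinates
    mode: str = "subtitle"         # display mode, e.g. "subtitle" or "bullet_screen"
    fmt: dict = field(default_factory=lambda: {"font_size": 24, "color": "#FFFFFF"})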
In this embodiment, the user area may contain one or more display devices for use by one or more users. After receiving the virtual information, the display device may present it within the user's line of sight while the user views the acquisition area through the device. In some embodiments, the display device may be an AR display device, for example AR glasses: when the user wears the AR glasses to watch the acquisition area, the glasses can render the virtual information so that the user sees the corresponding virtual information while seeing the real picture of the acquisition area through the lenses. For example, when audience members watch a stage performance through AR glasses, they see the real performance on the stage and, at the same time, corresponding virtual information such as subtitles.
Through the embodiments of the disclosure, the speech of a target object in the acquisition area can be converted into text content in real time, and virtual information including the text content is sent to the display device and presented to the user, so that the user can read the text content corresponding to the voice data in the acquisition area while watching the acquisition area through the display device. The embodiments combine virtual display technology with speech recognition technology, thereby addressing the technical problem that, in certain special scenes, hearing-impaired people and other viewers cannot accurately receive voice information when watching live content such as performances.
In an optional implementation manner of this embodiment, step S101, namely the step of acquiring the voice data in the acquisition area, further includes the following steps:
acquiring, in real time, the voice data collected by a voice acquisition device arranged in the acquisition area.
In this optional implementation manner, a voice acquisition device may be arranged in the acquisition area to collect voice data in real time. The voice acquisition device may be, for example, a 360-degree microphone array; after collecting the voice data it may perform preprocessing such as amplification and then output the data to the server side. It is to be understood that the server may be a local computer device capable of processing voice data or a remote server device, which may be set according to actual needs and is not limited herein.
In an optional implementation manner of this embodiment, step S102, namely, the step of processing the voice data to obtain the virtual information, further includes the following steps:
preprocessing the voice data;
recognizing the preprocessed voice data by using an acoustic model to obtain corresponding candidate content;
and performing semantic processing on the candidate content by using a semantic model to obtain the text content.
In this optional implementation manner, preprocessing such as noise reduction and filtering may be performed on the voice data, and the audio content of the target object in the acquisition area is extracted from it. The target object may be any object that utters speech in the acquisition area, and may be one object or several. The audio content of the target object can be extracted from the preprocessed voice data through functions such as audio recognition; speech recognition is then performed on the audio content with an acoustic model to obtain corresponding candidate content. Finally, contextual semantic processing is performed on the candidate content with a semantic model, and text content that conforms to semantic logic is output. The semantic model may be obtained by training on lines commonly used in performance scenarios and on the script. The acoustic model and the semantic model may employ models already implemented in the related art, which are not limited herein.
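The following Python sketch shows the shape of this three-stage pipeline (preprocessing, acoustic model, semantic model). The functions are stand-ins under stated assumptions; the disclosure does not name concrete models or libraries.

def preprocess(audio: bytes) -> bytes:
    # Noise reduction and filtering would happen here; identity stand-in.
    return audio

def acoustic_model(audio: bytes) -> list:
    # Maps audio to candidate transcriptions; a real model replaces this stub.
    return ["candidate line A", "candidate line B"]

def semantic_model(candidates: list) -> str:
    # Picks/repairs the candidate that best fits the context; per the
    # disclosure it could be trained on common stage lines and the script.
    return candidates[0]

def voice_to_text(audio: bytes) -> str:
    """Full recognition pipeline: preprocessing -> acoustic -> semantic."""
    return semantic_model(acoustic_model(preprocess(audio)))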
In some embodiments, an image acquisition device may also be arranged in the acquisition area and configured to acquire image data synchronized with the voice data. The image data and its synchronization information may be sent to the server, and the server may identify whether the voice data was uttered by a target object in the image data and perform corresponding preprocessing according to the identification result. For example, in an application scene of watching a stage play, if it is recognized from the image data synchronized with the voice data that the current sound was not made by the actor but by surrounding noise or other people, the server side may filter that sound out and retain only the speech of the actor on the stage. It is understood that, in an application scene such as a stage drama, the actor's voice will stand out from the surrounding noise, so the voice data can also be filtered with a common filter to retain the actor's voice; this may be set according to actual needs and is not limited herein.
In other embodiments, the server may further identify sound directivity from the image data synchronized with the voice data, that is, which direction the sound emitted by the target object is aimed at, and then perform corresponding processing based on that directivity. When the user area corresponds to a plurality of display devices, the server may transmit the processed virtual information only to the display devices the sound is directed toward, rather than to all of them. For example, in a performance scene, an actor may want to interact with the audience below the stage and speak toward audiences in different areas at different times; the server can transmit the virtual information to the display devices of the audience in the area currently pointed to by the actor's voice.
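A minimal sketch of such directivity-based routing, assuming each display device is registered with a seating bearing relative to the stage; the sector geometry and field names are assumptions for illustration.

def devices_in_direction(devices, direction_deg, beam_width_deg=60.0):
    """Return only the display devices seated inside the sector the
    performer is currently facing (bearings measured from stage center)."""
    selected = []
    for dev in devices:
        # Smallest signed angle between device bearing and sound direction.
        diff = (dev["bearing_deg"] - direction_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= beam_width_deg / 2.0:
            selected.append(dev)
    return selected

# An actor facing 90 degrees reaches only the devices seated near 90 degrees.
audience = [{"id": 1, "bearing_deg": 85.0}, {"id": 2, "bearing_deg": 200.0}]
print(devices_in_direction(audience, direction_deg=90.0))  # -> device 1 only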
In an application scene of watching a stage performance, bullet-screen (danmaku) comments surrounding the actors can also be displayed on the display device. The user may add a comment through voice control or a setup interface on the display device, or through a user device that interacts with the display device. The display device can also upload the user's comments to the server side, and the server side can share the comment information to other display devices.
In an optional implementation manner of this embodiment, after step S102, that is, after the step of processing the voice data to obtain the virtual information, the method further includes the following steps:
translating the text content into target content corresponding to a target language associated with the display device.
In this optional implementation, the text content obtained by recognizing the voice data may be translated into target content in a target language. The target language may be a language type associated with the display device, such as Chinese or English. There may be a plurality of display devices, each associated with a different target language; after the text content corresponding to the voice data is determined, it can be automatically translated into the target content of each associated target language, and the virtual information including that target content is then output to the corresponding display device. In this way, not only can people in special scenes or with hearing impairments effectively receive the voice information output by the target object in the acquisition area, but language differences between the target object and the users of the display devices are also bridged. For artistic performances, for example, dramas in different languages can be brought to audiences all over the world in this manner, greatly lowering the threshold of cultural appreciation.
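As a sketch of this per-device translation step, results can be cached per target language so each language is translated only once; the translate() call is a placeholder, since the disclosure does not specify a translation engine.

def translate(text, target_lang):
    # Placeholder for a machine-translation call.
    return f"[{target_lang}] {text}"

def localize_for_devices(text, source_lang, devices):
    """Translate recognized text once per target language associated with
    the display devices, then fan the result out device by device."""
    cache = {source_lang: text}
    per_device = {}
    for dev in devices:
        lang = dev.get("target_lang", source_lang)
        if lang not in cache:
            cache[lang] = translate(text, lang)
        per_device[dev["id"]] = cache[lang]
    return per_device

print(localize_for_devices("你好", "zh", [{"id": 1, "target_lang": "en"}]))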
Fig. 2 shows a flow diagram of a data processing method according to another embodiment of the present disclosure. As shown in fig. 2, the data processing method includes the steps of:
in step S201, virtual information is acquired; the virtual information comprises text contents obtained by identifying voice data collected in the collection area;
in step S202, the virtual information is displayed, so that a user viewing the acquisition area through the display device sees the virtual information within the user's line of sight.
In this embodiment, the data processing method may be implemented on a display device, for example an AR display device. The acquisition area and the user area may be predetermined; for example, the acquisition area may be the area where a target object outputting speech is located, and the user area may be the area where the objects receiving the speech are located. In an artistic application scenario, for instance, the acquisition area may be the stage area where the performers are, and the user area may be the area where the audience sits. A voice acquisition device such as a microphone can be arranged in the acquisition area and outputs the voice data collected in real time to corresponding equipment; a processor on that equipment processes the voice data in real time, converts it into corresponding text content, and sends virtual information including the text content to the display devices in the user area.
In some embodiments, the display device may be an AR display device, such as AR glasses.
The user may view things in the acquisition area, such as people, objects, and scenes, through the AR display device. The AR display device can be provided with a display unit; after the virtual information is acquired, it can be displayed on the device so that the virtual information is superimposed within the user's line of sight while the user watches the acquisition area. For example, when a user in the audience area wears AR glasses to watch a stage performance, the recognized speech of the performance can be displayed, superimposed, in the stage scene the user watches through the glasses.
In some embodiments, the virtual information may also include display information for the text content and other related information, such as the display position, display mode, and display format. The display device can display the text content at a suitable position according to this display information, so that it does not block the user's view of the acquisition area. The display device can also detect the user's viewing angle, interpupillary distance, and the like, and adjust the size of the displayed text accordingly, avoiding problems such as the subtitles appearing out of focus or causing dizziness.
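One plausible way to derive a comfortable subtitle size from viewing geometry is to keep the text at a fixed visual angle at the display's virtual image distance; the sketch below illustrates the idea, and the 0.5-degree target is an assumed value, not one from the disclosure.

import math

def subtitle_height_m(virtual_image_distance_m, target_visual_angle_deg=0.5):
    """Glyph height (metres) that subtends the target visual angle at the
    distance where the display places its virtual image."""
    half_angle = math.radians(target_visual_angle_deg) / 2.0
    return 2.0 * virtual_image_distance_m * math.tan(half_angle)

# For AR glasses whose virtual image sits about 2 m away:
print(f"{subtitle_height_m(2.0) * 1000:.1f} mm glyph height")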
For other details in the embodiments of the present disclosure, reference may also be made to the description of the data processing method in fig. 1 and the related embodiments, which are not described herein again.
The embodiment of the disclosure combines virtual display technology with speech recognition technology, thereby addressing the technical problem that, in certain special scenes, hearing-impaired people and other viewers cannot accurately receive voice information when watching live content such as performances.
In an optional implementation manner of this embodiment, step S202, namely the step of displaying the virtual information, further includes the following steps:
acquiring image data in the acquisition area;
acquiring a display image by superimposing the virtual information on the image data;
and displaying the display image on the display unit.
In this optional implementation, the display device further includes an image acquisition unit, such as a monocular or binocular camera. The image data acquired in real time from the acquisition area may be a two-dimensional or three-dimensional image. After receiving the virtual information, the display device may computer-render the text content in the virtual information together with the image data to obtain a display image in which the text content is superimposed on the image data, and show that display image on the display unit. It will be appreciated that other virtual information may also be superimposed on the image data if desired.
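A minimal compositing sketch, assuming OpenCV as the rendering library (the disclosure does not name one): the text content is drawn near the bottom of the captured frame to form the display image.

import numpy as np
import cv2  # assumption: any raster drawing library would do

def render_display_image(frame, text):
    """Superimpose the recognized text content onto the captured frame."""
    out = frame.copy()
    h, w = out.shape[:2]
    cv2.putText(out, text, (int(w * 0.1), int(h * 0.9)),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
    return out

# A black test frame stands in for the image acquisition unit's output.
display_image = render_display_image(np.zeros((480, 640, 3), np.uint8), "Hello")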
In an optional implementation manner of this embodiment, the display device includes a transparent display unit, and step S202, that is, the step of displaying the virtual information, further includes the following steps:
displaying the text content on the transparent display unit, so that the text content is displayed superimposed on the information the user views through the transparent display unit.
In this alternative implementation, the display device may be an AR display device, such as AR glasses. The display unit on the device may be a transparent display unit which, when the device is worn, sits in front of the eyes so that the user can view information in the environment through it. The transparent display unit also has a display function and can present virtual information such as the text content, so that when the user watches the acquisition area through the unit, the text content is superimposed on what is being watched. Because natural reflected light from the surroundings passes through the transparent display unit normally, the user can view the surrounding environment and things through it without the wearer's sight being affected. In this way, the information the user watches in the acquisition area is real, and at the same time the text content corresponding to the voice data uttered by objects in the acquisition area is visible without any intervention.
In an optional implementation manner of this embodiment, the method further includes the following steps:
receiving a target language configured by a user;
before displaying the virtual information, the method further comprises the following steps:
when the text content in the virtual information does not match the target language, translating the text content in the virtual information into target content corresponding to the target language.
In this optional implementation manner, the user may configure a target language for the display device through the client, and when the text content in the received virtual information is not matched with the target language configured by the user, the text content in the virtual information may be translated into target content corresponding to the target language, and then displayed on the display device. In this way, the display device may be adapted to target users using any language.
In some embodiments, the user may set a language category on the display device, or on a user device that interacts with it, such as a mobile phone. That is, multiple selectable language categories may be preset on the display device; the user selects the category of a familiar language, and when the text content in the received virtual information is inconsistent with the selected category, the display device automatically translates the text content into the selected language.
In other embodiments, the user may also wear a headset, so that while the virtual information is displayed on the display device the corresponding voice data is played on the headset. The user may set a language category on the headset as well; when the received voice data is inconsistent with the selected category, the voice data can be automatically translated into the selected language. It is understood that this automatic translation may also be completed at the server side: after the user configures a language category through the headset, the headset sends it to the server, and during use the server translates the voice data into speech in the configured language and sends it to the headset.
Fig. 3 shows a flow chart of a data processing method according to yet another embodiment of the present disclosure. As shown in fig. 3, the data processing method includes the steps of:
in step S301, acquiring voice data acquired by the voice acquisition unit;
in step S302, processing the voice data to obtain virtual information, where the virtual information includes text content corresponding to the voice data;
in step S303, the virtual information is output to the display unit to display the virtual information on the display unit.
In this embodiment, the data processing method may be implemented on a display device, for example an AR display device. The display device may include a voice acquisition unit and a display unit; the voice acquisition unit may be, for example, a microphone array. The display device may be glasses, with the display unit arranged on the glasses. During use, the voice acquisition unit collects voice data from the surrounding environment in real time and outputs it to a processing unit on the display device; the processing unit processes the voice data to obtain virtual information, which may include text content corresponding to the voice data, and the virtual information is output to the display unit and displayed. The acquisition area and the user area may be predetermined; for example, the acquisition area may be the area where a target object outputting speech is located, and the user area may be the area where the objects receiving the speech are located.
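The third aspect keeps the whole path on the device itself; the loop below sketches that arrangement with stand-in components (mic, recognize, and display_unit are hypothetical interfaces, not APIs from the disclosure).

import time

def run_on_device(mic, recognize, display_unit, poll_s=0.1):
    """On-device loop: voice acquisition unit -> processing -> display unit."""
    while True:
        chunk = mic.read()                  # voice data from the environment
        if chunk:
            text = recognize(chunk)         # acoustic + semantic processing
            display_unit.show({"text": text, "ts": time.time()})
        time.sleep(poll_s)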
A user may view information in the environment, such as people, objects, and scenes, through the display unit on the display device. While the user views the environment through the display unit, the virtual information can be superimposed within the user's line of sight. For example, when a user wears the glasses to watch a stage performance, the recognized speech of the performers can be displayed, superimposed, in the stage scene the user watches through the glasses.
In some embodiments, the virtual information may also include display information for the text content and other related information, such as the display position, display mode, and display format. The display device can display the text content at a suitable position according to this display information, so that it does not block the user's view of the acquisition area. The display device can also detect the user's viewing angle, interpupillary distance, and the like, and adjust the size of the displayed text accordingly, avoiding problems such as the subtitles appearing out of focus or causing dizziness.
Through this embodiment of the disclosure, the collected speech can be converted into text content in real time on the display device itself, and virtual information including the text content is displayed to the user, so that while watching the environment the user can read the text content corresponding to the voice data uttered by objects in it. The embodiment combines virtual display technology with speech recognition technology, thereby addressing the technical problem that, in certain special scenes, hearing-impaired people and other viewers cannot accurately receive voice information when watching live content such as performances.
In an optional implementation manner of this embodiment, step S302, namely, the step of processing the voice data to obtain the virtual information, further includes the following steps:
preprocessing the voice data;
recognizing the preprocessed voice data by using an acoustic model to obtain corresponding candidate content;
and performing semantic processing on the candidate content by using a semantic model to obtain the text content.
In this optional implementation manner, preprocessing such as noise reduction and filtering may be performed on the voice data, and the audio content of the target object in the acquisition area is extracted from it. The target object may be any object that utters speech in the acquisition area, and may be one object or several. The audio content of the target object can be extracted from the preprocessed voice data through functions such as audio recognition; speech recognition is then performed on the audio content with an acoustic model to obtain corresponding candidate content. Finally, contextual semantic processing is performed on the candidate content with a semantic model, and text content that conforms to semantic logic is output. The semantic model may be obtained by training on lines commonly used in performance scenarios and on the script. The acoustic model and the semantic model may employ models already implemented in the related art, which are not limited herein.
In an optional implementation manner of this embodiment, the display device includes an image acquisition unit, and step S303, namely outputting the virtual information to the display unit to display the virtual information on the display unit, further includes the following steps:
acquiring image data acquired by the image acquisition unit in real time;
acquiring a display image by superimposing the virtual information on the image data;
and outputting the display image to the display unit for displaying.
In this optional implementation, the display device further includes an image acquisition unit, such as a monocular or binocular camera, configured to acquire image data from the environment in real time; the image data may be a two-dimensional or three-dimensional image. After the processing unit obtains the corresponding virtual information from the voice data collected by the voice acquisition unit, it can computer-render the text content in the virtual information together with the image data to obtain a display image in which the text content is superimposed on the image data, and show that display image on the display unit. It will be appreciated that other virtual information may also be superimposed on the image data if desired.
In an optional implementation manner of this embodiment, the display unit includes a transparent display unit, and step S303, that is, the step of outputting the virtual information to the display unit to display the virtual information on the display unit, further includes the following steps:
outputting the text content to the transparent display unit for display, so that the text content is displayed superimposed on the information the user views through the transparent display unit.
In this optional implementation, the display unit on the display device may be a transparent display unit which, when the device is worn, sits in front of the eyes so that the user can view information in the environment through it. The transparent display unit also has a display function and can present virtual information such as the text content, so that when the user watches the acquisition area through the unit, the text content is superimposed on what is being watched. Because natural reflected light from the surroundings passes through the transparent display unit normally, the user can view the surrounding environment and things through it without the wearer's sight being affected. In this way, the information the user watches in the acquisition area is real, and at the same time the text content corresponding to the voice data uttered by objects in the environment is visible without any intervention.
In an optional implementation manner of this embodiment, the method further includes the following steps:
receiving a target language configured by a user;
before outputting the virtual information to the display unit, the method further includes:
when the text content in the virtual information does not match the target language, translating the text content in the virtual information into target content corresponding to the target language.
In this optional implementation manner, the user may configure a target language for the display device through the client, and when the text content in the received virtual information is not matched with the target language configured by the user, the text content in the virtual information may be translated into target content corresponding to the target language, and then displayed on the display device. In this way, the display device may be adapted to target users using any language.
Fig. 4 shows a schematic flow chart of an application in a stage performance scene according to an embodiment of the present disclosure. As shown in fig. 4, a 360-degree microphone array 401 is arranged around the stage to collect, in real time, the voice data uttered by the performers on the stage. The processing device 402 may be located in the stage venue or remotely, communicating with the microphone array 401 over a communication network. In the audience area, audience members may wear AR glasses 403 to watch the performance on the stage. The AR glasses may communicate with the processing device 402 over the communication network. During the performance, the microphone array 401 sends the voice data collected in real time to the processing device 402 through the network; after processing by the processing device 402, subtitle information is obtained and sent through the network to the AR glasses 403 worn by the audience, where it is displayed in real time, so that subtitles are superimposed on the AR glasses 403 while the audience watches the live performance.
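The fig. 4 flow can be mimicked end to end with a small producer/consumer sketch; the queue stands in for the communication network between the processing device 402 and the AR glasses 403 (an assumption for illustration only).

import queue
import threading
import time

subtitles = queue.Queue()

def processing_device(audio_feed):
    # Stand-in for processing device 402: each audio chunk becomes a subtitle.
    for chunk in audio_feed:
        subtitles.put({"text": f"line recognized from {chunk!r}", "ts": time.time()})

def ar_glasses(viewer_id):
    # Stand-in for AR glasses 403: receive and display subtitles in real time.
    while True:
        info = subtitles.get()
        print(f"glasses {viewer_id} shows: {info['text']}")
        subtitles.task_done()

threading.Thread(target=ar_glasses, args=(1,), daemon=True).start()
processing_device([b"chunk-0", b"chunk-1"])  # feed from microphone array 401
subtitles.join()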
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
According to the data processing apparatus of an embodiment of the present disclosure, the apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both. The data processing apparatus includes:
the first acquisition module is configured to acquire voice data in an acquisition area;
the first processing module is configured to process the voice data to obtain virtual information, and the virtual information comprises text content corresponding to the voice data;
the first output module is configured to output the virtual information to at least one display device in a user area for display on the display device, so that a user viewing the acquisition area through the display device sees the virtual information within the user's line of sight.
In an optional implementation manner of this embodiment, the first obtaining module includes:
the first acquisition sub-module is configured to acquire voice data acquired by a voice acquisition device from the voice acquisition device arranged in the acquisition area.
In an optional implementation manner of this embodiment, the first processing module includes:
a first preprocessing submodule configured to preprocess the voice data;
the first recognition submodule is configured to recognize the preprocessed voice data by using an acoustic model to obtain corresponding candidate content;
and the first semantic processing submodule is configured to perform semantic processing on the candidate content by using a semantic model to obtain the text content.
In an optional implementation manner of this embodiment, after the first processing module obtains the virtual information, the apparatus further includes:
the first translation module is configured to translate the text content into target content corresponding to a target language associated with the display device.
In an alternative implementation of this embodiment, the display device includes an AR display device.
The data processing apparatus in this embodiment corresponds to the data processing method in the embodiment and the related embodiment shown in fig. 1, and specific details can be referred to the above description of the data processing method in the embodiment and the related embodiment shown in fig. 1, and are not described herein again.
According to the data processing apparatus of another embodiment of the present disclosure, the apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both. The apparatus is located on a display device, and the data processing apparatus includes:
a second acquisition module configured to acquire virtual information, where the virtual information includes text content obtained by recognizing voice data acquired in the acquisition area;
the display module is configured to display the virtual information, so that a user viewing the acquisition area through the display device sees the virtual information within the user's line of sight.
In an optional implementation manner of this embodiment, the display module includes:
an acquisition sub-module configured to acquire image data within the acquisition region;
a second acquisition sub-module configured to acquire a display image by superimposing the virtual information on the image data;
a first display sub-module configured to display the display image.
In an optional implementation manner of this embodiment, the display device includes a transparent display unit, and the display module includes:
a second display sub-module configured to display the text content on the transparent display unit such that the text content is displayed superimposed on information viewed by a user through the transparent display unit.
In an optional implementation manner of this embodiment, the apparatus further includes:
a first receiving module configured to receive a target language configured by a user;
before the display module displays the virtual information, the apparatus further includes:
a second translation module configured to translate the text content in the virtual information into target content corresponding to the target language when the text content does not match the target language.
The data processing apparatus in this embodiment corresponds to the data processing method in the embodiment and the related embodiment shown in fig. 2, and specific details can be referred to the above description of the data processing method in the embodiment and the related embodiment shown in fig. 2, which is not described herein again.
According to a data processing apparatus of still another embodiment of the present disclosure, the apparatus may be implemented as a part or all of an electronic device by software, hardware, or a combination of both. The device is located display device, and this display device includes pronunciation acquisition element and display element, and this data processing apparatus includes:
the third acquisition module is configured to acquire the voice data acquired by the voice acquisition unit;
the second processing module is configured to process the voice data to obtain virtual information, and the virtual information comprises text contents corresponding to the voice data;
a second output module configured to output the virtual information to the display unit to display the virtual information on the display unit.
In an optional implementation manner of this embodiment, the second processing module includes:
a second preprocessing submodule configured to preprocess the voice data;
the second recognition submodule is configured to recognize the preprocessed voice data by using an acoustic model to obtain corresponding candidate content;
and the second semantic processing submodule is configured to perform semantic processing on the candidate content by using a semantic model to obtain the text content.
In an optional implementation manner of this embodiment, the display device includes an image capturing unit, and the second output module includes:
a third acquisition sub-module configured to acquire image data acquired by the image acquisition unit;
a fourth acquisition sub-module configured to acquire a display image by superimposing the virtual information on the image data;
a first output sub-module configured to output the display image to the display unit for display.
In an optional implementation manner of this embodiment, the display unit includes a transparent display unit, and the second output module includes:
and the second output sub-module is configured to output the text content to the transparent display unit for displaying, so that the text content is displayed in a manner of being superposed on the information viewed by the user through the transparent display unit.
In an optional implementation manner of this embodiment, the apparatus further includes:
a second receiving module configured to receive a target language configured by a user;
before the second output module outputs the virtual information, the apparatus further includes:
a third translation module configured to translate the text content in the virtual information into target content corresponding to the target language when the text content does not match the target language.
The data processing apparatus in this embodiment corresponds to the data processing method in the embodiment and the related embodiment shown in fig. 3, and specific details can be referred to the above description of the data processing method in the embodiment and the related embodiment shown in fig. 3, which is not described herein again.
Fig. 5 is a schematic structural diagram of an electronic device suitable for implementing a data processing method according to an embodiment of the present disclosure.
As shown in fig. 5, the electronic device 500 includes a processing unit 501, which may be implemented as a CPU, GPU, FPGA, NPU, or similar processing unit. The processing unit 501 may perform the various processes of any of the method embodiments of the present disclosure described above according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random-access memory (RAM) 503. The RAM 503 also stores the various programs and data necessary for the operation of the electronic device 500. The processing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a cathode-ray tube (CRT) or liquid-crystal display (LCD), and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, any of the methods described above with reference to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing any of the methods of the embodiments of the present disclosure. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509 and/or installed from the removable medium 511.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus of the above-described embodiments, or a separate computer-readable storage medium not incorporated into the device. The computer-readable storage medium stores one or more programs used by one or more processors to perform the methods described in the present disclosure.
The foregoing description covers only the preferred embodiments of the disclosure and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to the specific combinations of the above features, and also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the inventive concept. For example, technical solutions formed by interchanging the above features with features disclosed in this disclosure that have similar functions also fall within this scope.

Claims (18)

1. A data processing method, comprising:
acquiring voice data in an acquisition area;
processing the voice data to obtain virtual information, wherein the virtual information comprises text content corresponding to the voice data;
and outputting the virtual information to at least one display device in a user area for display on the display device, so that a user watching the acquisition area through the display device can view the virtual information within the user's line of sight.
2. The method of claim 1, wherein processing the voice data to obtain virtual information comprises:
preprocessing the voice data;
recognizing the preprocessed voice data by using an acoustic model to obtain corresponding candidate content;
and performing semantic processing on the candidate content by using a semantic model to obtain the text content.
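By way of illustration only, the following Python sketch mirrors the two-stage pipeline of claim 2 with stub functions; preprocess, acoustic_model, and semantic_model are hypothetical names standing in for whatever recognizer an implementation adopts, and are not components disclosed in this application.

```python
# A minimal sketch of the two-stage recognition in claim 2; the models here
# are stubs, not the components actually used by the disclosed system.
def preprocess(samples):
    # Placeholder preprocessing: peak-normalize and drop near-silence.
    peak = max((abs(s) for s in samples), default=1.0) or 1.0
    return [s / peak for s in samples if abs(s) > 1e-4]

def acoustic_model(features):
    # Stub acoustic model: would map acoustic features to candidate content.
    return ["their is rain today", "there is rain today"]

def semantic_model(candidates):
    # Stub semantic model: would rescore candidates using linguistic context
    # and return the most plausible text content.
    return candidates[-1]

def voice_to_text(samples):
    candidates = acoustic_model(preprocess(samples))
    return semantic_model(candidates)

print(voice_to_text([0.0, 0.25, -0.5, 0.1]))  # -> "there is rain today"
```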
3. The method of claim 1 or 2, further comprising, after processing the voice data to obtain the virtual information:
translating the text content into target content corresponding to a target language associated with the display device.
4. The method of claim 1 or 2, wherein the display device comprises an AR display device.
5. A data processing method performed on a display device, the method comprising:
acquiring virtual information, wherein the virtual information comprises text content obtained by recognizing voice data collected in an acquisition area;
and displaying the virtual information, so that a user watching the acquisition area through the display device can see the virtual information within the user's line of sight.
6. The method of claim 5, wherein displaying the virtual information comprises:
acquiring image data in the acquisition area;
acquiring a display image by superimposing the virtual information on the image data;
and displaying the display image.
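A minimal sketch of the superimposition step in claim 6, assuming OpenCV is used for image composition; the caption position, font, and the synthetic frame are illustrative choices, not features of the claim.

```python
# Illustrative sketch: superimpose recognized text on captured image data
# to obtain a display image (claim 6). Layout values are arbitrary examples.
import numpy as np
import cv2

def compose_display_image(frame: np.ndarray, text: str) -> np.ndarray:
    display = frame.copy()
    h, _w = display.shape[:2]
    # Render the virtual information as a caption near the bottom edge.
    cv2.putText(display, text, (16, h - 24),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
    return display

# Stand-in for image data acquired in the acquisition area.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
shown = compose_display_image(frame, "recognized speech goes here")
cv2.imwrite("display_image.png", shown)
```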
7. The method of claim 5 or 6, wherein the display device comprises a transparent display unit, and displaying the virtual information comprises:
displaying the text content on the transparent display unit, so that the text content is overlaid on the scene the user views through the transparent display unit.
8. The method of claim 5 or 6, wherein the method further comprises:
receiving a target language configured by a user;
and before displaying the virtual information:
when the text content in the virtual information does not match the target language, translating the text content in the virtual information into target content corresponding to the target language.
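The conditional translation in claim 8 could be organized as in the sketch below; detect_language and translate are hypothetical stand-ins for a language-identification model and a machine-translation service, neither of which the claim specifies.

```python
# Illustrative sketch of claim 8: translate only when the recognized text
# does not already match the user-configured target language.
def detect_language(text: str) -> str:
    # Hypothetical stand-in; a real device would call a language-ID model.
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "en"

def translate(text: str, target_language: str) -> str:
    # Hypothetical stand-in for a machine-translation service.
    return f"[{target_language}] {text}"

def localize_virtual_information(text: str, target_language: str) -> str:
    if detect_language(text) != target_language:
        return translate(text, target_language)
    return text

print(localize_virtual_information("你好，世界", "en"))
```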
9. A data processing method performed on a display device, the display device comprising a voice acquisition unit and a display unit, the method comprising:
acquiring voice data collected by the voice acquisition unit;
processing the voice data to obtain virtual information, wherein the virtual information comprises text content corresponding to the voice data;
outputting the virtual information to the display unit to display the virtual information on the display unit.
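Reading claim 9 end to end suggests a capture-recognize-display loop like the sketch below; the sounddevice capture settings and both stub functions are assumptions made for illustration, since the claim fixes neither a sampling rate nor a recognizer.

```python
# Illustrative device-side loop (claim 9): capture a short audio window from
# the voice acquisition unit, recognize it, and hand the text to the display
# unit. Capture settings and the recognizer stub are assumptions.
import sounddevice as sd

SAMPLE_RATE = 16_000  # Hz; a common rate for speech recognition
WINDOW_SECONDS = 2.0

def recognize(samples) -> str:
    # Stub standing in for the claim-2 acoustic + semantic pipeline.
    return "recognized text"

def show_on_display_unit(text: str) -> None:
    # Stub: a real device would render this on its (transparent) display.
    print(f"[display] {text}")

while True:  # runs until interrupted
    audio = sd.rec(int(SAMPLE_RATE * WINDOW_SECONDS),
                   samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()  # block until the window has been captured
    show_on_display_unit(recognize(audio[:, 0]))
```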
10. The method of claim 9, wherein processing the voice data to obtain virtual information comprises:
preprocessing the voice data;
recognizing the preprocessed voice data by using an acoustic model to obtain corresponding candidate content;
and performing semantic processing on the candidate content by using a semantic model to obtain the text content.
11. The method of claim 9 or 10, wherein the display device comprises an image acquisition unit, and outputting the virtual information to the display unit for display thereon comprises:
acquiring image data collected by the image acquisition unit;
acquiring a display image by superimposing the virtual information on the image data;
and outputting the display image to the display unit for display.
12. The method of claim 9 or 10, wherein the display unit comprises a transparent display unit, and outputting the virtual information to the display unit for display thereon comprises:
outputting the text content to the transparent display unit for display, so that the text content is overlaid on the scene the user views through the transparent display unit.
13. The method according to claim 9 or 10, wherein the method further comprises:
receiving a target language configured by a user;
and before outputting the virtual information to the display unit:
when the text content in the virtual information does not match the target language, translating the text content in the virtual information into target content corresponding to the target language.
14. A data processing apparatus, comprising:
a first acquisition module configured to acquire voice data in an acquisition area;
a first processing module configured to process the voice data to obtain virtual information, wherein the virtual information comprises text content corresponding to the voice data;
a first output module configured to output the virtual information to at least one display device in a user area for display on the display device, so that a user watching the acquisition area through the display device can view the virtual information within the user's line of sight.
15. A data processing apparatus, wherein the apparatus is located on a display device, the apparatus comprising:
a second acquisition module configured to acquire virtual information, wherein the virtual information comprises text content obtained by recognizing voice data collected in an acquisition area;
a display module configured to display the virtual information, so that a user watching the acquisition area through the display device can see the virtual information within the user's line of sight.
16. A data processing apparatus, wherein the apparatus is located in a display device, the display device comprising a voice acquisition unit and a display unit, the apparatus comprising:
a third acquisition module configured to acquire voice data collected by the voice acquisition unit;
a second processing module configured to process the voice data to obtain virtual information, wherein the virtual information comprises text content corresponding to the voice data;
a second output module configured to output the virtual information to the display unit to display the virtual information on the display unit.
17. An electronic device, comprising a memory and a processor, wherein:
the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method of any one of claims 1-13.
18. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the method of any of claims 1-13.
CN202010627737.1A 2020-07-01 2020-07-01 Data processing method and device, electronic equipment and storage medium Pending CN113889114A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010627737.1A CN113889114A (en) 2020-07-01 2020-07-01 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113889114A 2022-01-04

Family

ID=79012490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010627737.1A Pending CN113889114A (en) 2020-07-01 2020-07-01 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113889114A (en)

Similar Documents

Publication Title
KR101995958B1 (en) Apparatus and method for image processing based on smart glass
CN108200446B (en) On-line multimedia interaction system and method of virtual image
US8201080B2 (en) Systems and methods for augmenting audio/visual broadcasts with annotations to assist with perception and interpretation of broadcast content
KR101899588B1 (en) System for automatically generating a sign language animation data, broadcasting system using the same and broadcasting method
CN111654715A (en) Live video processing method and device, electronic equipment and storage medium
JP2019220848A (en) Data processing apparatus, data processing method and program
WO2017141584A1 (en) Information processing apparatus, information processing system, information processing method, and program
CN108475492B (en) Head-mounted display cooperative display system, system including display device and head-mounted display, and display device thereof
CN115668913A (en) Stereoscopic display method, device, medium and system for field performance
CN114339302B (en) Method, device, equipment and computer storage medium for guiding broadcast
JP5346797B2 (en) Sign language video synthesizing device, sign language video synthesizing method, sign language display position setting device, sign language display position setting method, and program
CN111246224A (en) Video live broadcast method and video live broadcast system
KR20110118530A (en) System and device for displaying of video data
JP7385385B2 (en) Image distribution system and image distribution method
KR20120074977A (en) Educational materials and methods using augmented reality the performance of voice recognition and command
CN113889114A (en) Data processing method and device, electronic equipment and storage medium
KR101705988B1 (en) Virtual reality apparatus
CN112764549B (en) Translation method, translation device, translation medium and near-to-eye display equipment
KR102258991B1 (en) Sign-language service providing system
JP2020162083A (en) Content distribution system, content distribution method, and content distribution program
KR101856632B1 (en) Method and apparatus for displaying caption based on location of speaker and apparatus for performing the same
CN111736692B (en) Display method, display device, storage medium and head-mounted device
US20210174823A1 (en) System for and Method of Converting Spoken Words and Audio Cues into Spatially Accurate Caption Text for Augmented Reality Glasses
WO2021226821A1 (en) Systems and methods for detection and display of whiteboard text and/or an active speaker
CN110910508B (en) Image display method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination