CN117292692A - Display device and audio recognition method - Google Patents
- Publication number
- CN117292692A (Application No. CN202211634779.3A)
- Authority
- CN
- China
- Prior art keywords
- audio data
- audio
- display device
- preset
- playing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
Abstract
The application discloses a display device and an audio recognition method. In response to a user's request to recognize audio, the playing state of a first application running in the display device is monitored. When the playing state is monitored to be playing, first audio data corresponding to the first application within a preset first time range is acquired based on the current time, and the information of the first audio data is identified by performing a corresponding audio recognition operation on it. When the playing state is monitored to be not playing, second audio data within a preset second time range is acquired through a microphone (the sound source of the second audio data is a sound source outside the display device), and the information of the second audio data is identified by performing a corresponding audio recognition operation on it. The user is thus able to identify various kinds of audio in a scene, including audio of interest emitted by sound sources in the environment such as smart devices, people, and animals, which improves the user experience.
Description
Technical Field
The application relates to the technical field of audio recognition, in particular to a display device and an audio recognition method.
Background
With the development of smart devices, more and more application scenarios contain multiple smart devices. For example, a home-internet scenario may simultaneously contain a display device (such as a smart TV, smartphone, or tablet), a smart speaker, a refrigerator, an air conditioner, and so on, where the display device provides the user with entertainment functions such as audio, video, and games, and the smart speaker provides audio entertainment, satisfying the user's different entertainment needs.

If a smart device capable of playing audio exists in the application scene, the user may sometimes be interested in the audio it plays and want to know related information about that audio. To meet this demand, a method capable of recognizing audio is needed.
Disclosure of Invention
The application provides a display device and an audio recognition method, which can identify audio of interest played by smart devices in an application scenario and improve the experience of the user operating the display device.

In a first aspect, some embodiments of the present application provide a display device, comprising a display, a microphone, and a controller in communication with the display and the microphone, the controller configured to:
in response to a user's request to identify audio, monitor the playing state of a first application running in the display device;

when the playing state is monitored to be playing, acquire, based on the current time, first audio data corresponding to the first application within a preset first time range, and identify the information of the first audio data by performing a corresponding audio recognition operation on the first audio data;

and when the playing state is monitored to be not playing, acquire second audio data within a preset second time range through the microphone, and identify the information of the second audio data by performing a corresponding audio recognition operation on the second audio data, wherein a sound source of the second audio data is a sound source outside the display device.
In a second aspect, some embodiments of the present application provide an audio recognition method, including:
responding to a user's request to identify audio, and monitoring the playing state of a first application running in the display device;

when the playing state is monitored to be playing, acquiring, based on the current time, first audio data corresponding to the first application within a preset first time range, and identifying the information of the first audio data by performing a corresponding audio recognition operation on the first audio data;

and when the playing state is monitored to be not playing, acquiring second audio data within a preset second time range through a microphone, and identifying the information of the second audio data by performing a corresponding audio recognition operation on the second audio data, wherein a sound source of the second audio data is a sound source outside the display device.

Some embodiments of the present application thus provide a display device and an audio recognition method that respond to a user's request to recognize audio by monitoring the playing state of a first application running in the display device; when the playing state is monitored to be playing, first audio data corresponding to the first application within a preset first time range is acquired based on the current time, and the information of the first audio data is identified by performing a corresponding audio recognition operation on it; when the playing state is monitored to be not playing, second audio data within a preset second time range is acquired through the microphone (the sound source of the second audio data being outside the display device), and the information of the second audio data is identified by performing a corresponding audio recognition operation on it. The user can thereby identify various kinds of audio in the application scene: when a smart device plays audio of interest or humming of interest occurs in the environment, the corresponding audio information can be identified, improving the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates an operational scenario between a display device and a control apparatus of some embodiments of the present application;
fig. 2 shows a hardware configuration block diagram of the control apparatus 100 of some embodiments of the present application;
fig. 3 shows a hardware configuration block diagram of a display device 200 of some embodiments of the present application;
FIG. 4 illustrates a software configuration diagram in a display device according to some embodiments of the present application;
FIG. 5a illustrates a timing diagram for audio recognition in a display device according to some embodiments of the present application;
FIG. 5b illustrates a schematic diagram of a user request to identify audio to a display device in some embodiments of the present application;
fig. 5c is a schematic diagram illustrating an application scenario where a display device is located in some embodiments of the present application;
FIG. 6 illustrates a timing diagram for a display device to recognize audio in some embodiments of the present application;
FIG. 7 illustrates a flow chart of a decision for first audio data and third audio data in some embodiments of the present application;
FIG. 8 illustrates a timing diagram for a display device to recognize audio in some embodiments of the present application;
fig. 9 illustrates a timing diagram of an audio recognition operation performed by a display device through an audio recognition server in some embodiments of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the exemplary embodiments of the present application more apparent, the technical solutions in the exemplary embodiments of the present application will be clearly and completely described below with reference to the drawings in the exemplary embodiments of the present application, and it is apparent that the described exemplary embodiments are only some embodiments of the present application, but not all embodiments.
All other embodiments obtained by a person of ordinary skill in the art without inventive effort, based on the exemplary embodiments shown in the present application, fall within the scope of protection of the present application. Furthermore, while the disclosure has been presented in terms of one or more exemplary embodiments, it should be understood that individual aspects of the disclosure may also be practiced on their own as a complete technical solution.
It should be understood that the terms "first," "second," "third," and the like in the description, in the claims, and in the above-described figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used may be interchanged where appropriate, so that the embodiments of the present application can, for example, be implemented in orders other than those illustrated or described here.
Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The display device provided in the embodiment of the application may have various implementation forms, for example, may be a television, a smart television, a mobile terminal, a tablet computer, a notebook computer, a laser projection device, a display (monitor), an electronic whiteboard (electronic bulletin board), an electronic desktop (electronic table), and the like. Fig. 1 and 2 are specific embodiments of a display device of the present application.
In the same application scenario there may be one or more display devices, and possibly other smart devices that can play audio, for example smart speakers, smart microphones, and other display devices.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment. As shown in fig. 1, a user may operate the display device 200 through the smart device 300 or the control apparatus 100.
In some embodiments, the control apparatus 100 may be a remote controller; the remote controller communicates with the display device through infrared protocol communication, Bluetooth protocol communication, or other short-range communication methods, and controls the display device 200 wirelessly or by wire. The user may control the display device 200 by inputting user instructions through keys on the remote control, voice input, control panel input, etc.
In some embodiments, a smart device 300 (e.g., mobile terminal, tablet, computer, notebook, etc.) may also be used to control the display device 200. For example, the display device 200 is controlled using an application running on a smart device.
In some embodiments, the display device may not receive instructions through the smart device or control apparatus described above, but instead be controlled by the user through touch, gestures, or the like.

In some embodiments, the display device 200 may also be controlled by means other than the control apparatus 100 and the smart device 300; for example, the user's voice commands may be received directly through a module for acquiring voice commands configured inside the display device 200, or received through a voice control apparatus configured outside the display device 200.
In some embodiments, the display device 200 is also in data communication with a server 400. The display device 200 may be permitted to make communication connections via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various contents and interactions to the display device 200. The server 400 may be a cluster, or may be multiple clusters, and may include one or more types of servers.
Fig. 2 exemplarily shows a block diagram of a configuration of the control apparatus 100 in accordance with an exemplary embodiment. As shown in fig. 2, the control apparatus 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction from the user and convert the operation instruction into an instruction that the display device 200 can recognize and respond to, serving as an intermediary for interaction between the user and the display device 200.
As shown in fig. 3, the display apparatus 200 includes at least one of a modem 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface.
In some embodiments the controller includes a processor, a video processor, an audio processor, a graphics processor, RAM, ROM, and first through nth interfaces for input/output.

The display 260 includes a display screen component for presenting pictures and a driving component for driving image display, and is used to receive image signals output by the controller and display video content, image content, a menu manipulation interface, and a user manipulation UI.
The display 260 may be a liquid crystal display, an OLED display, a projection device, or a projection screen.
The communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example: the communicator may include at least one of a Wifi module, a bluetooth module, a wired ethernet module, or other network communication protocol chip or a near field communication protocol chip, and an infrared receiver. The display apparatus 200 may establish transmission and reception of control signals and data signals with the control device 100 or the server 400 through the communicator 220.
A user interface may be used to receive control signals from the control apparatus 100 (e.g., an infrared remote control).
The detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for capturing the intensity of ambient light; alternatively, the detector 230 includes an image collector such as a camera, which may be used to collect external environmental scenes, user attributes, or user interaction gestures, or alternatively, the detector 230 includes a sound collector such as a microphone, or the like, which is used to receive external sounds.
The external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, etc. The input/output interface may be a composite input/output interface formed by a plurality of interfaces.
In some embodiments, the controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box or the like.
The controller 250 controls the operation of the display device and responds to the user's operations through various software control programs stored on the memory. The controller 250 controls the overall operation of the display apparatus 200. For example: in response to receiving a user command to select a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments the controller includes at least one of a central processing unit (Central Processing Unit, CPU), a video processor, an audio processor, a graphics processor (Graphics Processing Unit, GPU), RAM (Random Access Memory), ROM (Read-Only Memory), first through nth interfaces for input/output, a communication bus (Bus), and the like.
A "user interface" is a media interface for interaction and exchange of information between an application or operating system and a user, which enables conversion between an internal form of information and a user-acceptable form. A commonly used presentation form of the user interface is a graphical user interface (Graphic User Interface, GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in a display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
In some embodiments, as shown in fig. 4, the system is divided into four layers, from top to bottom: an application layer (Application layer), an application framework layer (Application Framework layer), an Android runtime (Android runtime) and system library layer (system runtime layer), and a kernel layer.
In some embodiments, at least one application program is running in the application program layer, and these application programs may be a Window (Window) program of an operating system, a system setting program, a clock program, or the like; or may be an application developed by a third party developer. In particular implementations, the application packages in the application layer are not limited to the above examples.
As shown in fig. 4, the application framework layer in the embodiment of the present application includes a manager (manager), a Content Provider (Content Provider), and the like, where the manager includes at least one of the following modules: an Activity Manager (Activity Manager) is used to interact with all activities that are running in the system; a Location Manager (Location Manager) is used to provide system services or applications with access to system Location services; a Package Manager (Package Manager) for retrieving various information about an application Package currently installed on the device; a notification manager (Notification Manager) for controlling the display and clearing of notification messages; a Window Manager (Window Manager) is used to manage icons, windows, toolbars, wallpaper, and desktop components on the user interface.
In some embodiments, the activity manager is used to manage the lifecycle of the individual applications as well as the usual navigation and back-stack functions, such as controlling the exit, opening, and back navigation of applications. The window manager is used to manage all window programs, for example obtaining the display screen size, determining whether a status bar exists, locking the screen, capturing the screen, and controlling changes of the display window (for example, shrinking the display window, shaking the display, distorting the display, etc.).
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 4, the kernel layer contains at least one of the following drivers: audio drive, display drive, bluetooth drive, camera drive, WIFI drive, USB drive, HDMI drive, sensor drive (e.g., fingerprint sensor, temperature sensor, pressure sensor, etc.), and power supply drive, etc.
For example, in the home internet there are multiple home appliances capable of playing audio (such as a television, display devices like mobile phones and tablets, and a smart speaker), which can provide entertainment functions such as audio for users and meet different entertainment needs.

The display device can recognize audio by installing an application with an audio recognition function. When a user is interested in audio played by the display device or by other home appliances and wants to know related information about that audio, the embodiments of the present application provide a display device and an audio recognition method for this need, realizing audio recognition in a multi-device application scene.

To meet the above needs of users, the embodiments of the present application provide a display device and an audio recognition method that respond to a user's request to recognize audio by monitoring the playing state of a first application running in the display device; when the playing state is monitored to be playing, first audio data corresponding to the first application within a preset first time range is acquired based on the current time, and the information of the first audio data is identified by performing a corresponding audio recognition operation on it; when the playing state is monitored to be not playing, second audio data within a preset second time range is acquired through the microphone (the sound source of the second audio data being outside the display device), and the information of the second audio data is identified by performing a corresponding audio recognition operation on it. The user can thereby identify various kinds of audio in the application scene: when a smart device plays audio of interest or humming of interest occurs in the environment, the corresponding audio information can be identified, improving the user experience.

Fig. 5a shows a timing diagram of audio recognition in a display device according to some embodiments of the present application. As shown in fig. 5a, the display device comprises a microphone and a controller, and the controller is configured to perform the following steps:
S310: in response to a user's request to identify audio, monitor the playing state of a first application running in the display device.

The request to identify audio may be issued by the user by clicking an identification control on the display device 200, or by a key press, voice input, or the like on the control apparatus 100.
Fig. 5b is a schematic diagram illustrating a request of a user to identify audio to a display device in some embodiments of the present application, where, as shown in fig. 5b, the display device has an identification control 261, and the user implements an operation on the identification control 261 through a control device or the like, to trigger the request of identifying audio.
In response to the user's request to identify audio, the playing state of the first application running in the display device is first monitored. The first application is an application running in the display device that has a playing function; there may be one or more such first applications.

If a plurality of first applications are running on the display device, a target application is determined from the plurality of first applications based on a preset selection strategy, and the first audio data corresponding to the target application is acquired.
For example, the running first applications may include a video playing application, a music playing application, and a game application, all of which are playing audio. In this case the preset selection strategy may depend on the type of the audio identification request: when the request is to identify music, the target application determined by the preset selection strategy is the music playing application, as sketched below.
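A minimal Kotlin sketch of such a preset selection strategy is given below; the request types, application categories, and helper names are assumptions made for illustration, not identifiers used by the application itself.

```kotlin
// Hypothetical request types and running-application descriptor.
enum class RecognitionRequestType { MUSIC, SPEECH, ANIMAL_SOUND }
data class RunningApp(val packageName: String, val category: String)

// Preset selection strategy: pick the target application according to the
// type of the audio identification request (illustrative logic only).
fun selectTargetApp(
    requestType: RecognitionRequestType,
    playingApps: List<RunningApp>
): RunningApp? = when (requestType) {
    // A music-identification request prefers the music playing application.
    RecognitionRequestType.MUSIC ->
        playingApps.firstOrNull { it.category == "music" } ?: playingApps.firstOrNull()
    // Other request types simply fall back to the first playing application.
    else -> playingApps.firstOrNull()
}
```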
The playing state of the first application may be playing or not playing; by monitoring the playing state of the first application running in the display device, it can be determined whether a playing first application currently exists on the display device.
If it is monitored that the playing state of the first application is playing, step 320 is executed, and if it is monitored that the playing state of the first application is not playing, step 330 is executed.
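The dispatch between steps 320 and 330 can be summarized by the following Kotlin sketch; PlaybackMonitor and the two step callbacks are hypothetical names introduced only for illustration.

```kotlin
// Hypothetical interface reporting whether the first application is playing.
interface PlaybackMonitor {
    fun isPlaying(appPackage: String): Boolean
}

// S310 dispatch: choose between recognizing the application's own audio (S320)
// and recognizing external audio captured by the microphone (S330).
fun onRecognizeAudioRequest(
    monitor: PlaybackMonitor,
    firstAppPackage: String,
    recognizeFromApp: () -> Unit,        // step S320
    recognizeFromMicrophone: () -> Unit  // step S330
) {
    if (monitor.isPlaying(firstAppPackage)) {
        recognizeFromApp()
    } else {
        recognizeFromMicrophone()
    }
}
```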
S320: when the playing state is monitored to be playing, acquire, based on the current time, first audio data corresponding to the first application within a preset first time range, and identify the information of the first audio data by performing a corresponding audio recognition operation on the first audio data.
After the playing state of the first application is monitored to be playing, first audio data within a preset first time range is obtained based on the current time at which the monitoring occurred.

The first audio data is data taken before audio decoding in the display device, that is, data that has not yet been decoded and passed through the power amplifier and loudspeaker. At this stage the data is a digital signal containing only the audio information itself, and is therefore purer and less subject to interference than audio taken after decoding.

If the first audio data were instead taken after audio decoding but before playback, it would need to be encoded again; and if it were taken after decoding and playback, i.e. as ambient sound recorded by the microphone, it could contain interfering sounds such as the user's speech or sounds from other devices. Therefore, in the embodiments of the present application the first audio data is the display device's data before audio decoding, which improves both the efficiency of collecting the audio data and its purity.

Identifying the information of the first audio data means that the audio data to be identified, which contains the first audio data, is transmitted to the audio recognition server, and the recognition result corresponding to the audio data to be identified is displayed in a result display area of the display according to the result returned by the audio recognition server.

In some embodiments, after the playing state of the first application is monitored to be playing, only the corresponding first audio data within the preset first time range is obtained, and sound recorded by the microphone is not used, so as to eliminate interference from microphone-captured sound.

The identified information of the first audio data can be displayed directly in the result display area, or indirectly in the result display area in a form such as a two-dimensional code; in some embodiments the result may also be presented as voice information or the like.

For example, a video playing application in the display device is playing media asset A, within which music B is playing. If the user is interested in music B, the user may issue a request to identify audio by clicking a control on the display device or by a key press, voice input, etc. on the control apparatus. The playing state of the display device is monitored to be playing, so first audio data C within the preset first time range is obtained; the first audio data C is a portion of music B, and the information of the first audio data C is identified by performing a corresponding audio recognition operation on it.
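The acquisition of first audio data in S320 might look like the following sketch; AppAudioTap is a hypothetical hook into the playback pipeline that exposes the not-yet-decoded audio stream of the first application, and the window length is an illustrative value rather than one stated in the application.

```kotlin
import java.time.Instant

// Hypothetical tap on the first application's pre-decode audio stream.
interface AppAudioTap {
    /** Returns the buffered, pre-decode audio bytes between [from] and [to]. */
    fun read(from: Instant, to: Instant): ByteArray
}

// Assumed length of the "preset first time range" (illustrative only).
const val FIRST_WINDOW_SECONDS = 10L

// S320: take the window of pre-decode audio ending at the current time.
fun captureFirstAudioData(tap: AppAudioTap): ByteArray {
    val now = Instant.now()
    return tap.read(now.minusSeconds(FIRST_WINDOW_SECONDS), now)
}
```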
S330: when the playing state is monitored to be not playing, acquire second audio data within a preset second time range through a microphone, and identify the information of the second audio data by performing a corresponding audio recognition operation on the second audio data, wherein the sound source of the second audio data is a sound source outside the display device.

It should be understood that the display device may be turned on but not playing anything, i.e. the playing state of the first application running in the display device is not playing. However, in the scene where the display device is located, other display devices or smart devices may be playing, or audio of interest such as humming may be present; in that case the display device acquires second audio data within a preset second time range through the microphone.

The preset second time range may have the same length as, or a different length from, the preset first time range.

The second audio data is audio from outside the display device obtained through the microphone; its sound source may be other display devices or smart speakers playing audio, or people in the scene humming, possibly through devices such as a microphone or loudspeaker.

Fig. 5c is a schematic diagram illustrating an application scenario in which a display device is located in some embodiments of the present application. As shown in fig. 5c, the application scenario includes the display device 200, other display devices outside the display device (tablet 291, smartphone 292), a smart microphone 294, a smart speaker 293, a person, an animal, and so on; the devices, persons, and animals in the scene can all act as sound sources, and the various kinds of audio in the scene are identified through the display device 200.
It will be appreciated that the second audio data is acquired by the microphone, determined after audio encoding and audio decoding of a sound source external to the display device.
In order to make the obtained second audio data easy to identify, in some embodiments, the initial audio data may also be filtered by a preset audio filtering policy, so as to filter the noise in the initial audio data to obtain the second audio data.
For example, taking song recognition as an example, suppose the sound sources outside the display device include song M played by a smart speaker, a car horn in the scene, and media item N (containing no music) played by another smart device. The car horn and media item N are filtered out through the preset audio filtering policy, so that the second audio data contains only song M.
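The filtering policy could be modeled as a chain of filter stages, as in the sketch below; the noise-gate stage and its threshold are assumptions introduced purely for illustration, not a filtering method specified by the application.

```kotlin
// A filter stage maps 16-bit PCM samples to filtered samples.
typealias AudioFilter = (ShortArray) -> ShortArray

// Example stage: a crude noise gate that keeps only samples above an amplitude threshold.
fun noiseGate(threshold: Int): AudioFilter = { samples ->
    val out = ShortArray(samples.size)               // silent (zero) by default
    for (i in samples.indices) {
        if (kotlin.math.abs(samples[i].toInt()) >= threshold) out[i] = samples[i]
    }
    out
}

// Apply the preset filtering policy (an ordered list of stages) to the initial audio data.
fun applyFilteringPolicy(initial: ShortArray, policy: List<AudioFilter>): ShortArray =
    policy.fold(initial) { acc, filter -> filter(acc) }
```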
Identifying the information of the second audio data means that the audio data to be identified, which contains the second audio data, is transmitted to the audio recognition server, and the recognition result corresponding to the audio data to be identified is displayed in a result display area of the display according to the result returned by the audio recognition server.
It should be understood that audio recognition in embodiments of the present application includes identifying tracks, identifying sounds, identifying information of audio, and the like.
Through the steps executed by the controller, the display device lets the user identify audio played by the display device itself or other audio in the application scene, so that when the user encounters audio of interest played by a smart device or humming of interest in the environment, the corresponding audio information can be identified, improving the user experience.

In some embodiments, while the first application of the display device is in the playing state, other audio data may also be present besides the display device's own output (such as audio played by other smart devices or humming by other people). In that case, before the information of the first audio data is identified in step 320, audio data from outside the display device may additionally be acquired and analyzed. Fig. 6 shows a timing diagram of the display device recognizing audio in some embodiments of the present application; as shown in fig. 6, the controller in the display device is configured to perform the following steps:

S410: in response to a user's request to identify audio, monitor the playing state of a first application running in the display device.

The playing state of the first application may be playing or not playing; by monitoring the playing state of the first application running in the display device, it can be determined whether a playing first application currently exists on the display device.
If it is monitored that the playing state of the first application is playing, steps 421 to 424 are performed, and if it is monitored that the playing state of the first application is not playing, step 430 is performed.
S421: when the playing state is monitored to be playing, acquire, based on the current time, first audio data corresponding to the first application within a preset first time range.

While the first application of the display device is playing, other audio data may also be present besides the display device's own output, such as audio played by other smart devices or humming by other people. While the first audio data corresponding to the first application within the preset first time range is being acquired based on the current time, step 422 may be performed at the same time.

S422: monitor through the microphone, within the preset first time range, whether there is audio data emitted by sound sources other than the display device.
It should be understood that other sound sources than the display device may be other smart devices, display devices or human humming. For example, other sound sources outside the display device may be emitted by a mobile phone, may be emitted by a smart speaker, or may be emitted by other people, animals, etc. in the scene.
The microphone of the display device monitors, within the preset first time range, whether there is audio data from other sound sources outside the display device; if so, step 423 is executed.

S423: once audio data emitted by sound sources other than the display device is detected, collect the third audio data emitted by those sound sources through the microphone.

It should be understood that, like the second audio data, the third audio data is audio from outside the display device obtained through the microphone; it is therefore data that has undergone audio encoding and decoding at the sound source outside the display device.

The third audio data is acquired within the preset first time range during which the first audio data is acquired, and its acquisition duration may be equal to or shorter than the preset first time range.

For example, take a smart TV as the display device: the smart TV is playing a music accompaniment through its media assets, and the user, interested in the accompaniment, triggers music identification through the corresponding control. At the same time, the user hums the lyrics corresponding to the accompaniment into a smart microphone communicatively connected to a smart speaker. The smart TV obtains the music accompaniment (i.e. the first audio data) and, through its own microphone, collects the lyrics hummed via the smart microphone. If the music accompaniment and the corresponding lyrics can be recognized together, the efficiency of audio recognition can be improved.

In some embodiments, the first audio data and the third audio data may be acquired simultaneously or sequentially; because they are acquired through different paths, acquiring them simultaneously improves acquisition efficiency, as in the sketch below.
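A minimal Kotlin sketch of the simultaneous acquisition, using kotlinx.coroutines; the two capture callbacks are hypothetical stand-ins for the pre-decode tap and the microphone path.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope

// Acquire the first audio data (pre-decode) and the third audio data (microphone)
// concurrently and return both once they are available.
suspend fun captureBoth(
    captureFromApp: suspend () -> ByteArray,        // first audio data
    captureFromMicrophone: suspend () -> ByteArray  // third audio data
): Pair<ByteArray, ByteArray> = coroutineScope {
    val first = async { captureFromApp() }
    val third = async { captureFromMicrophone() }
    first.await() to third.await()
}
```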
S424: identify the information of the audio data to be identified by performing a corresponding audio recognition operation on the audio data to be identified, which contains the first audio data and the third audio data.

That is, the audio data to be identified is transmitted to the audio recognition server, and the recognition result corresponding to the audio data to be identified is displayed in a result display area of the display according to the result returned by the audio recognition server.

When the display device acquires both the first audio data played in the display device and the third audio data from outside the display device, it can decide, by evaluating the first and third audio data, whether the audio data to be identified includes both, only the first audio data, or only the third audio data.

In some embodiments, this evaluation may include determining whether the third audio data corresponds to the first audio data, or determining, based on the type of audio to be identified, which of the third and first audio data belong in the audio data to be identified.

Fig. 7 shows a flowchart for evaluating the first audio data and the third audio data in some embodiments of the present application; step 424 further includes the following steps:
S4241: determine whether the third audio data corresponds to the first audio data.

S4242: if the third audio data corresponds to the first audio data, identify the information of the first and third audio data by performing a corresponding audio recognition operation on the first audio data and the third audio data.

It should be appreciated that the third audio data corresponding to the first audio data means that the two are different representations of the same underlying content.

For example, the smart TV obtains a music accompaniment (i.e. the first audio data) while its microphone obtains the corresponding lyrics hummed by the user through a smart microphone (i.e. the third audio data); or the smart TV obtains the accompaniment while its microphone directly picks up the lyrics hummed by the user. In either case it can be determined that the third audio data corresponds to the first audio data.

Identifying the information of the first and third audio data means that the audio data to be identified, which contains the first and third audio data, is transmitted to the audio recognition server, and the recognition result corresponding to the audio data to be identified is displayed in a result display area of the display according to the result returned by the audio recognition server.

It should be noted that the audio data to be identified containing the first and third audio data may be formed by combining the first audio data and the third audio data through a preset audio data combination policy.

For example, the first and third audio data may be combined according to their collection times: when the third audio data corresponds to the first audio data, the portions collected at each moment also correspond, so the two can be merged by collection time.

Acquiring the first and third audio data at the same time also makes the recognition of the audio data to be identified, combined from the first and third audio data, more accurate.
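A minimal sketch of such a time-based combination policy; the frame structure and its fields are assumptions made for illustration.

```kotlin
// Hypothetical timestamped audio frame, as produced by either capture path.
data class AudioFrame(val timestampMs: Long, val samples: ByteArray)

// Preset combination policy: merge the first and third audio data so that
// frames collected at the same moment stay adjacent in the combined stream.
fun combineByCollectionTime(
    first: List<AudioFrame>,   // first audio data frames
    third: List<AudioFrame>    // third audio data frames
): List<AudioFrame> = (first + third).sortedBy { it.timestampMs }
```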
S4243: if the third audio data does not correspond to the first audio data, identify the information of the first audio data by performing a corresponding audio recognition operation on the first audio data.

If the third audio data does not correspond to the first audio data, the first audio data on the display device is used by default as the basis for audio recognition.

Identifying the information of the first audio data means that the audio data to be identified, which contains the first audio data, is transmitted to the audio recognition server, and the recognition result corresponding to the audio data to be identified is displayed in a result display area of the display according to the result returned by the audio recognition server.

In some embodiments, the first and third audio data may be analyzed based on the type of the audio identification request to determine the audio data to be identified, and the information of the audio data to be identified may then be recognized by performing a corresponding audio recognition operation on it; the audio data to be identified comprises the first audio data, or the third audio data, or both the first and third audio data.

For example, suppose the audio recognition request is to recognize an animal sound; the first audio data is dialogue between characters in the media asset played by the display device, while the third audio data is a bird call played by a smartphone. The third audio data does not correspond to the first audio data, and, based on the recognition request, the corresponding audio recognition operation is performed on the third audio data and its information is recognized.

That is, the audio data to be identified, which contains the third audio data, is transmitted to the audio recognition server, and the recognition result corresponding to the audio data to be identified is displayed in the result display area of the display according to the result returned by the audio recognition server.
S430: when the playing state is monitored to be not playing, acquire second audio data within a preset second time range through a microphone, and identify the information of the second audio data by performing a corresponding audio recognition operation on the second audio data, wherein the sound source of the second audio data is a sound source outside the display device.

It should be understood that the display device may be turned on but not playing anything, i.e. the playing state of the first application running in the display device is not playing. However, in the scene where the display device is located, other display devices or smart devices may be playing, or audio of interest such as humming may be present; in that case the display device acquires second audio data within a preset second time range through the microphone.

The preset second time range may have the same length as, or a different length from, the preset first time range.

The second audio data is audio from outside the display device obtained through the microphone; its sound source may be other display devices or smart-device speakers playing audio, or people in the scene humming, possibly through devices such as a microphone or loudspeaker.
It will be appreciated that the second audio data is acquired by the microphone, determined after audio encoding and audio decoding of a sound source external to the display device.
In order to make the obtained second audio data easy to identify, in some embodiments, the initial audio data may also be filtered by a preset audio filtering policy, so as to filter the noise in the initial audio data to obtain the second audio data.
Identifying the information of the second audio data means that the audio data to be identified, which contains the second audio data, is transmitted to the audio recognition server, and the recognition result corresponding to the audio data to be identified is displayed in a result display area of the display according to the result returned by the audio recognition server.

Through the steps executed by the controller, the display device lets the user identify audio played by the display device itself or other audio in the application scene, so that when the user encounters audio of interest played by a smart device or humming of interest in the environment, the corresponding audio information can be identified, improving the user experience.

In some embodiments, when the first application of the display device is playing, the playing state may take several forms, and after the playing state is detected in step 320, the recording strategy corresponding to the first audio data may be further determined according to the particular playing state. Fig. 8 shows a timing diagram of the display device recognizing audio in some embodiments of the present application; as shown in fig. 8, the controller in the display device is configured to perform the following steps:

S510: in response to a user's request to identify audio, monitor the playing state of a first application running in the display device.
If it is monitored that the playing state of the first application is playing, steps 521 to 522 are performed, and if it is monitored that the playing state of the first application is not playing, step 530 is performed.
S521: when the playing state is monitored to be playing, determine the recording strategy corresponding to the playing state of the first application.

When the playing state of the first application may be either a first playing state or a second playing state, determining the recording strategy includes the following steps:

If the playing state of the first application is the first playing state, a first recording strategy corresponding to the first playing state is determined based on the preset recording strategies.

If the playing state of the first application is the second playing state, a second recording strategy corresponding to the second playing state is determined based on the preset recording strategies.

For example, in a smart TV there are two types of audio depending on the service scenario: ordinary media playback (for example, playing a movie or a TV series), which is not sensitive to delay but demands good sound quality, and call or karaoke voice playback, which is sensitive to delay but less demanding on sound quality. The playback paths of the two types differ: ordinary media playback uses the normal playback channel, karaoke voice playback uses a low-delay channel, and the recording tap points differ for the different channels. For a media asset application, the corresponding playing state is the first playing state; based on the preset recording strategies, it is determined that the media asset application corresponds to the first recording strategy and the karaoke voice application corresponds to the second recording strategy.
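A sketch of the strategy selection in S521; the enum names and the mapping are assumptions used to illustrate the idea of a preset mapping from playing state to recording strategy.

```kotlin
// Hypothetical playing states: ordinary media playback vs. low-latency karaoke/voice.
enum class PlayingState { NORMAL_MEDIA, LOW_LATENCY_KARAOKE }   // first / second playing state
enum class RecordingStrategy { NORMAL_PLAYBACK_TAP, LOW_LATENCY_TAP }

// Preset recording strategies keyed by playing state (illustrative mapping).
val presetRecordingStrategies = mapOf(
    PlayingState.NORMAL_MEDIA to RecordingStrategy.NORMAL_PLAYBACK_TAP,
    PlayingState.LOW_LATENCY_KARAOKE to RecordingStrategy.LOW_LATENCY_TAP
)

fun selectRecordingStrategy(state: PlayingState): RecordingStrategy =
    presetRecordingStrategies.getValue(state)
```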
S522: based on the current time and the recording strategy, acquire first audio data corresponding to the first application within the preset first time range, and identify the information of the audio data to be identified by performing a corresponding audio recognition operation on the audio data to be identified, which contains the first audio data.

The first audio data must also be acquired according to the corresponding recording strategy.

In some embodiments, step 522 follows the same principle as step 320 and is not described in detail here.

In some embodiments, in step 522, while the first audio data corresponding to the first application within the preset first time range is being obtained based on the current time, other audio data may exist besides the display device's own output, such as audio played by other smart devices or humming by other people.

First, the microphone monitors, within the preset first time range, whether there is audio data emitted by other sound sources outside the display device.
It should be understood that other sound sources than the display device may be other smart devices, display devices or human humming. For example, other sound sources outside the display device may be emitted by a mobile phone, may be emitted by a smart speaker, or may be emitted by other people, animals, etc. in the scene.
Second, once audio data emitted by sound sources other than the display device is detected, the third audio data emitted by those sound sources is collected through the microphone.

The third audio data is data that has undergone audio encoding and decoding at the sound source outside the display device. It is acquired within the preset first time range during which the first audio data is acquired, and its acquisition duration may be equal to or shorter than the preset first time range.

In some embodiments, the first audio data and the third audio data may be acquired simultaneously or sequentially; because they are acquired through different paths, acquiring them simultaneously improves acquisition efficiency.

Finally, the information of the audio data to be identified is recognized by performing a corresponding audio recognition operation on the audio data to be identified, which contains the first audio data and the third audio data.

When the display device acquires both the first audio data played in the display device and the third audio data from outside the display device, it can decide, by evaluating the first and third audio data, whether the audio data to be identified includes both, only the first audio data, or only the third audio data.

In some embodiments, this evaluation may include determining whether the third audio data corresponds to the first audio data, or determining, based on the type of audio to be identified, which of the third and first audio data belong in the audio data to be identified.

If the third audio data corresponds to the first audio data, the information of the first and third audio data is identified by performing a corresponding audio recognition operation on both. If the third audio data does not correspond to the first audio data, the information of the first audio data is identified by performing a corresponding audio recognition operation on the first audio data alone.
S530: when the playing state is monitored to be not playing, acquire second audio data within a preset second time range through a microphone, and identify the information of the second audio data by performing a corresponding audio recognition operation on the second audio data, wherein the sound source of the second audio data is a sound source outside the display device.

Through the steps executed by the controller, the display device lets the user identify audio played by the display device itself or other audio in the application scene, so that when the user encounters audio of interest played by a smart device or humming of interest in the environment, the corresponding audio information can be identified, improving the user experience.

Fig. 9 is a timing diagram of an audio recognition operation performed by the display device through the audio recognition server in some embodiments of the present application. As shown in fig. 9, the audio data to be identified may be determined from the different audio data, and the procedure specifically includes the following steps:

S610: determine the audio data to be identified.
Based on the timing diagrams of fig. 5 to 8, it should be understood that the audio data to be identified includes: the first audio data, or the second audio data, or the third audio data, or the first audio data and the third audio data.
S620, the audio data to be identified is transmitted to the identification audio server.
At the recognition audio server side, recognition is performed on the audio data to be recognized.
In some embodiments, the type of the audio data to be identified may be reduced to the preset data type on the identification audio server, the reduced data amount of the audio data to be identified may be smaller, the more speed is transferred to the identification audio server, the more efficient the identification by the identification audio server may be, so the type of the audio data to be identified may be reduced to the preset data type on the identification audio server before step 520.
S630, receiving the identification result sent by the identification audio server and displaying the identification result in the display device.
The information of the identified audio data (the identification result) may be displayed directly in a result display area, or displayed indirectly in the result display area in the form of a two-dimensional code or the like; in some embodiments, the identification result may also be presented through voice information or the like.
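A minimal sketch of how the identification result could be formatted for the result display area, either as plain text or as a payload for a two-dimensional code, is shown below; the result keys used here are assumptions for illustration and are not defined by the embodiments.

```python
def present_result(result: dict, mode: str = "text") -> str:
    """Format the identification result for the result display area: plain text,
    or a payload (e.g. a detail link) that the UI layer renders as a two-dimensional code."""
    if mode == "qrcode":
        # Assumed key: the server result carries a URL with the full details.
        return result.get("detail_url", "")
    title = result.get("title", "unknown")   # assumed keys, for illustration only
    artist = result.get("artist", "")
    return f"Recognized: {title}" + (f" - {artist}" if artist else "")
```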
It should be appreciated that the variety of audio data to be identified is very large, so the identification is typically performed on the server side as shown in Fig. 9; in some embodiments, however, the identification of the audio data to be identified may also be performed using data stored locally on the display device.
According to the method and the device of the present application, the display device acquires the first audio data being played and/or the third audio data from outside the display device, or acquires the second audio data from outside the display device when no first audio data is being played, takes these as the audio data to be identified, and realizes the identification of various kinds of audio in the application scene through a cloud server for identifying audio, which improves the user experience and, in some scenes, can also improve the efficiency of audio identification.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.
Claims (10)
1. A display device, characterized by comprising:
a display;
a controller in communication with the display and a microphone, the controller being configured to:
responding to a request of a user for identifying audio, and monitoring the playing state of a first application program running in the display device;
after the playing state is monitored to be playing, acquiring, based on the current time, first audio data corresponding to the first application program within a preset first time range, and identifying information of the first audio data by executing a corresponding audio identification operation on the first audio data;
and after the playing state is monitored to be not playing, acquiring second audio data within a preset second time range through the microphone, and identifying information of the second audio data by executing a corresponding audio identification operation on the second audio data, wherein a sound source of the second audio data is a sound source outside the display device.
2. The display device of claim 1, wherein prior to identifying the information of the first audio data by performing a corresponding audio identification operation on the first audio data, the controller is further configured to:
monitoring, through the microphone, whether there is audio data emitted by other sound sources outside the display device within the preset first time range;
and after the audio data emitted by the other sound sources outside the display device is detected, collecting, through the microphone, third audio data emitted by the other sound sources.
3. The display device of claim 2, wherein in the step of identifying information of the first audio data by performing a corresponding audio identification operation on the first audio data, the controller is configured to:
determining whether the third audio data corresponds to the first audio data;
if the third audio data corresponds to the first audio data, identifying information of the first audio data and the third audio data by executing a corresponding audio identification operation on the first audio data and the third audio data;
and if the third audio data does not correspond to the first audio data, identifying information of the first audio data by executing a corresponding audio identification operation on the first audio data.
4. A display device according to any of claims 1-3, wherein the controller is further configured to:
delivering audio data to be identified to an identification audio server, the audio data to be identified comprising: first audio data, or second audio data, or first audio data and third audio data;
and according to the result returned by the identification audio server, displaying the identification result corresponding to the audio data to be identified in a result display area of the display.
5. The display device of claim 2, wherein the first audio data is data of the display device prior to audio decoding;
and the second audio data and the third audio data are data obtained after audio encoding and audio decoding of a sound source external to the display device.
6. The display device of claim 1, wherein after the playing state is monitored to be playing, the controller is further configured to:
if the playing state of the first application program is a first playing state, determining, based on a preset recording strategy, to acquire a first recording strategy corresponding to the first playing state;
and if the playing state of the first application program is a second playing state, determining, based on the preset recording strategy, to acquire a second recording strategy corresponding to the second playing state.
7. The display device of claim 1, wherein after the playing state is monitored to be playing, the controller is further configured to:
if there are a plurality of first application programs being played in the display device, determining a target application program from the plurality of first application programs based on a preset selection strategy, and acquiring the first audio data corresponding to the target application program.
8. The display device of claim 1, wherein in the step of acquiring the second audio data within the preset second time range through the microphone, the controller is configured to:
acquiring initial audio data from outside the display device within the preset second time range through the microphone;
and filtering the initial audio data through a preset audio filtering strategy to determine the second audio data.
9. An audio recognition method, comprising:
responding to a request of a user for identifying audio, and monitoring the playing state of a first application program running in a display device;
after the playing state is monitored to be playing, acquiring, based on the current time, first audio data corresponding to the first application program within a preset first time range, and identifying information of the first audio data by executing a corresponding audio identification operation on the first audio data;
and after the playing state is monitored to be not playing, acquiring second audio data within a preset second time range through a microphone, and identifying information of the second audio data by executing a corresponding audio identification operation on the second audio data, wherein a sound source of the second audio data is a sound source outside the display device.
10. The audio recognition method according to claim 9, further comprising, before identifying the information of the first audio data by executing a corresponding audio identification operation on the first audio data:
monitoring, through the microphone, whether there is audio data emitted by other sound sources outside the display device within the preset first time range;
and after the audio data emitted by the other sound sources outside the display device is detected, collecting, through the microphone, third audio data emitted by the other sound sources.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211634779.3A CN117292692A (en) | 2022-12-19 | 2022-12-19 | Display device and audio recognition method |
PCT/CN2023/112976 WO2024131099A1 (en) | 2022-12-19 | 2023-08-14 | Display device and media asset playing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211634779.3A CN117292692A (en) | 2022-12-19 | 2022-12-19 | Display device and audio recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117292692A true CN117292692A (en) | 2023-12-26 |
Family
ID=89250610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211634779.3A Pending CN117292692A (en) | 2022-12-19 | 2022-12-19 | Display device and audio recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117292692A (en) |
2022-12-19: CN application CN202211634779.3A filed; publication CN117292692A (en); status: Pending
Similar Documents
Publication | Title |
---|---|
CN111741372A (en) | Screen projection method for video call, display device and terminal device |
CN111770366A (en) | Message reissue method, server and display device |
CN114302190A (en) | Display device and image quality adjusting method |
CN112698905B (en) | Screen saver display method, display device, terminal device and server |
CN112165642B (en) | Display device |
CN113038048B (en) | Far-field voice awakening method and display device |
CN112511882A (en) | Display device and voice call-up method |
WO2022105409A1 (en) | Fault diagnosis method, terminal device, and display device |
CN112506859B (en) | Method for maintaining hard disk data and display device |
CN112562666B (en) | Method for screening equipment and service equipment |
CN112492371A (en) | Display device |
CN112203154A (en) | Display device |
CN112087671A (en) | Display method and display equipment for control prompt information of input method control |
CN112073787B (en) | Display device and home page display method |
CN114095769B (en) | Live broadcast low-delay processing method of application-level player and display device |
CN113066491A (en) | Display device and voice interaction method |
CN114900386B (en) | Terminal equipment and data relay method |
CN114915833B (en) | Display control method, display device and terminal device |
CN117292692A (en) | Display device and audio recognition method |
CN114007128A (en) | Display device and network distribution method |
CN114489310A (en) | Virtual reality device and handle positioning method |
CN111914565A (en) | Electronic equipment and user statement processing method |
CN113938635A (en) | Multi-channel video call processing method and display device |
CN112929724B (en) | Display device, set top box and far-field pickup awakening control method |
CN111970554B (en) | Picture display method and display device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |