CN113707179A - Audio identification method, device, equipment and medium - Google Patents

Publication number
CN113707179A
Authority
CN
China
Prior art keywords
audio
audio recognition
page
multimedia
application program
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110321717.6A
Other languages
Chinese (zh)
Inventor
曹辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121 Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131 Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set

Abstract

Embodiments of the application provide an audio recognition method, apparatus, device, and medium. The method comprises: displaying an audio recognition page of a first application program; outputting an audio recognition progress in the audio recognition page, where the audio recognition progress indicates how far the first application program has advanced in performing audio recognition according to a multimedia identifier, and the multimedia identifier is shared to the first application program by a second application program; and displaying an audio recognition result page of the first application program. With this method, the audio associated with the multimedia information can be identified accurately.

Description

Audio identification method, device, equipment and medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an audio recognition method, apparatus, device, and medium.
Background
Background music is often matched with multimedia information played through an application program; for example, short videos viewed through short video applications are often accompanied by background music. In addition, the user may also hear his/her favorite music while watching movies, television, radio, etc. In these scenarios, there may be no way for the user to obtain information about the music, such as the name of the music, the attribute information of the singer, and the like.
The prior art can obtain attribute information about such music by recording the audio and recognizing the recorded audio data. How to conveniently prompt the user about audio recognition has become an active research topic.
Disclosure of Invention
The embodiment of the application provides an audio identification method, an audio identification device, audio identification equipment and an audio identification medium, which can conveniently identify audio associated with multimedia information.
In one aspect, an embodiment of the present application provides an audio identification method, where the method includes:
displaying an audio identification page of a first application program;
outputting audio recognition progress in the audio recognition page, wherein the audio recognition progress is used for prompting progress information of the first application program for audio recognition according to the multimedia identifier, and the multimedia identifier is shared to the first application program by the second application program;
and displaying an audio recognition result page of the first application program.
On the other hand, an embodiment of the present application provides an audio identification method, including:
receiving a multimedia identifier sent by a first application program;
querying the matching multimedia information according to the multimedia identifier, and performing audio recognition processing according to the multimedia information; and
sending the audio recognition processing result to the first application program, wherein the audio recognition processing result comprises: any one or more of the audio recognition progress and the audio recognition result.
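The three server-side steps above (receive the identifier, look up the matching multimedia information and recognize it, send back the processing result) can be sketched as follows. This is a minimal illustration only: `RecognitionResponse`, `handle_identifier`, the in-memory `media_store`, and the stand-in recognizer are all hypothetical names introduced here and are not defined by the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RecognitionResponse:
    """Hypothetical shape of the processing result sent to the first application."""
    progress: Optional[str] = None  # audio recognition progress, if any
    result: Optional[str] = None    # audio recognition result, if any


def handle_identifier(
    multimedia_id: str,
    media_store: Dict[str, bytes],
    recognize: Callable[[bytes], str],
) -> RecognitionResponse:
    """Look up the media for an identifier, run recognition, and build the reply."""
    media = media_store.get(multimedia_id)
    if media is None:
        # No multimedia information matches the identifier (assumed handling).
        return RecognitionResponse(progress="no matching multimedia information")
    return RecognitionResponse(progress="recognition complete", result=recognize(media))


# Usage with a stand-in recognizer that always returns the same title.
store = {"https://example.com/v/3011": b"fake-audio-bytes"}
reply = handle_identifier("https://example.com/v/3011", store, lambda _: "Example Song")
print(reply)
```

The response carries either field or both, mirroring the statement that the processing result comprises any one or more of the progress and the result.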
In another aspect, an embodiment of the present application provides an audio recognition apparatus, including:
the display unit is used for displaying an audio identification page of the first application program;
the processing unit is used for outputting audio recognition progress in the audio recognition page, the audio recognition progress is used for prompting progress information of the first application program for audio recognition according to the multimedia identifier, and the multimedia identifier is shared to the first application program by the second application program;
and the processing unit is also used for displaying an audio recognition result page of the first application program.
In another aspect, an embodiment of the present application provides an audio recognition apparatus, including:
the receiving unit is used for receiving the multimedia identifier sent by the first application program;
the processing unit is used for querying the matching multimedia information according to the multimedia identifier and performing audio recognition processing according to the multimedia information; and
the processing unit is further used for sending the audio recognition processing result to the first application program, wherein the audio recognition processing result comprises: any one or more of the audio recognition progress and the audio recognition result.
In another aspect, an embodiment of the present application provides an electronic device, which includes: a storage device and a processor;
the storage device stores a computer program;
and the processor runs the computer program stored in the storage device to realize the audio recognition method.
In another aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer application program is stored, and when the computer application program is executed, the audio recognition method described above is implemented.
In another aspect, embodiments of the present application provide a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device executes the audio recognition method.
In the embodiments of the application, the multimedia identifier of the multimedia information to be recognized can be shared from the second application program to the first application program, so that the first application program can conveniently perform audio recognition according to the multimedia identifier and obtain an audio recognition result. Compared with existing approaches, in which the user must switch back and forth between two application programs to perform audio recognition, directly sharing the multimedia identifier is enough to recognize the audio (such as background music) associated with it. The operation is simple, the recognition process is not affected by the external environment, and the efficiency, speed, and accuracy of audio recognition are improved.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 illustrates an architecture diagram of an audio recognition system provided by an exemplary embodiment of the present application;
FIG. 2 is a flow chart illustrating an audio recognition method provided by an exemplary embodiment of the present application;
FIG. 3 is a diagram illustrating a shared multimedia identifier according to an exemplary embodiment of the present application;
FIG. 4 is a diagram illustrating page jump hints provided by an exemplary embodiment of the present application;
FIG. 5a is a diagram illustrating an audio recognition progress provided by an exemplary embodiment of the present application;
FIG. 5b illustrates a schematic diagram of an audio recognition animation provided by an exemplary embodiment of the present application;
FIG. 5c is a diagram illustrating an audio recognition progress and audio recognition animation provided by an exemplary embodiment of the present application;
FIG. 6 illustrates a schematic diagram of an identification type option provided by an exemplary embodiment of the present application;
FIG. 7 is a diagram illustrating an identification progress area and a type selection area provided by an exemplary embodiment of the present application;
FIG. 8a is a flowchart illustrating a method for switching display from an audio recognition result page of a first application to a service page of the first application according to an exemplary embodiment of the present application;
FIG. 8b is a flowchart illustrating a jump from an audio recognition result page of a first application to a service page of a second application according to an exemplary embodiment of the present application;
FIG. 8c is a diagram illustrating displaying at least one candidate audio in an audio recognition result page according to an exemplary embodiment of the present application;
FIG. 9a is a diagram illustrating playing selected candidate audio provided by an exemplary embodiment of the present application;
FIG. 9b is a schematic diagram illustrating another example of playing selected candidate audio provided by an exemplary embodiment of the present application;
FIG. 10 illustrates a flow diagram of another audio recognition method provided by an exemplary embodiment of the present application;
FIG. 11 is a schematic structural diagram illustrating an audio recognition apparatus according to an exemplary embodiment of the present application;
FIG. 12 is a schematic diagram illustrating another audio recognition device according to an exemplary embodiment of the present application;
FIG. 13 shows a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides an audio identification scheme, and the audio identification scheme relates to the following terms and concepts:
I. Artificial Intelligence (AI). Artificial intelligence is the theory, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making. Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Its basic technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include automatic content recognition, computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.
Automatic Content Recognition (ACR) is a technology that recognizes multimedia information (such as audio, video, and images) directly with computer algorithms. The input of the recognition algorithm may be a multimedia file or raw data captured by a microphone or camera; features are then compared and searched in a multimedia database to obtain a matching result. ACR provides a novel and convenient way to search for and acquire information: users can immediately obtain information about multimedia content that interests them without any manual input. ACR has been widely applied in fields such as image recognition, audio and video recognition, multi-screen interaction for television programs, automatic monitoring of television and broadcast content, multimedia copyright detection, and multi-screen synchronization of television advertisements.
Song recognition by listening is a core algorithm of automatic content recognition and is one kind of audio fingerprinting technology. Mainstream song recognition mainly applies an audio fingerprint algorithm to extract a fingerprint of each song and build a song fingerprint library (also called a voiceprint feature library). When a user submits a recording for recognition, the audio fingerprint (also called the audio features) of the recorded music is first extracted and then compared against the library, and the song with the highest matching degree in the database is returned.
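The extract-then-match idea can be illustrated with a deliberately simplified sketch. The fingerprint used here (the set of short runs of sample values) is a toy stand-in chosen for readability, not the fingerprint algorithm of any real recognition service; `toy_fingerprint`, `best_match`, and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Fingerprint = Set[Tuple[int, ...]]


def toy_fingerprint(samples: List[int], window: int = 3) -> Fingerprint:
    """Toy fingerprint: the set of all length-`window` runs of sample values."""
    return {tuple(samples[i:i + window]) for i in range(len(samples) - window + 1)}


def best_match(query: List[int], library: Dict[str, Fingerprint]) -> Optional[str]:
    """Return the library song sharing the most fingerprint entries with the query."""
    query_fp = toy_fingerprint(query)
    best_title, best_score = None, 0
    for title, song_fp in library.items():
        score = len(query_fp & song_fp)  # matching degree = number of shared entries
        if score > best_score:
            best_title, best_score = title, score
    return best_title


# Build the fingerprint library once, then match a short recorded clip against it.
library = {
    "song_a": toy_fingerprint([1, 5, 2, 8, 3, 9, 4]),
    "song_b": toy_fingerprint([7, 7, 1, 0, 6, 2, 2]),
}
clip = [2, 8, 3, 9]
print(best_match(clip, library))
```

A clip with no shared entries yields `None`, which corresponds to a failed recognition in this toy model.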
II. Application programs. An application program is a computer program for performing one or more specific tasks. Application programs can be classified along different dimensions (such as operation mode or function), so the same application program may have a different type under each dimension. Classified by operation mode, application programs may include, but are not limited to: a client installed in a terminal, an applet that can be used without being downloaded and installed, a web application opened through a browser, and the like. Classified by function, application programs may include, but are not limited to: IM (Instant Messaging) applications, content interaction applications, and the like. An instant messaging application is an Internet-based application for instant messaging and social interaction, and may include, but is not limited to: QQ, WeChat, Business WeChat, map applications containing social interaction functionality, gaming applications, the QQ music application, QQ browser, and the like. A content interaction application is an application capable of content interaction, for example an internet banking, microblog, personal space, or news application. It should be noted that the application program mentioned later in the embodiments of the present application may be any of the application programs classified by operation mode or any of those classified by function; the embodiments of the present application do not limit the type of the application program.
Based on the above terms and concepts, the embodiments of the present application provide an audio recognition system. As shown in FIG. 1, the audio recognition system may include a terminal and a server; the embodiments of the present application do not limit the number of terminals or servers. Terminals may include, but are not limited to: smart phones (such as Android phones, iOS phones, etc.), tablet computers, portable personal computers, Mobile Internet Devices (MID), smart televisions, vehicle-mounted devices, head-mounted devices, VR/AR devices, and other electronic devices with touch screens. An application program (which may be referred to simply as an application, such as a video application or a music application) may run in the terminal. The server may include, but is not limited to, devices with complex computing capabilities such as data processing servers, Web servers, and application servers. The server can be the background server of any application program and interacts with the terminal running that application program to provide computing and application service support for it. The server may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers. The terminal and the server may be connected directly or indirectly, in a wired or wireless manner; the embodiments of the present application do not limit the connection manner.
An audio recognition scheme is provided based on the audio recognition system. The scheme can be executed by a target terminal (also called an electronic device, that is, any terminal) in the audio recognition system, or by an application program running in the target terminal; for convenience, the following description takes execution by the target terminal as an example. The general principle of the scheme is as follows: when a user watching a short video (or a video, audio, movie, etc.) in any application program wants to identify the song title of the background music, the user can use that application program's sharing function to share the link of the short video to an application program with a song-title recognition function (such as QQ music). The receiving application program then performs audio recognition according to the link of the short video and outputs the audio recognition result in an audio recognition result page.
Based on this, in the audio recognition scheme provided in the embodiments of the present application, when any user watching a short video wants to identify the song title of the background music associated with it, the user can share the short video to an application program with a song-title recognition function for audio recognition, and thereby obtain the background music associated with the short video. Compared with existing approaches, in which song-title recognition requires jumping between application programs, sharing the short video is enough to identify its background music. The operation is simple, the recognition process is not affected by the external environment, and the efficiency, speed, and accuracy of audio recognition are improved.
The audio recognition scheme proposed by the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
FIG. 2 is a flow chart illustrating an audio recognition method provided by an exemplary embodiment of the present application; the flow of the audio recognition scheme proposed by the embodiment of the present application may be executed by the aforementioned target terminal. As shown in fig. 2, the audio recognition scheme may include the following steps S201-S203.
S201: an audio identification page of the first application is displayed.
The first application may be any application with an audio recognition function. For example, the first application is the QQ music application, which holds information related to music recognition (e.g., music names and music release times). When the first application receives the multimedia identifier sent by the second application (such as any application running in the target terminal), the first application starts performing audio recognition according to the multimedia identifier and displays its audio recognition page to remind the user that audio recognition is in progress. The multimedia identifier uniquely identifies a piece of multimedia information, such as a video or an audio. The multimedia identifier comprises a link to the multimedia information. The link can be shared directly from the second application to the first application, or it can be embedded in a picture that the second application shares to the first application; such pictures include, but are not limited to, two-dimensional codes, bar codes, graphic codes, identification codes, and the like.
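When the link is shared directly, the receiving application has to pull it out of whatever text accompanies the share. The sketch below shows one way this could be done, assuming the shared payload is plain text that contains an http(s) link; `extract_multimedia_link` and the example URL are hypothetical, and decoding a link embedded in a picture (such as a two-dimensional code) is not covered.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Candidate links are runs of non-whitespace characters starting with http(s)://.
_URL_PATTERN = re.compile(r"https?://\S+")


def extract_multimedia_link(shared_text: str) -> Optional[str]:
    """Return the first well-formed http(s) link found in the shared text, if any."""
    for candidate in _URL_PATTERN.findall(shared_text):
        candidate = candidate.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        parsed = urlparse(candidate)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return candidate
    return None


shared = "Check this clip: https://video.example.com/v/3011?src=share."
print(extract_multimedia_link(shared))
```

Returning `None` when no link is present lets the caller fall back to another acquisition route, for example asking the user to paste the link.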
The process of sharing a multimedia identifier from a second application in the target terminal to a first application in the same terminal is described below with reference to FIG. 3. As shown in FIG. 3, assume that the second application is a video application that provides short video browsing. One or more videos, for example a video 3011 and a video 3012, are displayed in a video browsing page 301 of the video application, and the display area of each video includes one or more options; for example, the display area of the video 3011 includes a like option 30111, a comment option 30112, a share option 30113, and so on. When the share option 30113 is selected, a sharing object identifier window 302 is displayed. The window 302 includes one or more sharing object identifiers, among them a sharing object identifier 3021 of the first application; sharing object identifiers include, but are not limited to, the icon of an application, the name of an application, and so forth. If the sharing object identifier 3021 is selected, the user wants to share the link of the video 3011 to the first application, and display of the audio recognition page 303 of the first application is triggered. The sharing object identifier window may be displayed over the video browsing page 301 of the second application as a floating window, as shown in FIG. 3, or may be displayed on the screen as a separate page; this is not limited in the embodiments of the present application.
It should be noted that the first application may also obtain the multimedia identifier in ways other than the direct sharing from the second application described above. Optionally, the sharing object identifier window may not include the sharing object identifier of the first application and may include only those of some other applications; in this case, the embodiments of the present application also support sharing the multimedia identifier to any such other application, which forwards it to the first application. Optionally, the audio recognition page of the first application further includes a recognition area (such as a recognition box); the user may copy the multimedia identifier in the second application and paste it into the recognition area, or may type a multimedia identifier (e.g., a video link) directly into the recognition area, so that the first application obtains the multimedia identifier. Of course, the ways in which the first application obtains the multimedia identifier are not limited to those above; the examples given do not limit the embodiments of the present application.
In addition, after the multimedia identifier is shared from the second application to the first application, a page of the first application is displayed on the screen of the target terminal and the second application runs in the background. On this basis, the embodiments of the present application allow the user, after the first application has received the multimedia identifier sent by the second application, to choose whether to remain in the first application or return to the second application. In a specific implementation, after the first application receives the multimedia identifier sent by the second application, any page of the first application is output on the display screen of the target terminal, and page jump prompt information is displayed in that page; the prompt asks the user to choose between remaining in the first application and returning to the second application. When the user chooses to remain in the first application, the display switches from that page to the audio recognition page. When the user chooses to return to the second application, the display jumps from that page of the first application to a service page of the second application, which may be the page from which the multimedia identifier was shared (e.g., the video browsing page 301 shown in FIG. 3). In the latter case, when the user opens the first application within a subsequent first time period, the audio recognition result page of the first application, containing the audio recognition result, may be displayed; when the user instead opens the first application within a subsequent second time period, the audio recognition page of the first application may be displayed to indicate that audio recognition is still in progress.
The first time period is longer than the second time period; for example, the first time period is 5 minutes and the second time period is 2 minutes. The first application has finished audio recognition within the first time period and is still performing audio recognition within the second time period.
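One possible reading of the two time periods is sketched below: reopening the first application within the shorter (second) period shows the audio recognition page, and reopening after that but within the longer (first) period shows the result page. This interpretation, the function `page_on_reopen`, the page names, and the handling of later reopen times are all assumptions; the text itself only gives the 5-minute and 2-minute example values.

```python
from typing import Optional

SECOND_PERIOD_S = 2 * 60  # example value: recognition is still running in this window
FIRST_PERIOD_S = 5 * 60   # example value: recognition has finished by this point


def page_on_reopen(seconds_since_share: float) -> Optional[str]:
    """Pick the page shown when the user reopens the first application."""
    if seconds_since_share < 0:
        raise ValueError("elapsed time cannot be negative")
    if seconds_since_share <= SECOND_PERIOD_S:
        return "audio_recognition_page"         # recognition still in progress
    if seconds_since_share <= FIRST_PERIOD_S:
        return "audio_recognition_result_page"  # recognition finished
    return None  # behavior after the first period is not described in the text


for elapsed in (30, 180, 600):
    print(elapsed, page_on_reopen(elapsed))
```

In practice an implementation would more likely key the decision on whether recognition has actually completed rather than on elapsed time alone; the time thresholds here only mirror the example in the text.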
An example of the page jump prompt information is shown in FIG. 4. When the share option is selected in the display area of any video of the second application, the display jumps to a service page 401 of the first application, and page jump prompt information 402 is displayed in the service page 401. The page jump prompt information 402 includes a confirm key (or confirm control, confirm button, etc.) 4021 and a return key 4022. If the user selects the confirm key 4021, the user has chosen to remain in the first application, and the audio recognition page 303 of the first application is displayed; if the user selects the return key 4022, the user has chosen to return to the second application, and the video browsing page 301 of the second application is displayed. Besides the service page of the first application shown in FIG. 4, the page jump prompt information 402 may also be displayed in the audio recognition page of the first application; that is, when the first application receives the multimedia identifier sent by the second application, it displays its audio recognition page and shows the page jump prompt information in that page.
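The two outcomes of the prompt in FIG. 4 amount to a small mapping from the selected key to the page displayed next. The sketch below illustrates that mapping; the enum `PromptChoice`, the function `next_page`, and the page name strings are hypothetical labels introduced for illustration.

```python
from enum import Enum


class PromptChoice(Enum):
    """Keys offered by the page jump prompt information 402."""
    KEY_4021 = "4021"  # leads to the first application's audio recognition page
    KEY_4022 = "4022"  # leads back to the second application


def next_page(choice: PromptChoice) -> str:
    """Map the selected key to the page that is displayed next."""
    if choice is PromptChoice.KEY_4021:
        return "first_app.audio_recognition_page_303"
    if choice is PromptChoice.KEY_4022:
        return "second_app.video_browsing_page_301"
    raise ValueError(f"unsupported choice: {choice!r}")


print(next_page(PromptChoice.KEY_4021))
print(next_page(PromptChoice.KEY_4022))
```

Audio recognition itself is independent of this choice in the description: it starts when the multimedia identifier is received, and the prompt only decides which page the user sees.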
S202: Output the audio recognition progress in the audio recognition page, wherein the audio recognition progress can be used to indicate the progress of the audio recognition that the first application program performs according to the multimedia identifier, and the multimedia identifier is shared to the first application program by the second application program. The representation form of the audio recognition progress in the audio recognition page can include a text form. For example, an exemplary diagram of an audio recognition progress presented as text may be seen in fig. 5a; as shown in fig. 5a, an audio recognition progress 501 (e.g., the audio recognition progress is "recognition remaining time 8 s") is displayed in an audio recognition page 303 of the first application program.
In addition to outputting the audio recognition progress in the audio recognition page, the embodiment of the application supports outputting an audio recognition animation in the audio recognition page. In a specific implementation, the audio recognition animation is output in the audio recognition page, and the audio recognition animation matches the audio recognition progress. The matching of the audio recognition animation and the audio recognition progress can be as follows: the audio recognition animation is displayed in step with the progress information indicated by the audio recognition progress. For example, the audio recognition progress is expressed as a countdown whose total length is 10 seconds; when the countdown is between 7 seconds and 10 seconds, the audio recognition animation vibrates at a first frequency; when the countdown is between 4 seconds and 6 seconds, the audio recognition animation vibrates at a second frequency; and when the countdown is between 0 seconds and 3 seconds, the audio recognition animation vibrates at a third frequency, wherein the first frequency is less than the second frequency, and the second frequency is less than the third frequency. Displaying the audio recognition animation in step with the audio recognition progress enriches the interface display effect and can help the user to better follow the progress of the audio recognition. The audio recognition animation can be set by an administrator or by the user, and the way in which the audio recognition animation is set is not limited in the embodiment of the application. In one implementation, an exemplary schematic diagram of outputting an audio recognition animation in an audio recognition page can be seen in fig. 5b, where as shown in fig. 5b, an audio recognition animation 502 is output in an audio recognition page 303 of the first application program.
In another implementation, an exemplary schematic diagram of outputting an audio recognition progress and an audio recognition animation in an audio recognition page may be seen in fig. 5c, where an audio recognition animation 503 and an audio recognition progress 504 are simultaneously displayed in an audio recognition page 303 of a first application, as shown in fig. 5 c. It should be noted that, the manner of displaying the audio recognition animation and the audio recognition progress in the audio recognition page in combination is various, and the embodiment of the present application is exemplarily described by the schematic diagram shown in fig. 5c, which does not limit the embodiment of the present application.
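The countdown-driven animation described above can be illustrated with a minimal Python sketch. The band boundaries (7 seconds and 4 seconds), the function name and the concrete frequency values are assumptions made for illustration only; the embodiment only requires that the first frequency be less than the second and the second less than the third.

```python
# Hypothetical vibration frequencies in Hz. The description only requires
# first < second < third; the concrete values are illustrative.
FIRST_HZ, SECOND_HZ, THIRD_HZ = 1.0, 2.0, 4.0

TOTAL_COUNTDOWN_S = 10  # total countdown length used in the example


def animation_frequency(remaining_s: int) -> float:
    """Map the remaining countdown time (whole seconds) to a frequency."""
    if not 0 <= remaining_s <= TOTAL_COUNTDOWN_S:
        raise ValueError("remaining time must lie within the countdown")
    if remaining_s >= 7:   # early phase of the countdown: slowest vibration
        return FIRST_HZ
    if remaining_s >= 4:   # middle phase
        return SECOND_HZ
    return THIRD_HZ        # final seconds: fastest vibration


if __name__ == "__main__":
    for t in range(TOTAL_COUNTDOWN_S, -1, -1):
        print(f"remaining {t:2d} s -> {animation_frequency(t)} Hz")
```

Under these assumptions the animation speeds up as the countdown approaches zero, which is one way to keep the animation consistent with the displayed progress.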
The embodiment of the application also supports displaying at least one identification type option in the audio identification page, wherein the identification type option comprises any one of the following options: humming recognition option, listening to songs recognition option, sharing recognition option, etc.; and when any identification type option is triggered, highlighting any selected identification type option in the audio identification page. The manner of highlighting herein may include, but is not limited to: displaying with font size information larger than other identification type options, displaying with brightness higher than other identification type options, displaying with transparency lower than other identification type options, and so on. For example, when the first application program receives the multimedia identifier sent by the second application program, the identification type of the current audio identification is the sharing identification, and then in the audio identification page of the first application program, the sharing identification option is highlighted in the audio identification page. As shown in fig. 6, a shared music recognition option 601 is displayed in the audio recognition page of the first application, and the shared music recognition option 601 is highlighted with a transparency lower than the transparency of the other recognition type options.
It should be noted that the above-described audio recognition progress, audio recognition animation and recognition type options may be separately displayed in the audio recognition page, or may be displayed in the audio recognition page in a combined manner, for example, the audio recognition progress and the audio recognition animation are displayed in the audio recognition page in a combined manner, or for example, the audio recognition progress, the audio recognition animation and the recognition type options are displayed in the audio recognition page in a combined manner, and so on. When the audio recognition progress, the audio recognition animation and the recognition type option are displayed in the audio recognition page in combination, the manner of displaying the audio recognition progress, the audio recognition animation and the recognition type option in the audio recognition page may include: the audio identification page is provided with an identification progress area, and the audio identification progress and the audio identification animation are displayed in the identification progress area of the audio identification page; and the audio identification page is provided with a type selection area, and at least one identification type option is displayed in the type selection area of the audio identification page. 
The distribution mode of the identification progress area and the type selection area in the audio identification page may include but is not limited to: the recognition progress area and the type selection area are distributed in the audio recognition page side by side along the horizontal direction; the recognition progress area and the type selection area are distributed in the audio recognition page in parallel along the vertical direction; the identification progress area and the type selection area are diagonally distributed in the audio identification page; or the identification progress area is distributed outwards in the audio identification page along the edge line of the type selection area; and so on. For an example, the identification progress area and the type selection area are distributed in the audio identification page in parallel along the vertical direction, please refer to fig. 7, as shown in fig. 7, a type selection area 701 and an identification progress area 702 are provided in the audio identification page 303, and the type selection area 701 and the identification progress area 702 are displayed in the audio identification page in parallel along the vertical direction.
S203: Display an audio recognition result page of the first application program.
The audio recognition result page of the first application program is used for displaying an audio recognition result. When the server (also called the background server) identifies candidate audios associated with the multimedia identifier, the audio recognition result contained in the audio recognition result page comprises at least one successfully identified candidate audio; when the server does not identify any candidate audio associated with the multimedia identifier, the audio recognition result contained in the audio recognition result page comprises audio recognition failure prompt information, so as to inform the user that the audio recognition has failed. The interfaces displayed for the different audio recognition results are described in more detail below:
(1) When the audio recognition fails, the audio recognition result comprises audio recognition failure prompt information; at this time, the audio recognition result page of the first application program is displayed, and the audio recognition failure prompt information is displayed in the audio recognition result page. After the audio recognition failure prompt information has been displayed in the audio recognition result page for a target time period (e.g., 5 seconds, 6 seconds, etc.), the display switches from the audio recognition result page of the first application program to a service page of the first application program, where the service page may be any page of the first application program; for example, if the first application program is a QQ music application program, the service page of the first application program may be the main interface of the QQ music application program. Therefore, when the audio recognition fails, the user is taken directly to the main interface of the QQ music application program, where the user can conveniently search for and play songs.
When the audio recognition fails, the schematic flow chart of switching from the audio recognition result page of the first application program to the service page of the first application program to be displayed may be as shown in fig. 8 a; as shown in fig. 8a, when audio recognition fails, outputting an audio recognition result page 801 of the first application program, and displaying audio recognition failure prompt information 802 in the audio recognition result page 801; after the audio recognition failure prompt message 802 is displayed in the audio recognition result page 801 for a target time period (e.g., 5 seconds), a service page of the first application is displayed, where the service page may be a main interface of the first application, so as to facilitate a user to perform an operation of playing or searching for audio in the main interface.
(2) When the audio recognition fails, the audio recognition result comprises prompt information of the audio recognition failure; at this time, jumping from the audio recognition result page of the first application to a service page of the second application, where the service page may be any page of the second application, for example, the service page is a page of the second application where the user shares the multimedia identifier. As shown in fig. 8b, a schematic flow diagram of jumping from the audio recognition result page of the first application to the service page of the second application is shown, and as shown in fig. 8b, a user shares the multimedia identifier displayed on the video browsing page 301 of the second application with the first application; the first application program carries out audio recognition according to the multimedia identification and outputs an audio recognition result page 801 of the first application program, wherein the audio recognition result page 801 comprises audio recognition failure prompt information 802; when the audio recognition failure prompt message 802 stays in the audio recognition result page 801 for a target period of time (for example, 5 seconds), the process automatically jumps from the audio recognition result page 801 of the first application to the video browsing page 301 of the second application. Such a jump directly from the first application to the second application may assist the user in continuing the video browsing.
Of course, the embodiment of the present application also supports displaying the remaining time length at the display position where the audio recognition failure prompt message 802 is located, where the remaining time length may be used to remind the user of the remaining time length for jumping from the first application program to the second application program. And a cancel key is also displayed at the display position of the audio recognition failure prompt message 802, when the cancel key is triggered, the cancel key indicates that the user wants to stay in the first application program, at this time, the audio recognition failure prompt message 802 is closed, and the audio recognition result page of the first application program is still displayed, or the service page of the first application program is switched and displayed.
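The failure handling in cases (1) and (2), together with the remaining-time display and the cancel key, can be summarized by a small decision function. This is only a sketch under stated assumptions: the names `Page`, `page_after_failure` and `remaining_seconds` are hypothetical, the target time period defaults to the 5 seconds used in the examples, and after a cancel the sketch keeps the audio recognition result page (the description also allows switching to a service page of the first application program).

```python
from enum import Enum


class Page(Enum):
    RESULT_PAGE = "audio recognition result page of the first application"
    FIRST_APP_SERVICE_PAGE = "service page of the first application"
    SECOND_APP_SERVICE_PAGE = "service page of the second application"


def remaining_seconds(elapsed_s: float, target_s: float = 5.0) -> float:
    """Remaining time shown next to the failure prompt before the jump."""
    return max(0.0, target_s - elapsed_s)


def page_after_failure(elapsed_s: float,
                       target_s: float = 5.0,
                       cancelled: bool = False,
                       return_to_second_app: bool = True) -> Page:
    """Decide which page is shown after an audio recognition failure."""
    # The failure prompt stays visible until the target period elapses,
    # and the automatic jump is suppressed if the cancel key was triggered.
    if cancelled or elapsed_s < target_s:
        return Page.RESULT_PAGE
    if return_to_second_app:
        return Page.SECOND_APP_SERVICE_PAGE  # case (2)
    return Page.FIRST_APP_SERVICE_PAGE       # case (1)
```

For example, `page_after_failure(3.0)` keeps the result page with 2 seconds remaining, while `page_after_failure(5.0)` jumps back to the second application program.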
(3) When the audio recognition is successful, the audio recognition result comprises at least one candidate audio which is successfully recognized; at this time, the attribute information of the at least one candidate audio successfully identified is output in the audio identification result page of the first application. The attribute information of the candidate audio may include, but is not limited to: audio name, promotional image, publisher name, publication time, etc.; for example, if the candidate audio is a target song, the attribute information of the candidate audio may include a song name of the target song, a promotional image of the target song, a publisher name of the target song, and so on. An exemplary schematic diagram of displaying at least one candidate audio in the audio recognition result page may be seen in fig. 8c, as shown in fig. 8c, when the audio recognition is successful, an audio recognition result page 801 of the first application program is output, and attribute information of the at least one candidate audio is displayed in the audio recognition result page 801, for example, attribute information of the candidate audio 803 is displayed in the audio recognition result page 801. It can be understood that, when the number of candidate audios is greater than 1, in other words, when a plurality of candidate audios associated with the multimedia identifier are identified, the first application may rank and display the candidate audios in order from high to low according to the matching degree between the multimedia identifier obtained by the audio identification and the candidate audios, so as to help the user to see the candidate audio that is most matched with the multimedia identifier at a glance. 
Of course, the arrangement display manner of the candidate audios in the audio recognition result page may also include other implementation manners, for example, the candidate audios are randomly arranged and displayed in the audio recognition result page, which is not limited in the embodiment of the present application.
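As an illustration of ranking candidate audios by matching degree, the following Python sketch sorts candidates from highest to lowest score. The `CandidateAudio` type, its fields and the score range are hypothetical; the description does not specify how the matching degree is represented.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CandidateAudio:
    name: str            # audio name (attribute information)
    publisher: str       # publisher name (attribute information)
    match_degree: float  # hypothetical matching score, higher is better


def rank_candidates(candidates: Iterable[CandidateAudio]) -> List[CandidateAudio]:
    """Order candidates from the highest to the lowest matching degree."""
    return sorted(candidates, key=lambda c: c.match_degree, reverse=True)


if __name__ == "__main__":
    found = [
        CandidateAudio("Song B", "Publisher 2", 0.71),
        CandidateAudio("Song A", "Publisher 1", 0.96),
        CandidateAudio("Song C", "Publisher 3", 0.83),
    ]
    for c in rank_candidates(found):
        print(c.name, c.match_degree)
```

With this ordering the best-matching candidate appears first in the audio recognition result page.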
The embodiment of the application also supports triggering any candidate audio in the audio recognition result page so as to play that candidate audio. In a specific implementation, the audio recognition result page includes attribute information of at least one candidate audio, and when the attribute information of any candidate audio among the at least one candidate audio is selected, the selected candidate audio is played. Implementations of playing the selected candidate audio may include, but are not limited to, the following. First, the selected candidate audio is played in the audio recognition result page; as shown in fig. 9a, when attribute information of any one of the candidate audios in the audio recognition result page 801 (e.g., the play key 8031 of the candidate audio 803) is selected, the selected candidate audio is played in the audio recognition result page 801. Alternatively, the display jumps from the audio recognition result page to an audio playing page of the first application program, and the selected candidate audio is played in the audio playing page. The audio playing page of the first application program may be a detail page of the selected candidate audio, in which detail information of the selected candidate audio is displayed, where the detail information includes, but is not limited to, attribute information. As shown in fig. 9b, when attribute information of any one of the candidate audios in the audio recognition result page 801 (e.g., the play key 8031 of the candidate audio 803) is selected, the display switches from the audio recognition result page 801 of the first application program to the audio playing page 901 of the first application program, and details of the candidate audio 803 are displayed in the audio playing page 901, wherein the details of the candidate audio 803 include attribute information of the candidate audio 803 (e.g., an audio image, an audio name, etc.) and other information (e.g., comment information associated with the candidate audio 803).
In the embodiment of the application, when a user watching a short video wants to identify the song name of the background music associated with the short video, the user can share the short video to an application program that has a song name identification function for audio identification, so as to obtain the background music associated with the short video. Compared with the existing approach, in which song name identification can only be achieved by switching between application programs, sharing the short video is enough to identify the background music associated with it; the operation is simple and convenient, the audio identification process is not affected by the external environment, and the efficiency, speed and accuracy of audio identification are improved.
The above embodiments describe the various interfaces presented by the first application program during audio recognition; the following describes the back-end implementation in which a background server executes the audio recognition scheme. The background server may be the server corresponding to the first application program; for example, if the first application program is a QQ music application program, the background server may be a QQ music server. The general flow by which the background server implements the audio recognition scheme may include: the first application program obtains the multimedia identifier, where, as described above, the ways in which the first application program obtains the multimedia identifier may include, but are not limited to, being shared by the second application program, being entered by the user in the first application program, and so on; the background server queries the matching multimedia information according to the multimedia identifier and performs audio recognition processing according to the multimedia information; and the background server sends the audio recognition processing result to the first application program, wherein the audio recognition processing result comprises any one or more of the audio recognition progress and the audio recognition result.
Taking the example that the second application shares the multimedia identifier with the first application, the following describes the audio recognition process performed by the background server in more detail. Referring to fig. 10, fig. 10 is a schematic flow chart illustrating another audio recognition method according to an exemplary embodiment of the present application; the process of the audio recognition scheme provided in the embodiment of the present application may be executed by the aforementioned background server corresponding to the target terminal, or executed by the background server corresponding to the first application running in the target terminal. As shown in fig. 10, the audio recognition scheme may include the following steps S1001-S1006.
S1001: the second application sends the multimedia identifier to the first application.
S1002: The first application program receives the multimedia identifier sent by the second application program and outputs page jump prompting information.
The specific implementation process shown in steps S1001-S1002 can refer to the description related to the specific implementation process in the embodiment shown in fig. 2, and is not described herein again.
S1003: the first application program sends the multimedia identifier to the background server.
S1004: the background server receives the multimedia identifier sent by the first application program, inquires matched multimedia information according to the multimedia identifier, and performs audio identification processing according to the multimedia information.
In steps S1003 to S1004, after the backend server receives the multimedia identifier sent by the first application program, the backend server may query the multimedia information matched with the multimedia identifier according to the multimedia identifier. For example, if the multimedia identifier is a video link, the multimedia information matching the multimedia identifier may refer to a video stored in a storage space indicated by the video link, the attribute information of the multimedia information refers to video information of the video, and the video information may include, but is not limited to: video name, video data, music name of music with which the video is associated, music data, and so forth.
The implementation manner of the background server querying the matched multimedia information according to the multimedia identifier and performing audio recognition processing according to the multimedia information may include:
(1) The background server queries the target database for multimedia information matching the multimedia identifier. If multimedia information matching the multimedia identifier is found in the target database, audio recognition processing is performed according to the multimedia information in the target database; if no multimedia information matching the multimedia identifier is found in the target database, step (2) is triggered. Specifically, the multimedia identifier may include a link to the multimedia information, and the link may be a storage address of the multimedia information; if the multimedia information matching the multimedia identifier is stored in the target database, the background server can look up the multimedia information in the target database according to the storage address indicated by the multimedia identifier, and thereby successfully find the multimedia information matching the multimedia identifier. The target database may be a database corresponding to the background server, that is, a database for which the background server has operation permissions (such as permissions to add, delete, modify, access and invoke data in the database), and the database may include a large amount of data related to the first application program (such as resource information and user information included in the first application program). For example, if the first application program is a QQ music application program, the target database of the background server corresponding to the first application program is a QQ music database, which includes a large amount of music information and the like.
(2) If no multimedia information matching the multimedia identifier is found in the target database, the background server queries the multimedia information matching the multimedia identifier through the Internet and performs audio recognition processing according to the multimedia information found. Specifically, the background server can query a network database through the Internet for multimedia information matching the multimedia identifier. The network database may be a database of any device in the Internet, and the background server may have only access and download rights to the network database, without other operation rights (such as adding, deleting or modifying data in the database); for example, if the background server is the server corresponding to a QQ music application program, the network database may be a database of the server corresponding to the Douyin application program. If the background server does not find multimedia information matching the multimedia identifier in the network database, the background server generates audio recognition failure prompt information and returns it to the first application program, so that the first application program outputs an audio recognition result in the audio recognition result page, where the audio recognition result includes the audio recognition failure prompt information. If the background server finds multimedia information matching the multimedia identifier in the network database, the background server downloads that multimedia information from the network database, and the downloaded multimedia information can be played and forwarded.
For example, if the multimedia information is a video accompanied by background music, the video data and the music data (or audio data) are downloaded to the background server together when the multimedia information is downloaded.
Based on the content described in the above steps (1) - (2), firstly, the multimedia information matched with the multimedia identifier is inquired in the target database according to the multimedia identifier, and when the multimedia information matched with the multimedia identifier cannot be inquired in the target database, the multimedia information matched with the multimedia identifier is inquired through the internet, so that the speed and efficiency of acquiring the multimedia information matched with the multimedia identifier are improved, and the security of the network environment is also ensured.
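The two-step lookup in steps (1) and (2) can be sketched as follows. This is a minimal illustration, not the patented implementation: the target database is modelled as an in-memory mapping, the Internet query is passed in as a callable, and the names `find_multimedia`, `target_db` and `fetch_from_network` are hypothetical.

```python
from typing import Callable, Mapping, Optional, Tuple


def find_multimedia(
    identifier: str,
    target_db: Mapping[str, dict],
    fetch_from_network: Callable[[str], Optional[dict]],
) -> Tuple[Optional[dict], str]:
    """Look up multimedia information, preferring the target database.

    Returns the multimedia information (or None) and the source it came from.
    """
    # Step (1): query the server's own target database first.
    info = target_db.get(identifier)
    if info is not None:
        return info, "target_database"

    # Step (2): fall back to a network database reached over the Internet.
    info = fetch_from_network(identifier)
    if info is not None:
        return info, "network_database"

    # Neither source has a match; the caller would then generate the
    # audio recognition failure prompt information.
    return None, "not_found"


if __name__ == "__main__":
    local = {"video/1": {"name": "local clip"}}
    remote = {"video/2": {"name": "remote clip"}}
    for key in ("video/1", "video/2", "video/3"):
        print(key, find_multimedia(key, local, remote.get))
```

Passing the network query as a callable keeps the sketch self-contained; in practice it would wrap whatever access and download mechanism the network database exposes.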
In addition, the embodiment of the application also supports audio identification processing according to the multimedia information to obtain candidate audio associated with the multimedia information and attribute information (such as audio names) of the candidate audio. The multimedia information is sourced from different sources (such as from a target database or a network database), but the background server performs audio recognition processing according to the multimedia information in a similar manner. In the following, taking the example that the multimedia information is from the target database, a more detailed description is given to the implementation manner of performing audio recognition processing according to the multimedia information, where: if the multimedia information matched with the multimedia identification is inquired in the target database, the process of carrying out audio identification processing according to the multimedia information comprises the following steps: judging whether the attribute information of the multimedia information is inquired in the target database; if yes, acquiring attribute information of the multimedia information from the target database; if not, audio data of the multimedia information is obtained from the target database, audio identification is carried out on the audio data, and at least one candidate audio and attribute information of the at least one candidate audio are obtained. 
Specifically, if the background server queries multimedia information matched with the multimedia identifier in the target database, it detects whether the target database stores the attribute information of the multimedia information, if the attribute information of the multimedia information exists, the attribute information of the multimedia information is directly acquired from the target database, and if the attribute information of the multimedia information does not exist, the process of performing audio identification according to the multimedia information is executed.
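The branch described above (use stored attribute information when it exists, otherwise run audio recognition on the audio data) might look like the following sketch. The dictionary keys `attributes` and `audio_data`, the function name and the `recognize` callable are assumptions introduced for illustration.

```python
from typing import Callable, List


def get_audio_attributes(
    multimedia_info: dict,
    recognize: Callable[[bytes], List[dict]],
) -> List[dict]:
    """Return attribute information for the audio tied to the multimedia."""
    # If the target database already stores attribute information for this
    # multimedia information, use it directly and skip recognition.
    attributes = multimedia_info.get("attributes")
    if attributes:
        return [attributes]

    # Otherwise take the audio data and run audio recognition on it to
    # obtain candidate audios and their attribute information.
    return recognize(multimedia_info["audio_data"])


if __name__ == "__main__":
    stored = {"attributes": {"name": "Song A"}, "audio_data": b""}
    missing = {"audio_data": b"\x00\x01"}
    fake_recognizer = lambda data: [{"name": "Song B"}]
    print(get_audio_attributes(stored, fake_recognizer))
    print(get_audio_attributes(missing, fake_recognizer))
```

Checking the stored attribute information first avoids the cost of recognition whenever the database already knows which audio the multimedia information is associated with.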
An exemplary implementation in which the background server performs audio recognition on audio data to obtain candidate audio and attribute information of the candidate audio may include: extracting voiceprint features of the audio data; matching the voiceprint features against a voiceprint feature library, wherein the voiceprint feature library comprises a plurality of candidate voiceprint features; if a candidate voiceprint feature matching the voiceprint features is found, determining the audio corresponding to that candidate voiceprint feature as candidate audio, and determining the attribute information of the audio corresponding to that candidate voiceprint feature as the attribute information of the candidate audio; otherwise, determining that the audio recognition has failed and generating audio recognition failure prompt information. That is to say, the background server can perform audio recognition on the audio data by means of audio fingerprinting; the term "audio fingerprint" likens the audio features extracted from the audio to a human fingerprint, since the extracted audio features are unique and compact. When the voiceprint feature library is queried with the voiceprint features of the audio data, a plurality of candidate audios satisfying the query condition may be obtained; one optional way for an audio to satisfy the query condition is that the degree of match between the voiceprint features of the audio and the voiceprint features of the audio data is greater than or equal to a matching threshold. Of course, the embodiment of the application does not limit the query condition, and the above is given only as an example.
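The threshold-based matching against a voiceprint feature library can be sketched as below. This is an illustration under stated assumptions: voiceprint features are modelled as plain numeric vectors, cosine similarity stands in for the unspecified degree of match, and the 0.9 threshold and all names are hypothetical. It does not show how voiceprint features are extracted from audio data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in matching degree between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_voiceprint(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical matching threshold
) -> List[Tuple[str, float]]:
    """Return (audio name, matching degree) pairs meeting the threshold.

    An empty list corresponds to an audio recognition failure.
    """
    scored = [(name, cosine_similarity(query, feat)) for name, feat in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    # Best match first, so the result can be displayed in ranked order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    library = {"song_a": [1.0, 0.0], "song_b": [0.0, 1.0]}
    print(match_voiceprint([1.0, 0.0], library))  # matches song_a only
    print(match_voiceprint([1.0, 1.0], library))  # no candidate meets 0.9
```

A production audio fingerprinting system would typically use hashed spectral features and an index rather than a linear scan; the sketch only shows the query condition of a matching degree greater than or equal to a threshold.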
Through the above process, the multimedia information matching the multimedia identifier and the candidate audio associated with the multimedia information are identified by two means, database matching and audio recognition, which can improve the accuracy of audio identification.
S1005: the background server sends the audio recognition processing result to the first application program, and the audio recognition processing result comprises: any one or more of the audio recognition progress and the audio recognition result.
S1006: the first application program outputs the audio recognition processing result.
In steps S1005-S1006, while executing the content described in step S1004, the background server also sends the audio recognition processing result to the first application program in real time, so that the first application program outputs the audio recognition processing result in the audio recognition page. Optionally, the audio recognition processing result includes an audio recognition progress; after the first application program receives the audio recognition progress sent by the background server, the first application program may output the audio recognition progress in the audio recognition page, where the audio recognition progress is used to indicate the progress of the audio recognition that the first application program performs according to the multimedia identifier. One exemplary way of determining the progress information of the audio recognition may include: if the current audio recognition is at the stage of querying the target database, determining that the progress of the current audio recognition is 10%; if the current audio recognition is at the stage of querying the network database, determining that the progress of the current audio recognition is 50%; if the current audio recognition is at the stage of performing audio recognition on the audio data or acquiring the multimedia information from the target database, determining that the progress of the current audio recognition is 80%; and if the candidate audio has been obtained, determining that the progress of the current audio recognition is 100%. It should be noted that the above is only one exemplary way of determining the progress information of the audio recognition; other ways of determining it exist in practical application scenarios and are not elaborated herein.
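The stage-based progress values given in the example above can be expressed as a simple lookup table. The enum and function names below are hypothetical; the percentages are the ones stated in the example.

```python
from enum import Enum


class RecognitionStage(Enum):
    QUERYING_TARGET_DB = "querying the target database"
    QUERYING_NETWORK_DB = "querying the network database"
    RECOGNIZING_OR_FETCHING = "recognizing audio data or acquiring from the target database"
    CANDIDATES_OBTAINED = "candidate audio obtained"


# Progress percentages taken from the exemplary scheme in the description.
PROGRESS_PERCENT = {
    RecognitionStage.QUERYING_TARGET_DB: 10,
    RecognitionStage.QUERYING_NETWORK_DB: 50,
    RecognitionStage.RECOGNIZING_OR_FETCHING: 80,
    RecognitionStage.CANDIDATES_OBTAINED: 100,
}


def progress_for(stage: RecognitionStage) -> int:
    """Return the progress percentage reported to the first application."""
    return PROGRESS_PERCENT[stage]


if __name__ == "__main__":
    for stage in RecognitionStage:
        print(f"{stage.value}: {progress_for(stage)}%")
```

The background server could send the value returned by `progress_for` to the first application program each time the recognition enters a new stage.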
Optionally, the audio recognition processing result includes an audio recognition result. After the background server finishes audio recognition, it may send the audio recognition result to the first application program; after receiving the audio recognition result sent by the background server, the first application program can display an audio recognition result page and output the audio recognition result in that page. As described above, the audio recognition result may include at least one candidate audio or audio recognition failure prompt information.
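The two possible shapes of the audio recognition result (candidate audio on success, a prompt on failure) can be modelled as below. This is a sketch under assumptions: the class names, the `title` and `artist` attribute fields, and the default prompt text are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CandidateAudio:
    # Hypothetical attribute information for one candidate audio.
    title: str
    artist: str


@dataclass
class RecognitionResult:
    # Either at least one candidate audio, or a failure prompt.
    candidates: List[CandidateAudio] = field(default_factory=list)
    failure_prompt: Optional[str] = None

    @property
    def succeeded(self) -> bool:
        return bool(self.candidates)


def render_result_page(result: RecognitionResult) -> List[str]:
    """Return the text lines a result page would show for this result."""
    if result.succeeded:
        return [f"{c.title} - {c.artist}" for c in result.candidates]
    return [result.failure_prompt or "Audio recognition failed"]
```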
In the embodiment of the application, the multimedia information matched with the multimedia identifier is first queried in the target database according to the multimedia identifier, and only when no matching multimedia information can be found in the target database is it queried through the Internet. This improves the speed and efficiency of acquiring the multimedia information matched with the multimedia identifier and also helps ensure the safety of the network environment. In addition, the multimedia information matched with the multimedia identifier and the candidate audio associated with the multimedia information are identified by using both database matching and voice recognition, so that the accuracy of audio recognition can be improved.
Fig. 11 is a schematic structural diagram illustrating an audio recognition apparatus according to an exemplary embodiment of the present application; the audio recognition apparatus is arranged in the target terminal. In some embodiments, the audio recognition apparatus may be a client (e.g., a QQ music application) running in the target terminal; for the specific implementation of the units included in the audio recognition apparatus, refer to the related description in the foregoing embodiments. Referring to fig. 11, the audio recognition apparatus according to the embodiment of the present invention includes the following units:
a display unit 1101 for displaying an audio recognition page of the first application;
a processing unit 1102, configured to output an audio recognition progress in the audio recognition page, where the audio recognition progress is used to prompt progress information of the first application program performing audio recognition according to a multimedia identifier, and the multimedia identifier is shared by the second application program to the first application program;
the processing unit 1102 is further configured to display an audio recognition result page of the first application.
In one embodiment, the processing unit 1102 is further configured to:
outputting an audio recognition animation in the audio recognition page, wherein the audio recognition animation matches the audio recognition progress.
In one embodiment, the audio recognition page is provided with a recognition progress area; the audio recognition progress and the audio recognition animation are displayed in the recognition progress area of the audio recognition page.
In one embodiment, the processing unit 1102 is further configured to:
displaying at least one recognition type option in the audio recognition page; the recognition type options include any of: a humming recognition option, a song-listening recognition option, and a sharing recognition option; wherein the sharing recognition option is highlighted in the audio recognition page.
In one embodiment, the audio recognition page is provided with a type selection area; when displaying at least one recognition type option in the audio recognition page, the processing unit 1102 is specifically configured to:
displaying the at least one recognition type option in the type selection area of the audio recognition page.
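As a rough illustration of the type selection area, the options can be built as a small list in which the sharing option carries a highlight flag. The class name, option keys, and labels below are hypothetical; the application only states which options may exist and that the sharing option is highlighted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecognitionTypeOption:
    key: str
    label: str
    highlighted: bool = False


def build_type_options(active_key: str = "share") -> List[RecognitionTypeOption]:
    """Build the options for the type selection area.

    The option whose key equals active_key is highlighted; by default
    this is the sharing recognition option, as in the description.
    """
    labels = [
        ("humming", "Humming recognition"),
        ("listen", "Song-listening recognition"),
        ("share", "Sharing recognition"),
    ]
    return [
        RecognitionTypeOption(key, label, highlighted=(key == active_key))
        for key, label in labels
    ]
```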
In one embodiment, the audio recognition result page is used for displaying an audio recognition result, and the audio recognition result comprises audio recognition failure prompt information; the processing unit 1102 is further configured to:
displaying a service page of the first application program, wherein the service page is any page of the first application program.
In one embodiment, the audio recognition result page is used for displaying an audio recognition result, and the audio recognition result comprises audio recognition failure prompt information; the processing unit 1102 is further configured to:
jumping from the audio recognition result page to a service page of the second application program, wherein the service page is any page of the second application program.
In one embodiment, the audio recognition result page is used for displaying an audio recognition result, wherein the audio recognition result comprises at least one candidate audio which is successfully recognized; the processing unit 1102 is further configured to:
outputting attribute information of the at least one candidate audio in the audio recognition result page.
In one embodiment, the processing unit 1102 is further configured to:
when the attribute information of any one of the candidate audios is selected, the selected candidate audio is played.
In an embodiment, when playing the selected candidate audio, the processing unit 1102 is specifically configured to:
playing the selected candidate audio in the audio recognition result page; or jumping from the audio recognition result page to an audio playing page of the first application program, and playing the selected candidate audio in the audio playing page.
In one embodiment, the multimedia identifier is used to uniquely identify a piece of multimedia information; the multimedia information includes any one of video, short video, and audio; the multimedia identifier comprises a link to the multimedia information;
the link of the multimedia information is directly shared to the first application program by the second application program; or the link of the multimedia information is set in the picture, and the picture is shared to the first application program by the second application program.
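For the case where the link is shared directly, the first application program needs to pull the link out of the shared text. The sketch below is one plausible way to do that and is an assumption, not the method of the application; the function name and the URL pattern are hypothetical, and the picture-based sharing path is not covered because the description does not say how the link is embedded in the picture.

```python
import re
from typing import Optional

# Assumed pattern: an http(s) URL running up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_multimedia_link(shared_text: str) -> Optional[str]:
    """Return the first link found in the shared text, or None."""
    match = URL_PATTERN.search(shared_text)
    if match is None:
        return None
    # Drop punctuation that commonly trails a link inside a sentence.
    return match.group(0).rstrip(".,;:!?)")
```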
According to an embodiment of the present application, the units in the audio recognition apparatus shown in fig. 11 may be separately or wholly combined into one or several other units, or one or more of the units may be further split into multiple functionally smaller units; this achieves the same operation without affecting the technical effect of the embodiment of the present application. The units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the audio recognition apparatus may also include other units, and in practical applications these functions may also be realized with the assistance of other units and through the cooperation of multiple units. According to another embodiment of the present application, the audio recognition apparatus shown in fig. 11 may be constructed, and the audio recognition method of the embodiment of the present application may be implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in fig. 1 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed by the above computing device via the computer-readable recording medium.
In this embodiment, the processing unit 1102 may share the multimedia identifier of the multimedia information to be recognized from the second application program to the first application program, so that the first application program performs audio recognition according to the multimedia identifier to obtain an audio recognition result. Compared with the existing approach, in which frequent jumps between two application programs are needed to perform audio recognition, directly sharing the multimedia identifier allows the audio (such as background music) associated with the multimedia identifier to be recognized; the operation is simple and convenient, the audio recognition process is not affected by the external environment, and the efficiency, speed, and accuracy of audio recognition are improved.
Fig. 12 is a schematic structural diagram illustrating an audio recognition apparatus according to an exemplary embodiment of the present application; the audio recognition apparatus is arranged in a background server. In some embodiments, the audio recognition apparatus may be a client (e.g., a QQ music application) running in the background server; for the specific implementation of the units included in the audio recognition apparatus, refer to the related description in the foregoing embodiments. Referring to fig. 12, the audio recognition apparatus according to the embodiment of the present invention includes the following units:
a receiving unit 1201, configured to receive a multimedia identifier sent by a first application;
a processing unit 1202, configured to query the matched multimedia information according to the multimedia identifier, and perform audio recognition processing according to the multimedia information; and
the processing unit 1202 is further configured to send an audio recognition processing result to the first application program, where the audio recognition processing result includes: any one or more of the audio recognition progress and the audio recognition result.
In an embodiment, when querying the matching multimedia information according to the multimedia identifier and performing audio recognition processing according to the multimedia information, the processing unit 1202 is specifically configured to:
querying, in a target database, multimedia information matched with the multimedia identifier;
if multimedia information matched with the multimedia identifier is found in the target database, performing audio recognition processing according to the multimedia information in the target database;
if no multimedia information matched with the multimedia identifier is found in the target database, querying the multimedia information matched with the multimedia identifier through the Internet, and performing audio recognition processing according to the multimedia information thus found.
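The database-first, Internet-second lookup described in the steps above can be sketched as follows. The function name, the dictionary standing in for the target database, and the `query_internet` callback are all assumptions made for illustration; the application does not define these interfaces.

```python
from typing import Callable, Dict, Optional, Tuple

MediaInfo = Dict[str, str]


def find_multimedia(
    identifier: str,
    target_db: Dict[str, MediaInfo],
    query_internet: Callable[[str], Optional[MediaInfo]],
) -> Tuple[Optional[MediaInfo], str]:
    """Look up multimedia information, preferring the target database.

    Returns the information found (or None) together with a label
    naming where it came from.
    """
    info = target_db.get(identifier)
    if info is not None:
        return info, "target_database"
    # Fall back to the Internet only when the database has no match.
    info = query_internet(identifier)
    if info is not None:
        return info, "internet"
    return None, "not_found"
```

Passing the Internet query in as a callback keeps the fallback order explicit and makes it easy to check that no network call happens when the target database already holds a match.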
In an embodiment, when performing the audio recognition processing according to the multimedia information in the target database, the processing unit 1202 is specifically configured to:
determining whether attribute information of the multimedia information can be found in the target database;
if yes, acquiring the attribute information of the multimedia information from the target database;
if not, acquiring audio data of the multimedia information from the target database, and performing audio recognition on the audio data to obtain at least one candidate audio and attribute information of the at least one candidate audio.
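The branch above (use stored attribute information when it exists, otherwise run audio recognition on the stored audio data) can be sketched as below. The record layout with `attributes` and `audio_data` keys and the `recognize_audio` callback are hypothetical stand-ins; the application does not specify the database schema or the recognition interface.

```python
from typing import Callable, Dict, List

Attributes = Dict[str, str]


def recognize_from_record(
    record: dict,
    recognize_audio: Callable[[bytes], List[Attributes]],
) -> List[Attributes]:
    """Return attribute information for a record from the target database.

    If the record already carries attribute information, it is returned
    directly; otherwise the audio data is passed to the recognizer, which
    returns attribute information for each candidate audio.
    """
    attributes = record.get("attributes")
    if attributes:
        return [attributes]
    return recognize_audio(record["audio_data"])
```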
According to an embodiment of the present application, the units in the audio recognition apparatus shown in fig. 12 may be separately or wholly combined into one or several other units, or one or more of the units may be further split into multiple functionally smaller units; this achieves the same operation without affecting the technical effect of the embodiment of the present application. The units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the audio recognition apparatus may also include other units, and in practical applications these functions may also be realized with the assistance of other units and through the cooperation of multiple units. According to another embodiment of the present application, the audio recognition apparatus shown in fig. 12 may be constructed, and the audio recognition method of the embodiment of the present application may be implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in fig. 10 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed by the above computing device via the computer-readable recording medium.
In the embodiment of the application, the multimedia information matched with the multimedia identifier is first queried in the target database according to the multimedia identifier, and only when no matching multimedia information can be found in the target database is it queried through the Internet. This improves the speed and efficiency of acquiring the multimedia information matched with the multimedia identifier and also helps ensure the safety of the network environment. In addition, the multimedia information matched with the multimedia identifier and the candidate audio associated with the multimedia information are identified by using both database matching and voice recognition, so that the accuracy of audio recognition can be improved.
Fig. 13 shows a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. Referring to fig. 13, the electronic device includes a storage device 1301 and a processor 1302, and in the embodiment of the present invention, the electronic device further includes a network interface 1303 and a user interface 1304. The electronic device may be a smart phone, a tablet computer, a smart wearable device, or the like, and can access the Internet through the network interface 1303 to communicate and exchange data with a server and other electronic devices. The user interface 1304 may include a touch display screen or the like, which can display various interfaces to the user and receive user operations.
The storage device 1301 may include a volatile memory, such as a random-access memory (RAM); the storage device 1301 may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); the storage device 1301 may also include a combination of the above kinds of memory.
The processor 1302 may be a Central Processing Unit (CPU). The processor 1302 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a field-programmable gate array (FPGA), a generic array logic (GAL), or the like.
The storage device 1301 of the embodiment of the present application stores a computer program, the processor 1302 calls the computer program in the storage device, and when the computer program is executed, the processor 1302 may be configured to implement the method described above, such as the embodiment corresponding to fig. 1.
In one embodiment, the electronic device may correspond to a target terminal; the storage device 1301 stores therein a computer program; the computer program is loaded and executed by the processor 1302 to implement the corresponding steps in the above-described embodiments of the audio recognition method; specifically, the processor 1302 is configured to perform the following steps:
displaying an audio identification page of a first application program;
outputting audio recognition progress in the audio recognition page, wherein the audio recognition progress is used for prompting progress information of the first application program for audio recognition according to the multimedia identifier, and the multimedia identifier is shared to the first application program by the second application program;
and displaying an audio recognition result page of the first application program.
In one embodiment, processor 1302 is further configured to perform the following steps:
outputting an audio recognition animation in the audio recognition page, wherein the audio recognition animation matches the audio recognition progress.
In one embodiment, the audio recognition page is provided with a recognition progress area; the audio recognition progress and the audio recognition animation are displayed in the recognition progress area of the audio recognition page.
In one embodiment, processor 1302 is further configured to perform the following steps:
displaying at least one recognition type option in the audio recognition page; the recognition type options include any of: a humming recognition option, a song-listening recognition option, and a sharing recognition option; wherein the sharing recognition option is highlighted in the audio recognition page.
In one embodiment, the audio recognition page is provided with a type selection area; when displaying at least one recognition type option in the audio recognition page, the processor 1302 is specifically configured to perform the following steps:
displaying the at least one recognition type option in the type selection area of the audio recognition page.
In one embodiment, the audio recognition result page is used for displaying an audio recognition result, and the audio recognition result comprises audio recognition failure prompt information; the processor 1302 is further configured to perform the following steps:
and displaying a service page of the first application program, wherein the service page is any page of the first application program.
In one embodiment, the audio recognition result page is used for displaying an audio recognition result, and the audio recognition result comprises audio recognition failure prompt information; the processor 1302 is further configured to perform the following steps:
and jumping to a service page of the second application program from the audio recognition result page, wherein the service page is any page of the second application program.
In one embodiment, the audio recognition result page is used for displaying an audio recognition result, wherein the audio recognition result comprises at least one candidate audio which is successfully recognized; the processor 1302 is further configured to perform the following steps:
outputting attribute information of the at least one candidate audio in the audio recognition result page.
In one embodiment, processor 1302 is further configured to perform the following steps:
when the attribute information of any one of the candidate audios is selected, the selected candidate audio is played.
In one embodiment, the processor 1302 is specifically configured to perform the following steps when playing the selected candidate audio:
playing the selected candidate audio in the audio recognition result page; or jumping from the audio recognition result page to an audio playing page of the first application program, and playing the selected candidate audio in the audio playing page.
In one embodiment, the multimedia identifier is used to uniquely identify a piece of multimedia information; the multimedia information includes any one of video, short video, and audio; the multimedia identifier comprises a link to the multimedia information;
the link of the multimedia information is directly shared to the first application program by the second application program; or the link of the multimedia information is set in the picture, and the picture is shared to the first application program by the second application program.
In another embodiment, the electronic device may correspond to a background server; the storage device 1301 stores a computer program; the computer program is loaded and executed by the processor 1302 to implement the corresponding steps in the above-described embodiments of the audio recognition method; specifically, the processor 1302 is configured to perform the following steps:
receiving a multimedia identifier sent by a first application program;
querying the matched multimedia information according to the multimedia identifier, and performing audio recognition processing according to the multimedia information; and
sending the audio recognition processing result to the first application program, wherein the audio recognition processing result comprises: any one or more of the audio recognition progress and the audio recognition result.
In an embodiment, when querying the matching multimedia information according to the multimedia identifier and performing audio recognition processing according to the multimedia information, the processor 1302 is specifically configured to perform the following steps:
querying, in a target database, multimedia information matched with the multimedia identifier;
if multimedia information matched with the multimedia identifier is found in the target database, performing audio recognition processing according to the multimedia information in the target database;
if no multimedia information matched with the multimedia identifier is found in the target database, querying the multimedia information matched with the multimedia identifier through the Internet, and performing audio recognition processing according to the multimedia information thus found.
In one embodiment, the processor 1302 is specifically configured to perform the following steps when performing the audio recognition processing according to the multimedia information in the target database:
determining whether attribute information of the multimedia information can be found in the target database;
if yes, acquiring the attribute information of the multimedia information from the target database;
if not, acquiring audio data of the multimedia information from the target database, and performing audio recognition on the audio data to obtain at least one candidate audio and attribute information of the at least one candidate audio.
In the embodiment of the application, the multimedia information matched with the multimedia identifier is first queried in the target database according to the multimedia identifier, and only when no matching multimedia information can be found in the target database is it queried through the Internet. This improves the speed and efficiency of acquiring the multimedia information matched with the multimedia identifier and also helps ensure the safety of the network environment. In addition, the multimedia information matched with the multimedia identifier and the candidate audio associated with the multimedia information are identified by using both database matching and voice recognition, so that the accuracy of audio recognition can be improved.
Embodiments of the present application also provide a computer-readable storage medium (Memory), which is a memory device in an electronic device and is used for storing programs and data. It is understood that the computer-readable storage medium herein may include a built-in storage medium of the electronic device and may also include an extended storage medium supported by the electronic device. The computer-readable storage medium provides a memory space that stores a processing system of the electronic device. Also stored in this memory space are computer programs (including program code) adapted to be loaded and executed by the processor 1302. It should be noted that the computer-readable storage medium may be a high-speed RAM or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer-readable storage medium located remotely from the aforementioned processor.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device performs the audio recognition method provided in the above-mentioned various alternatives.
Those of ordinary skill in the art would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is only the preferred embodiments of the present invention and certainly cannot be used to limit the scope of the claims of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (15)

1. An audio recognition method, comprising:
displaying an audio identification page of a first application program;
outputting an audio recognition progress in the audio recognition page, wherein the audio recognition progress is used for prompting progress information of the first application program for audio recognition according to a multimedia identifier, and the multimedia identifier is shared to the first application program by a second application program;
and displaying an audio recognition result page of the first application program.
2. The method of claim 1, wherein the method further comprises:
outputting an audio recognition animation in the audio recognition page, wherein the audio recognition animation matches the audio recognition progress.
3. The method of claim 2, wherein the audio recognition page is provided with a recognition progress area; and the audio recognition progress and the audio recognition animation are displayed in a recognition progress area of the audio recognition page.
4. The method of claim 1, wherein the method further comprises:
displaying at least one recognition type option in the audio recognition page; the recognition type option includes any one of: a humming recognition option, a song-listening recognition option, and a sharing recognition option; and the sharing recognition option is displayed in the audio recognition page.
5. The method of claim 4, wherein the audio recognition page is provided with a type selection area; the displaying at least one recognition type option in the audio recognition page includes:
displaying the at least one recognition type option in a type selection area of the audio recognition page.
6. The method of claim 1, wherein the audio recognition result page is used for displaying an audio recognition result, and the audio recognition result comprises audio recognition failure prompt information; the method further comprises the following steps:
and displaying a service page of the first application program, wherein the service page is any page of the first application program.
7. The method of claim 1, wherein the audio recognition result page is used for displaying an audio recognition result, and the audio recognition result comprises audio recognition failure prompt information; the method further comprises the following steps:
and jumping to a service page of the second application program from the audio recognition result page, wherein the service page is any page of the second application program.
8. The method of claim 1, wherein the audio recognition result page is used to display audio recognition results that include at least one candidate audio that was successfully recognized; the method further comprises the following steps:
outputting the attribute information of the at least one candidate audio in the audio recognition result page.
9. The method of claim 8, further comprising:
playing the selected candidate audio in the audio recognition result page; or jumping from the audio recognition result page to an audio playing page of the first application program, and playing the selected candidate audio in the audio playing page.
10. The method of any one of claims 1-9, wherein the multimedia identifier is used to uniquely identify a piece of multimedia information; the multimedia information comprises any one of video, short video and audio; the multimedia identifier comprises a link to the multimedia information;
the link of the multimedia information is directly shared to the first application program by the second application program; or the link of the multimedia information is set in a picture, and the picture is shared to the first application program by the second application program.
11. An audio recognition method, comprising:
receiving a multimedia identifier sent by a first application program;
querying matched multimedia information according to the multimedia identifier, and performing audio recognition processing according to the multimedia information; and
sending an audio recognition processing result to the first application program, wherein the audio recognition processing result comprises: any one or more of the audio recognition progress and the audio recognition result.
12. The method of claim 11, wherein said querying the matching multimedia information according to the multimedia identifier and performing audio recognition processing according to the multimedia information comprises:
querying, in a target database, the multimedia information matched with the multimedia identifier;
if the multimedia information matched with the multimedia identifier is found in the target database, performing audio recognition processing according to the multimedia information in the target database;
if the multimedia information matched with the multimedia identifier is not found in the target database, querying the multimedia information matched with the multimedia identifier through the Internet, and performing audio recognition processing according to the multimedia information thus found.
13. The method of claim 12, wherein said performing an audio recognition process based on said multimedia information in said target database comprises:
determining whether attribute information of the multimedia information is found in the target database;
if so, acquiring the attribute information of the multimedia information from the target database;
if not, acquiring audio data of the multimedia information from the target database, and performing audio recognition on the audio data to obtain at least one candidate audio and attribute information of the at least one candidate audio.
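The branch in claim 13 can be illustrated with the sketch below: stored attribute information is returned when present, and audio recognition runs only when it is absent. The record layout (`"attributes"` and `"audio"` keys), the function name `resolve_attributes`, and the `recognize_audio` callable are assumptions made for this example, not details from the claims.

```python
from typing import Callable, List


def resolve_attributes(
    record: dict,
    recognize_audio: Callable[[bytes], List[dict]],
) -> List[dict]:
    """Return attribute information for a record from the target database."""
    attributes = record.get("attributes")
    if attributes is not None:
        # Attribute information is already stored: use it directly.
        return [attributes]
    # Otherwise recognize the stored audio data, which yields one or more
    # candidate audios together with their attribute information.
    return recognize_audio(record["audio"])
```

Returning a list in both branches lets a caller treat the stored-attribute case and the candidate case uniformly.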
14. An electronic device, comprising: a storage device and a processor;
the storage device stores a computer program therein;
the processor runs the computer program stored in the storage device to implement the audio recognition method according to any one of claims 1 to 10 and the audio recognition method according to any one of claims 11 to 13.
15. A computer-readable storage medium, in which a computer program is stored which, when executed, carries out an audio recognition method as claimed in any one of claims 1 to 10 and an audio recognition method as claimed in any one of claims 11 to 13.
CN202110321717.6A 2021-03-25 2021-03-25 Audio identification method, device, equipment and medium Pending CN113707179A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110321717.6A CN113707179A (en) 2021-03-25 2021-03-25 Audio identification method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110321717.6A CN113707179A (en) 2021-03-25 2021-03-25 Audio identification method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113707179A true CN113707179A (en) 2021-11-26

Family

ID=78647862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110321717.6A Pending CN113707179A (en) 2021-03-25 2021-03-25 Audio identification method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113707179A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023130812A1 (en) * 2022-01-04 2023-07-13 腾讯科技(深圳)有限公司 Multimedia processing method and apparatus, device, medium, and computer program product

Similar Documents

Publication Publication Date Title
CN107844586B (en) News recommendation method and device
US10643610B2 (en) Voice interaction based method and apparatus for generating multimedia playlist
CN106095453B (en) Information display method and device and electronic equipment
CN105488154A (en) Theme application recommendation method and device
CN107526761B (en) Methods, systems, and media for identifying and presenting multi-lingual media content items to a user
CN113225572B (en) Page element display method, device and system of live broadcasting room
US20170109339A1 (en) Application program activation method, user terminal, and server
CN108475260A (en) Method, system and the medium of the language identification of items of media content based on comment
CN107515870B (en) Searching method and device and searching device
CN110598098A (en) Information recommendation method and device and information recommendation device
CN112464031A (en) Interaction method, interaction device, electronic equipment and storage medium
KR20150106479A (en) Contents sharing service system, apparatus for contents sharing and contents sharing service providing method thereof
CN112532507B (en) Method and device for presenting an emoticon, and for transmitting an emoticon
WO2020227318A1 (en) Systems and methods for determining whether to modify content
CN109889921B (en) Audio and video creating and playing method and device with interaction function
US11698927B2 (en) Contextual digital media processing systems and methods
CN113707179A (en) Audio identification method, device, equipment and medium
CN107688587B (en) Media information display method and device
JP2020102201A (en) Image management method based on interaction between face image and messenger account, user terminal and computer device
CN112533032B (en) Video data processing method and device and storage medium
US10990456B2 (en) Methods and systems for facilitating application programming interface communications
US11249823B2 (en) Methods and systems for facilitating application programming interface communications
CN111866135A (en) Message display control method and device for electronic equipment, electronic equipment and readable medium
CN114501132B (en) Resource processing method and device, electronic equipment and storage medium
CN115361588B (en) Object display method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination