CN104991946B - Information processing method, server and user equipment

Info

Publication number
CN104991946B
CN104991946B
Authority
CN
China
Prior art keywords
audio
matching result
matching
identified
recognized
Prior art date
Legal status
Active
Application number
CN201510408843.XA
Other languages
Chinese (zh)
Other versions
CN104991946A (en)
Inventor
徐培来
孙艳庆
汪俊杰
Current Assignee
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN201510408843.XA
Publication of CN104991946A
Application granted
Publication of CN104991946B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval of audio data
    • G06F 16/63: Querying
    • G06F 16/635: Filtering based on additional data, e.g. user or group profiles

Abstract

The embodiments of the present application provide an information processing method, a server, and user equipment, which reduce the time consumed by an audio recognition process and improve recognition efficiency. The method includes: receiving an audio stream to be identified sent by user equipment; while receiving the audio stream to be identified, starting to perform a first matching in a feature database based on the first part of the audio to be identified that has been received by a first time, so as to obtain a first matching result; wherein the first time is before the time at which the audio stream to be identified is completely received.

Description

Information processing method, server and user equipment
Technical Field
The present invention relates to the field of electronic technologies, and in particular, to an information processing method, a server, and a user equipment.
Background
Previously, in order to search for a song, a user had to enter the correct song name or a line of its lyrics into a search engine. However, the user sometimes knows neither the song name nor the lyrics. For example, the user may want to download to a mobile phone a song that is currently playing but not know its name, or may be able to hum only a fragment of the song.
With the development of audio identification technology, music player software can identify the song corresponding to a piece of recorded audio and provide the song name to the user. Specifically, in order to identify the audio, the software must first record it for a certain time, such as 10 or 12 seconds, and then send the recording to a server for identification. Such an identification process is inefficient.
Disclosure of Invention
The embodiments of the present application provide an information processing method, a server, and user equipment, which are used to reduce the time consumed by an audio recognition process and improve recognition efficiency.
In a first aspect, the present application provides an information processing method, including:
receiving an audio stream to be identified sent by user equipment;
while receiving the audio stream to be identified, starting to perform a first matching in a feature database based on a first part of the audio to be identified that has been received by a first time, so as to obtain a first matching result; wherein the first time is before the time at which the audio stream to be identified is completely received.
Optionally, while receiving the audio stream to be identified, the method further includes:
acquiring a second part of the audio to be identified, which is received after the first time and before a second time, wherein the second time is after the first time;
and acquiring a second matching result based on the second part of the audio to be recognized and the first matching result.
Optionally, obtaining a second matching result based on the second part of the audio to be recognized and the first matching result, includes:
starting to perform second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized so as to obtain a third matching result;
matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result;
and determining the matching result with the matching degree meeting the preset condition as the second matching result from the third matching result and the fourth matching result.
Optionally, starting to perform a second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized to obtain a third matching result, including:
judging whether the first matching has finished traversing the feature database at the second time;
terminating the first match when the first match has not traversed the feature database;
performing the second matching from the first position of the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized to obtain the third matching result; wherein the first location is a location in the feature database at which the first matching at the second time is terminated.
Optionally, obtaining a second matching result based on the second part of the audio to be recognized and the first matching result, includes:
and starting to match in the first matching result based on the second part of the audio to be recognized so as to obtain the second matching result.
Optionally, the second matching result includes a song name of a song corresponding to the audio stream to be identified, and an offset position of the second part of the audio to be identified in the song, and after obtaining the second matching result, the method further includes:
obtaining the song from a song database corresponding to the characteristic database based on the song name;
transmitting the offset location and the song to the user device to cause the user device to play the song from the offset location; or
Transmitting a remaining portion of the song after the offset location to the user device.
In a second aspect, the present application provides an information processing method, including:
recording an audio stream to be identified through an audio input device;
sending the audio stream to be identified to a server while recording it, so that the server, while receiving the audio stream to be identified, starts to perform a first matching in a feature database based on a first part of the audio to be identified that has been received by a first time, so as to obtain a first matching result; wherein the first time is before the time at which recording of the audio stream to be identified is completed.
In a third aspect, the present application provides a server, comprising:
the receiver is used for receiving the audio stream to be identified sent by the user equipment;
a processor, configured to start performing a first matching in a feature database based on a first portion of the to-be-identified audio that has been received by a first time while the receiver receives the to-be-identified audio stream, to obtain a first matching result; wherein the first time is before the time when the audio stream to be identified is completely received.
Optionally, while the receiver receives the audio stream to be identified, the processor is further configured to obtain a second portion of the audio stream to be identified, where the second portion of the audio stream is received after the first time and before a second time, where the second time is after the first time; and acquiring a second matching result based on the second part of the audio to be recognized and the first matching result.
Optionally, the processor is configured to start performing second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized, so as to obtain a third matching result; matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result; and determining the matching result with the matching degree meeting the preset condition as the second matching result from the third matching result and the fourth matching result.
Optionally, the processor is configured to determine whether the first matching has traversed the feature database at the second time; terminating the first match when the first match has not traversed the feature database; performing the second matching from the first position of the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized to obtain the third matching result; wherein the first location is the location in the feature database at which the first matching is terminated at the second time.
Optionally, the processor is configured to start matching in the first matching result based on the second part of the audio to be recognized to obtain the second matching result.
Optionally, the second matching result includes a song name of a song corresponding to the audio stream to be identified, and an offset position of the second part of the audio to be identified in the song, and the processor is further configured to obtain the song from a song database corresponding to the feature database based on the song name after obtaining the second matching result;
the server further comprises:
a transmitter for transmitting the offset location and the song to the user device to cause the user device to play the song from the offset location; or transmitting a remainder of the song after the offset location to the user device.
In a fourth aspect, the present application provides a user equipment, comprising:
the audio input device is used for recording the audio stream to be identified;
the transmitter is used for sending the audio stream to be identified to a server while it is being recorded, so that the server, while receiving the audio stream to be identified, starts to perform a first matching in a feature database based on a first part of the audio to be identified that has been received by a first time, to obtain a first matching result; wherein the first time is before the time at which recording of the audio stream to be identified is completed.
In a fifth aspect, the present application provides a server, comprising:
the receiving unit is used for receiving the audio stream to be identified sent by the user equipment;
the first matching unit is used for starting to perform, while the audio stream to be identified is being received, a first matching in the feature database based on a first part of the audio to be identified that has been received by a first time, so as to obtain a first matching result; wherein the first time is before the time at which the audio stream to be identified is completely received.
Optionally, the server further includes:
an obtaining unit, configured to obtain, while receiving the audio stream to be identified, a second portion of the audio to be identified that is received after the first time and before a second time, where the second time is after the first time;
and the second matching unit is used for acquiring a second matching result based on the second part of the audio to be recognized and the first matching result.
Optionally, the second matching unit is configured to start performing second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized, so as to obtain a third matching result; matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result; and determining the matching result with the matching degree meeting the preset condition as the second matching result from the third matching result and the fourth matching result.
Optionally, the second matching unit is configured to determine whether the first matching has traversed the feature database at the second time; terminating the first match when the first match has not traversed the feature database; performing the second matching from the first position of the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized to obtain the third matching result; wherein the first location is the location in the feature database at which the first matching is terminated at the second time.
Optionally, the second matching unit is configured to start matching in the first matching result based on the second part of the audio to be recognized, so as to obtain the second matching result.
Optionally, the second matching result includes a song name of a song corresponding to the audio stream to be identified, and an offset position of the second part of the audio stream to be identified in the song, and the server further includes:
the obtaining unit is used for obtaining the song from a song database corresponding to the characteristic database based on the song name;
a first sending unit, configured to send the offset position and the song to the user equipment, so that the user equipment plays the song from the offset position; or transmitting a remainder of the song after the offset location to the user device.
In a sixth aspect, the present application provides a user equipment, comprising:
the recording unit is used for recording the audio stream to be identified through the audio input device;
the second sending unit is used for sending the audio stream to be identified to a server while it is being recorded, so that the server, while receiving the audio stream to be identified, starts to perform a first matching in a feature database based on a first part of the audio to be identified that has been received by a first time, to obtain a first matching result; wherein the first time is before the time at which recording of the audio stream to be identified is completed.
The technical solutions in the embodiments of the present application have at least the following technical effects:
In the technical solutions of the present application, an audio stream to be identified sent by user equipment is received; while the audio stream to be identified is being received, a first matching is started in a feature database based on the first part of the audio to be identified that has been received by a first time, so as to obtain a first matching result; wherein the first time is before the time at which the audio stream to be identified is completely received. The server in the embodiments of the present application therefore starts matching on the received first part of the audio while still receiving the audio stream sent by the user terminal; in other words, the server matches while receiving. Because matching does not wait, as in the prior art, until the entire audio stream to be recognized has been received, the technical solution in the embodiments of the present application reduces the time consumed by the audio recognition process and improves recognition efficiency.
Drawings
Fig. 1 is a flowchart of an information processing method according to an embodiment of the present application;
fig. 2 is a flowchart of an information processing method according to a second embodiment of the present application;
fig. 3 is a schematic structural diagram of a server according to a third embodiment of the present application;
fig. 4 is a schematic structural diagram of a user equipment in a fourth embodiment of the present application;
fig. 5 is a schematic structural diagram of a server according to a fifth embodiment of the present application;
fig. 6 is a schematic structural diagram of a user equipment in a sixth embodiment of the present application.
Detailed Description
The embodiments of the present application provide an information processing method, a server, and user equipment, which solve the technical problem of low audio recognition efficiency on the server in the prior art, reduce the time consumed by the audio recognition process, and improve recognition efficiency.
To solve the above technical problem, the general idea of the technical solutions provided by the present application is as follows:
In the technical solutions of the present application, an audio stream to be identified sent by user equipment is received; while the audio stream to be identified is being received, a first matching is started in a feature database based on the first part of the audio to be identified that has been received by a first time, so as to obtain a first matching result; wherein the first time is before the time at which the audio stream to be identified is completely received. The server in the embodiments of the present application therefore starts matching on the received first part of the audio while still receiving the audio stream sent by the user terminal; in other words, the server matches while receiving. Because matching does not wait, as in the prior art, until the entire audio stream to be recognized has been received, the technical solution in the embodiments of the present application reduces the time consumed by the audio recognition process and improves recognition efficiency.
The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments and the specific features therein are detailed illustrations of the technical solutions of the present application rather than limitations on them, and that the technical features in the embodiments and examples of the present application may be combined with each other where no conflict arises.
The term "and/or" herein merely describes an association between related objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.
Example one:
an embodiment of the present application provides an information processing method, which is applied to a server, and as shown in fig. 1, includes the following steps:
S101: Receive the audio stream to be identified sent by the user equipment.
S102: While receiving the audio stream to be identified, start to perform a first matching in the feature database based on the first part of the audio to be identified that has been received by the first time, to obtain a first matching result.
Specifically, in the embodiment of the present application, the audio stream to be identified is recorded and transmitted by a UE (User Equipment). The audio to be recognized may be an electronic music audio stream played by an audio playing device such as a computer or a mobile phone and recorded by the UE, or a humming audio stream recorded by the UE; this application is not specifically limited.
After establishing a connection with the UE and determining that audio identification needs to be performed for the UE, the server starts to receive the audio stream to be identified sent by the UE. In the embodiment of the application, the UE sends the audio stream to be identified to the server in the form of streaming media data packets.
In order to reduce the time consumed by the audio identification process, the server in the embodiment of the present application does not wait until all of the audio stream to be identified has been received before matching it in the feature database; instead, it starts matching based on the audio already received while still receiving the audio stream. That is, the server continues to receive the subsequently transmitted audio stream while performing matching in the feature database based on the audio received so far.
Specifically, the server includes a feature database and a song database corresponding to the feature database. The features of every song in the song database are stored in the feature database, and each feature in the feature database is a feature of some song in the song database.
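To make the correspondence concrete, the following is a minimal sketch of how the two databases might be laid out in memory. The names and the byte-string fingerprints are illustrative assumptions for this document, not structures defined by the patent.

```python
# Hypothetical layout: every feature-database entry carries the id of the song
# (in the song database) whose fingerprint it is.
song_database = {
    "song_a": {"title": "Song A", "file": "a.mp3"},
    "song_b": {"title": "Song B", "file": "b.mp3"},
}

feature_database = [
    ("song_a", bytes([1, 4, 2, 9, 7])),  # (song id, fingerprint of song A)
    ("song_b", bytes([3, 3, 8, 0, 5])),  # (song id, fingerprint of song B)
]
```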
In S102, the first part of the audio to be recognized is the audio that has been received from the start of receiving the audio to be recognized to the first time T1. The server receives the audio stream to be recognized after T1, and triggers a matching algorithm to match the first part of the audio to be recognized in the feature database. The matching algorithm is not specifically limited in this application, and examples thereof include a Dynamic programming algorithm, a DTW (Dynamic Time Warping) algorithm, and the like.
Wherein T1 is after time T0 when the reception of the audio stream to be recognized is started and before time Tn when the reception of the audio stream to be recognized is completed. The duration or data size of the first portion of audio to be recognized is related to T1. And the duration of the first part of the audio to be recognized represents the time taken for completely playing the first part of the audio to be recognized. T1 may be any time from T0 to Tn, such as 0.5 seconds, 1 second, etc. after T0, and the application is not particularly limited.
Optionally, in order to ensure matching accuracy, a first preset duration may be set, and the matching algorithm is triggered to start the first matching when the duration of the first part of the audio to be recognized reaches this first preset duration. Thus, T1 is specifically the time at which the duration of the first part of the audio reaches the first preset duration, which is, for example, 3, 4, or 5 seconds. Similarly, a preset data amount may be set, and the first matching is triggered when the data amount of the first part of the audio to be recognized at T1 reaches the preset data amount.
In the first matching, specifically, it is first necessary to extract features of the first part of the audio to be identified, such as the frequency spectrum, the peak sequence, and the phase channel. These features are then compared one by one with the features in the feature database, and the features in the feature database whose matching degree with the first part of the audio to be recognized is greater than a first threshold are taken as matching results. The first threshold is, for example, 60% or 70%. When the first matching completes or is terminated, the features whose matching degree with the first part of the audio to be recognized is greater than the first threshold constitute the first matching result.
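The trigger-and-compare logic just described can be sketched as follows. This is a minimal illustration assuming a 3-second first preset duration and a 60% first threshold; extract_features and match_degree are stand-ins for a real fingerprinting pipeline, not the patent's implementation.

```python
FIRST_PRESET_DURATION = 3.0  # assumed first preset duration in seconds
FIRST_THRESHOLD = 0.6        # assumed first threshold on the matching degree

def extract_features(chunks):
    """Stand-in for spectrum / peak-sequence extraction."""
    return b"".join(chunks)

def match_degree(query, feats):
    """Stand-in similarity in [0, 1]: fraction of equal leading bytes."""
    n = min(len(query), len(feats))
    return sum(query[i] == feats[i] for i in range(n)) / n if n else 0.0

def first_matching(first_part_chunks, feature_database):
    """Compare the first part of the audio against every database feature and
    keep every candidate whose matching degree exceeds FIRST_THRESHOLD."""
    query = extract_features(first_part_chunks)
    results = []
    for song_id, feats in feature_database:
        degree = match_degree(query, feats)
        if degree > FIRST_THRESHOLD:
            results.append((song_id, degree))
    return results
```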
Optionally, the embodiment of the present application further includes:
acquiring a second part of the audio to be identified, which is received after the first time and before a second time, wherein the second time is after the first time and before the time of completing receiving the audio stream to be identified;
and acquiring a second matching result based on the second part of the audio to be recognized and the first matching result.
While receiving the audio stream to be identified, or while performing the first matching, the server continues to receive the audio stream sent by the UE. It can therefore also acquire the audio to be identified received after T1 (excluding T1) and up to Tn (including Tn), and use that audio together with the first matching result to obtain the final matching result corresponding to the whole audio stream to be recognized. For convenience of explanation, the following takes obtaining the second matching result based on the second part of the audio to be recognized and the first matching result as an example of how the server obtains the final matching result.
Specifically, the second part of the audio to be recognized is the audio to be recognized received after T1 (excluding T1) and before T2 (including T2), where T2 is a time after T1 and before Tn. In the present embodiment, there are two possibilities for T2.
First, since the server continues to receive the audio to be recognized in the form of streaming media packets after the first matching starts, T2 may be the time at which the first packet after T1 is received. In other words, after the first matching starts, each time the server receives a packet, it obtains the next matching result based on the newly received packet and the result of the previous matching.
Second, to make matching more accurate, the server may set a second preset duration, such as 1 or 2 seconds, and obtain the second matching result based on the second part of the audio to be recognized and the first matching result once the duration of the audio received after T1 (excluding T1) reaches the second preset duration. That is, T2 is the time at which the duration of the second part of the audio to be recognized reaches the second preset duration. In other words, after the first matching starts, each time the server has received audio whose duration reaches the second preset duration, it obtains the next matching result based on that newly received audio and the result of the previous matching.
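The two possibilities for T2 amount to two trigger policies, sketched below under the assumption that each packet is a (duration, data) pair; the names are illustrative.

```python
SECOND_PRESET_DURATION = 1.0  # assumed second preset duration in seconds

def next_match_due(packets_since_last_match, per_packet):
    """Policy 1 (per_packet=True): trigger on the first packet after T1.
    Policy 2: trigger once the newly received audio spans the preset duration."""
    if per_packet:
        return len(packets_since_last_match) >= 1
    total = sum(duration for duration, _data in packets_since_last_match)
    return total >= SECOND_PRESET_DURATION
```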
From the above description, the server receives and matches at the same time, and obtains increasingly accurate matching results based on subsequently received audio and the previous matching results. This not only reduces the time consumed by the matching process but also improves the accuracy of the matching result.
In the following, how to obtain the second matching result based on the second part of the audio to be recognized and the first matching result will be described in detail. In the embodiment of the present application, there are the following two methods for obtaining the second matching result.
Method one:
Obtaining a second matching result based on the second part of the audio to be recognized and the first matching result includes:
starting to perform second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized so as to obtain a third matching result;
matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result;
and determining the matching result with the matching degree meeting the preset condition as the second matching result from the third matching result and the fourth matching result.
Specifically, on one hand, since the combined duration (or data amount) of the first part and the second part of the audio to be recognized exceeds that of the first part alone, matching both parts together in the feature database yields a third matching result that is more accurate than the first matching result.
On the other hand, the features in the first matching result were obtained from the feature database by matching against the first part of the audio to be recognized, so they do not necessarily match the second part. A feature that matches the first part but not the second part cannot be the final matching result corresponding to the audio stream to be identified. Therefore, in the embodiment of the present application, the server further screens the first matching result using the second part of the audio to be recognized, so as to screen out a fourth matching result that matches both the first part and the second part of the audio to be identified.
In a specific implementation, the second matching may be performed first to obtain the third matching result and the fourth matching result then screened from the first matching result, or the fourth matching result may be obtained first and the second matching performed afterwards. Of course, to further reduce the time consumed by audio recognition, the second matching and the screening of the first matching result may also be performed simultaneously, as in the sketch below. Persons of ordinary skill in the art may choose according to practice; this application is not specifically limited.
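A sketch of the simultaneous variant, running the database scan and the screening of the first matching result on two threads. second_matching and screen_first_result are simplified stand-ins that reuse match_degree from the earlier sketch; candidates are assumed to be (song_id, features) pairs.

```python
from concurrent.futures import ThreadPoolExecutor

def second_matching(combined_query, feature_database, threshold=0.6):
    """Stand-in: scan the database with the combined first+second parts."""
    return [(sid, feats) for sid, feats in feature_database
            if match_degree(combined_query, feats) > threshold]

def screen_first_result(second_part_query, first_result, threshold=0.7):
    """Stand-in: keep first-round candidates that also match the second part."""
    return [(sid, feats) for sid, feats in first_result
            if match_degree(second_part_query, feats) > threshold]

def second_round(combined_query, second_part_query, feature_database, first_result):
    # Run the second matching and the screening in parallel, as allowed above.
    with ThreadPoolExecutor(max_workers=2) as pool:
        third = pool.submit(second_matching, combined_query, feature_database)
        fourth = pool.submit(screen_first_result, second_part_query, first_result)
        return third.result(), fourth.result()
```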
How to obtain the third matching result is explained below. In this embodiment of the present application, starting to perform a second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized, so as to obtain a third matching result, including:
judging whether the first matching has finished traversing the feature database at the second time;
terminating the first match when the first match has not traversed the feature database;
performing the second matching from the first position of the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized to obtain the third matching result; wherein the first location is a location in the feature database at which the first matching at the second time is terminated.
Since the first matching may or may not have traversed the feature database by the time the second matching begins, it is first determined whether the first matching has traversed the feature database at T2.
Specifically, in the embodiment of the present application, if the first matching traverses the feature database, it terminates automatically after the traversal, and the second matching restarts from the beginning of the feature database, i.e., from the first feature. If the first matching has not traversed the feature database, the second matching continues downward from where the first matching stopped.
Specifically, when the first matching has not traversed the feature database, to improve matching accuracy, the first matching is terminated and a second matching is started from the first position based on the first part and the second part of the audio to be recognized. The second matching and the screening of the first matching result proceed similarly to the first matching, which has been described above and is not repeated here.
The first position is the position in the feature database at which the first matching terminates. For example, suppose the feature database contains a total of 10^6 features. If, when the first matching terminates, the 1st to 30,184th features have been compared, the first position is the 30,184th feature, and the second matching starts from the 30,185th feature.
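A sketch of a scan that remembers where it stopped, so a later round can resume from the first position instead of restarting; the indexing and the budget parameter are illustrative, and match_degree is the stand-in from the earlier sketch.

```python
def resumable_scan(query, feature_database, start_index=0, budget=None, threshold=0.6):
    """Scan feature_database from start_index, optionally stopping after `budget`
    comparisons. Returns (hits, index_reached); resuming later means calling
    this again with start_index set to the returned index."""
    end = len(feature_database) if budget is None else min(
        len(feature_database), start_index + budget)
    hits, i = [], start_index
    while i < end:
        song_id, feats = feature_database[i]
        degree = match_degree(query, feats)
        if degree > threshold:
            hits.append((song_id, degree))
        i += 1
    return hits, i
```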
In this embodiment of the present application, before performing the second matching, the server needs to combine the first part and the second part of the audio to be recognized so that the two parts are matched in the feature database as a whole. For example, if the first part of the audio to be recognized is the audio from 0 to 3 seconds and the second part is the audio from 3 to 4 seconds, the server combines the two parts in time order into the audio from 0 to 4 seconds and then performs the second matching in the feature database based on that 0-to-4-second audio.
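Combining the parts amounts to concatenating the buffered segments in time order; a trivial sketch, assuming segments are (start_second, raw_samples) pairs:

```python
def combine_parts(segments):
    """segments: list of (start_second, raw_samples). Returns one contiguous
    query, e.g. a 0-3 s part plus a 3-4 s part becomes a single 0-4 s query."""
    return b"".join(samples for _start, samples in sorted(segments))
```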
The above process is illustrated with an example. By T2, the first matching has compared the 1st to 30,184th features in the feature database and obtained, say, the features of song A, song B, and song C as the first matching result. The server terminates the first matching, combines the first part and the second part of the audio to be recognized, and starts the second matching from the 30,185th feature. At the same time, the second part of the audio to be identified is matched among the features of song A, song B, and song C. Suppose the entries of the first matching result whose matching degree with the second part of the audio to be identified exceeds a second threshold are taken as the fourth matching result, and only the features of song A exceed that threshold; the fourth matching result is then the features of song A. The second matching terminates at time T3 and yields the features of song D and song E as the third matching result. The second threshold, for example 70% or 50%, may be the same as or different from the first threshold and is not specifically limited.
Finally, from the third matching result and the fourth matching result, the matching result whose matching degree meets the preset condition is determined as the second matching result. Specifically, the preset condition may be that the matching degree is the highest, in which case the server selects the result with the highest matching degree among the third and fourth matching results as the second matching result; or the preset condition may be a matching degree above 80%, in which case the server takes every result whose matching degree exceeds 80% as the second matching result. Persons of ordinary skill in the art may choose according to practice; this application is not specifically limited.
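The two example preset conditions can be expressed as a small selection helper; the (song_id, degree) pair layout follows the earlier sketches, and the 0.8 cutoff is the assumed 80% example.

```python
def select_second_matching_result(third, fourth, mode="best", cutoff=0.8):
    """third, fourth: lists of (song_id, degree). mode='best' keeps the single
    highest-degree candidate; mode='cutoff' keeps every candidate above cutoff."""
    pool = third + fourth
    if not pool:
        return []
    if mode == "best":
        return [max(pool, key=lambda c: c[1])]
    return [c for c in pool if c[1] > cutoff]
```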
Similarly, the server may further obtain, at a third time T3, a third part of the audio to be recognized received after T2 (excluding T2) and up to T3 (including T3). It then performs a third matching, starting from the position where the second matching was terminated, based on the combination of the first, second, and third parts of the audio to be recognized, while simultaneously screening the second matching result based on the third part, and finally determines the matching result whose matching degree meets the preset condition from the results of the third matching and the screening. Still further, the server may obtain a fourth part of the audio to be recognized after T3 (excluding T3) and up to a fourth time T4 (including T4), and so on, until the audio stream to be recognized has been completely received and the final matching result is obtained.
As can be seen from the above description, in the embodiment of the present application, after the second part of the audio to be recognized is obtained, the second matching is started in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized, so as to obtain a third matching result. In addition, based on the second part of audio to be recognized, a fourth matching result is screened out from the first matching result, and finally, the matching result with the matching degree meeting the preset condition is determined to be the second matching result from the third matching result and the fourth matching result. Therefore, on one hand, matching is started while the audio stream to be identified is received, and the time consumption of audio identification is reduced; on the other hand, the first part of audio to be recognized and the second part of audio to be recognized are combined to be matched in the feature database, and the second part of audio is utilized to screen the first matching result, so that the recognition efficiency is improved, and the recognition accuracy is also improved.
Method two:
Obtaining a second matching result based on the second part of the audio to be recognized and the first matching result includes:
and starting to match in the first matching result based on the second part of the audio to be recognized so as to obtain the second matching result.
Specifically, in method two, to further reduce the time consumed by audio identification, the server does not search the feature database again after receiving the second part of the audio to be identified. Instead, to improve the accuracy of the matching result, it matches within the first matching result based on the second part of the audio to be recognized, and takes the entries of the first matching result that agree with the second part of the audio to be recognized as the second matching result.
Further, after the third part of the audio to be recognized is received at T3, the second matching result is filtered using the third part. Similarly, after the fourth part of the audio to be recognized is received at a fourth time T4 after T3, further filtering is performed based on the fourth part, and so on, until the audio stream to be recognized has been completely received and the final matching result is obtained.
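Method two therefore reduces to a pure filtering pipeline: each newly received part only narrows the surviving candidate set, with no further database scan. A sketch, reusing match_degree from the earlier sketch:

```python
def method_two(first_result, later_part_queries, threshold=0.7):
    """first_result: (song_id, features) candidates from the first matching.
    later_part_queries: the second, third, ... parts of audio as they arrive.
    Each part filters the surviving candidates in place of a database search."""
    candidates = first_result
    for part_query in later_part_queries:
        candidates = [(sid, feats) for sid, feats in candidates
                      if match_degree(part_query, feats) > threshold]
    return candidates
```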
As can be seen from the above description, the first matching is performed while the audio stream to be identified sent by the UE is being received, which reduces the time consumed by audio identification. Further, compared with the prior art, which matches based on the entire audio stream to be recognized, the data volume of the first part of the audio is smaller than that of the whole stream, so matching in the feature database based on the first part alone is faster, which further reduces the time consumed. Meanwhile, the server screens the matching results based on the subsequently received parts of the audio to be identified, for example screening the first matching result with the second part, which keeps the final matching result reliably accurate.
Optionally, after the audio stream to be recognized has been completely received and the final matching result obtained, the UE needs to be informed of the song corresponding to the audio stream to be recognized. In a specific implementation, the final matching result is the result obtained after method one or method two has been executed several times; for convenience of description, assume below that T2 is the time at which the audio stream to be recognized is completely received, so that the second matching result is the final matching result.
In this embodiment of the present application, the second matching result includes a song name of a song corresponding to the audio stream to be identified and an offset position of the second part of the audio stream to be identified in the song, and after the second matching result is obtained, the method further includes:
obtaining the song from a song database corresponding to the characteristic database based on the song name;
transmitting the offset location and the song to the user device to cause the user device to play the song from the offset location; or
Transmitting a remaining portion of the song after the offset location to the user device.
The server searches the song database by the song name to obtain the song corresponding to the audio stream to be identified, and then sends the offset position and the song to the UE. Specifically, the offset position is the position within the song of the end of the second part of the audio to be identified. For example, if the second part of the audio to be recognized is the audio of song A from 1 minute 30 seconds to 1 minute 31 seconds as hummed by the user, the end of the second part corresponds to 1 minute 31 seconds of song A, so the offset position of the second part is 1 minute 31 seconds.
Furthermore, the UE receives the song and the offset position, takes the offset position as the playing position, and plays the song from that position.
Alternatively, the server may directly transmit to the UE the remaining part of the song after the offset position; the UE starts playing as soon as it receives the remainder, so the song effectively starts from the offset position.
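The two delivery options reduce to a small server-side choice. A sketch with illustrative names; the bytes-per-second conversion is an assumed stand-in for seeking to the offset inside the audio data.

```python
def deliver_song(song_bytes, offset_seconds, bytes_per_second, remainder_only):
    """Option 1: send the whole song plus the offset, and the UE seeks itself.
    Option 2: send only the remainder after the offset, and the UE just plays it."""
    if remainder_only:
        start = int(offset_seconds * bytes_per_second)
        return {"audio": song_bytes[start:]}
    return {"audio": song_bytes, "offset_seconds": offset_seconds}
```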
In a specific implementation, when sending the song or its remaining part to the UE, the server may send the streaming media data of the song or the remaining part online, or send the file of the song or the remaining part offline. Persons of ordinary skill in the art may choose according to practice; this application is not specifically limited.
As can be seen from the above description, sending the offset position and the song to the UE, or sending the remaining part of the song after the offset position, causes the UE to play the song from the offset position. The playback therefore continues seamlessly from the audio to be identified that the UE recorded, which improves the user experience.
Example two:
Based on the same inventive concept as the information processing method of the first embodiment, the second embodiment of the present application further provides an information processing method applied to the UE. As shown in fig. 2, the method includes:
S201: Record the audio stream to be identified through the audio input device.
S202: While recording the audio stream to be identified, send it to a server.
Specifically, the UE records the audio stream to be recognized through an audio input device, such as a handset receiver, an earphone receiver, or an external microphone. The audio to be recognized may be an electronic music audio stream played by an audio playing device such as a computer or a mobile phone and recorded by the UE, or a humming audio stream recorded by the UE; this application is not specifically limited.
The UE records the audio stream to be identified and, at the same time, sends the recorded audio to the server in streaming media data packets. Specifically, while recording the audio to be identified, the UE packages the recorded but not-yet-sent audio data; each time a packet has been recorded and packaged, it is sent to the server.
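A sketch of the UE-side loop: record, package what has not yet been sent, and ship each packet immediately instead of waiting for the whole recording. record_chunk and send_packet are assumed platform hooks, not APIs named by the patent.

```python
def record_and_stream(record_chunk, send_packet, chunks_per_packet=4, total_chunks=40):
    """record_chunk() -> bytes yields the newest recorded audio; send_packet(seq,
    data) transmits one streaming-media packet. Recording and sending overlap:
    each packet leaves as soon as it is full, so the server can match early."""
    buffer, seq = [], 0
    for _ in range(total_chunks):  # stand-in for "until the user stops recording"
        buffer.append(record_chunk())
        if len(buffer) == chunks_per_packet:
            send_packet(seq, b"".join(buffer))
            buffer, seq = [], seq + 1
    if buffer:  # flush the final partial packet
        send_packet(seq, b"".join(buffer))
```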
While receiving the audio, the server performs the first matching based on the first part of the audio to be recognized that has been received by T1. For how the server performs matching on the complete audio stream to be recognized to determine the final matching result, refer to the description in the first embodiment, which is not repeated here.
As can be seen from the above description, in the embodiment of the present application, the audio stream to be identified is recorded through an audio input device and sent to a server while it is being recorded, so that the server, while receiving the audio stream, starts a first matching in a feature database based on the first part of the audio to be identified that has been received by a first time, to obtain a first matching result; wherein the first time is before the time at which recording of the audio stream is completed. The UE thus sends the audio stream to be identified while recording it, rather than, as in the prior art, only after the whole stream has been recorded; the server can therefore start identification earlier, which reduces the time from the start of recording to obtaining the corresponding song.
Example three:
based on the same inventive concept as the information processing method of the first embodiment, a third embodiment of the present application further provides a server, as shown in fig. 3, including:
a receiver 301, configured to receive an audio stream to be identified sent by a user equipment;
a processor 302, configured to start performing a first matching in the feature database based on a first portion of the audio to be identified that has been received by a first time while the receiver 301 receives the audio stream to be identified, to obtain a first matching result; wherein the first time is before the time of completing receiving the audio stream to be identified.
Optionally, while the receiver 301 receives the audio stream to be identified, the processor 302 is further configured to obtain a second portion of the audio stream to be identified, where the second portion of the audio stream is received after the first time and before a second time, where the second time is after the first time; and acquiring a second matching result based on the second part of the audio to be recognized and the first matching result.
Specifically, the processor 302 is specifically configured to start performing second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized to obtain a third matching result in the process of obtaining the second matching result based on the second part of the audio to be recognized and the first matching result; matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result; and determining the matching result with the matching degree meeting the preset condition as a second matching result from the third matching result and the fourth matching result.
In order to obtain a third matching result, the processor 302 is configured to determine whether the first matching has traversed the feature database at the second time; when the first matching is not finished traversing the feature database, terminating the first matching; performing second matching from the first position of the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized to obtain a third matching result; wherein the first location is a location in the feature database at the end of the first matching at the second time.
Alternatively, the processor 302 may be specifically configured to start matching in the first matching result based on the second part of the audio to be recognized, in a process of obtaining a second matching result based on the second part of the audio to be recognized and the first matching result, so as to obtain the second matching result.
The second matching result includes a song name of a song corresponding to the audio stream to be identified and an offset position of a second part of the audio stream to be identified in the song, and further, the processor 302 is further configured to obtain the song from a song database corresponding to the feature database based on the song name after obtaining the second matching result;
further, the server in the embodiment of the present application further includes:
a transmitter 303 for transmitting the offset position and the song to the user device to cause the user device to play the song from the offset position; or the remainder of the song after the offset position is transmitted to the user device.
Specifically, the processor 302 may be a general-purpose Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits for controlling program execution.
Further, the server may further include a memory, and the number of the memories may be one or more. The Memory may include a Read Only Memory (ROM), a Random Access Memory (RAM), and a disk Memory.
Various changes and specific examples in the information processing method in the foregoing embodiment in fig. 1 are also applicable to the server in the foregoing embodiment, and those skilled in the art can clearly know the implementation method of the server in the foregoing embodiment through the foregoing detailed description of the information processing method, so that details are not described here for the sake of brevity of the description.
Example four:
based on the same inventive concept as the information processing method in the second embodiment, a fourth embodiment of the present application further provides a UE, as shown in fig. 4, including:
an audio input device 401, configured to record the audio stream to be identified;
a transmitter 402, configured to send the audio stream to be identified to a server while it is being recorded, so that the server, while receiving the audio stream to be identified, starts to perform a first matching in a feature database based on a first part of the audio to be identified that has been received by a first time, to obtain a first matching result; wherein the first time is before the time at which recording of the audio stream to be identified is completed.
The audio input device 401 may be a receiver, a microphone, or the like of the UE itself, or an external microphone, an earphone microphone, or the like connected to the UE; this application is not specifically limited.
Further, the UE further includes a processor 403, configured to control a recording process and a sending process, and pack the audio to be recognized, and the like. Specifically, the processor 403 may be a general-purpose Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits for controlling program execution.
Further, the UE may further include a memory, and the number of the memory may be one or more. The Memory may include a Read Only Memory (ROM), a Random Access Memory (RAM), and a disk Memory.
Various changes and specific examples in the information processing method in the foregoing embodiment in fig. 2 are also applicable to the UE in this embodiment, and those skilled in the art can clearly know the implementation method of the UE in this embodiment through the foregoing detailed description of the information processing method, so for brevity of the description, detailed descriptions are not repeated here.
Example five:
an embodiment of the present application provides a server, as shown in fig. 5, including:
a receiving unit 501, configured to receive an audio stream to be identified, where the audio stream is sent by a user equipment;
a first matching unit 502, configured to, while receiving the audio stream to be identified, start a first matching in a feature database based on a first portion of audio to be identified that has been received by a first time to obtain a first matching result; wherein the first time is before the time when the audio stream to be identified is completely received.
Further, the server in this embodiment of the application may further include:
an obtaining unit, configured to obtain, while receiving the audio stream to be recognized, a second part of the audio to be recognized that is received after the first time and before a second time, where the second time is after the first time;
and the second matching unit is used for acquiring a second matching result based on the second part of the audio to be recognized and the first matching result.
The second matching unit is used for starting to perform second matching in the feature database based on the first part of audio to be recognized and the second part of audio to be recognized so as to obtain a third matching result; matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result; and determining the matching result with the matching degree meeting the preset condition as the second matching result from the third matching result and the fourth matching result.
Specifically, the second matching unit is configured to, when obtaining the third matching result, determine whether the first matching has traversed the feature database at the second time; terminate the first matching when the first matching has not traversed the feature database; and perform the second matching from the first position of the feature database based on the first part and the second part of the audio to be recognized to obtain the third matching result; wherein the first position is the position in the feature database at which the first matching is terminated at the second time.
Or, the second matching unit is configured to start matching in the first matching result based on the second part of the audio to be recognized to obtain the second matching result.
Further, when the second matching result includes the song name of the song corresponding to the audio stream to be identified and the offset position of the second part of the audio to be identified within that song, the server in this embodiment of the application may further include:
the obtaining unit, used for obtaining the song from a song database corresponding to the feature database based on the song name;
a first sending unit, configured to send the offset position and the song to the user equipment, so that the user equipment plays the song from the offset position; or to transmit the remainder of the song after the offset position to the user equipment.
the fifth embodiment is the same inventive concept as the first embodiment, and various variations and specific examples of the information processing method in the first embodiment are also applicable to the server in the present embodiment, so for brevity of the description, detailed descriptions are not provided herein.
Example six:
an embodiment of the present application provides a UE, as shown in fig. 6, including:
a recording unit 601, configured to record an audio stream to be identified through an audio input device;
a second sending unit 602, configured to send the audio stream to be identified to a server while recording the audio stream to be identified, so that the server starts to perform first matching in a feature database based on a first part of audio to be identified that has been received by the first time while receiving the audio stream to be identified, to obtain a first matching result; wherein the first time is before the time of completing recording the audio stream to be identified.
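A minimal sketch of the UE side, assuming a generic microphone source with a chunked-read API and a raw socket to the server (both hypothetical), could be:

    import socket

    CHUNK_SECONDS = 0.5   # small chunks let the server start matching early

    def record_and_stream(microphone, server_addr):
        # Send each recorded chunk immediately rather than waiting for the
        # whole recording, so the server's first matching can begin before
        # recording of the audio stream to be identified is complete.
        sock = socket.create_connection(server_addr)
        try:
            for chunk in microphone.chunks(CHUNK_SECONDS):   # hypothetical API
                sock.sendall(chunk)
        finally:
            sock.close()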
The sixth embodiment is based on the same inventive concept as the second embodiment, and the various variations and specific examples of the information processing method in the second embodiment also apply to the UE of this embodiment, so for brevity of the description, detailed descriptions are omitted here.
One or more technical solutions in the embodiments of the present application have at least the following technical effects:
in the technical solution of the application, the server receives an audio stream to be identified sent by user equipment and, while still receiving that stream, starts a first matching in a feature database based on the first part of the audio to be identified received by a first time, so as to obtain a first matching result, the first time being before the time when the audio stream to be identified is completely received. In other words, the server in the embodiment of the present application matches while it receives. Because matching does not wait, as in the prior art, until the entire audio stream to be recognized has been received, the technical solution in the embodiment of the application reduces the time consumed by the audio recognition process and improves recognition efficiency.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Specifically, the computer program instructions corresponding to the two information processing methods in the embodiments of the present application may be stored on a storage medium such as an optical disc, a hard disk, or a USB disk. When the computer program instructions in the storage medium corresponding to the first information processing method are read and executed by an electronic device, the following steps are performed:
receiving an audio stream to be identified sent by user equipment;
while receiving the audio stream to be identified, starting to perform first matching in a feature database based on a first part of audio to be identified received by a first moment so as to obtain a first matching result; wherein the first time is before the time when the audio stream to be identified is completely received.
Optionally, the storage medium further stores other computer instructions which, when executed, perform the following steps:
acquiring a second part of the audio to be identified received after the first time and before a second time while receiving the audio stream to be identified, wherein the second time is after the first time;
and acquiring a second matching result based on the second part of the audio to be recognized and the first matching result.
Optionally, for the step of obtaining a second matching result based on the second part of the audio to be recognized and the first matching result, the corresponding computer instructions stored in the storage medium specifically perform the following steps when executed:
starting to perform second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized so as to obtain a third matching result;
matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result;
and determining the matching result with the matching degree meeting the preset condition as the second matching result from the third matching result and the fourth matching result.
Optionally, for the step of starting a second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized so as to obtain a third matching result, the corresponding computer instructions stored in the storage medium specifically perform the following steps when executed:
judging whether the first matching has traversed the feature database by the second time;
terminating the first match when the first match has not traversed the feature database;
performing the second matching from the first position of the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized to obtain the third matching result; wherein the first position is the position in the feature database at which the first matching is terminated at the second time.
Optionally, for the step of obtaining a second matching result based on the second part of the audio to be recognized and the first matching result, the corresponding computer instructions stored in the storage medium may instead specifically perform the following step when executed:
and starting to match in the first matching result based on the second part of the audio to be recognized so as to obtain the second matching result.
Optionally, the second matching result includes a song name of a song corresponding to the audio stream to be identified and an offset position of the second part of the audio to be identified in the song, and the storage medium further stores other computer instructions which are executed after the second matching result is obtained and perform the following steps:
obtaining the song from a song database corresponding to the feature database based on the song name;
transmitting the offset location and the song to the user device to cause the user device to play the song from the offset location; or
transmitting a remaining portion of the song after the offset location to the user device.
When the computer program instructions in the storage medium corresponding to the second information processing method are read and executed by an electronic device, the following steps are performed:
recording an audio stream to be identified through an audio input device;
while recording the audio stream to be identified, sending the audio stream to be identified to a server, so that the server, while receiving the audio stream to be identified, starts a first matching in a feature database based on a first part of the audio to be identified received by a first time, so as to obtain a first matching result; wherein the first time is before the time of completing recording the audio stream to be identified.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An information processing method comprising:
receiving an audio stream to be identified sent by user equipment;
while receiving the audio stream to be identified, starting to perform first matching in a feature database based on a first part of audio to be identified received by a first moment so as to obtain a first matching result; wherein the first time is before the time of completing receiving the audio stream to be identified;
acquiring a second part of the audio to be identified received after the first time and before a second time while receiving the audio stream to be identified, wherein the second time is after the first time;
starting to perform second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized so as to obtain a third matching result; matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result; and determining the matching result with the matching degree meeting the preset condition as a second matching result from the third matching result and the fourth matching result.
2. The method of claim 1, wherein starting a second match in the feature database based on the first portion of audio to be recognized and the second portion of audio to be recognized to obtain a third match result comprises:
judging whether the first matching has traversed the feature database by the second time;
terminating the first match when the first match has not traversed the feature database;
performing the second matching from the first position of the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized to obtain the third matching result; wherein the first position is the position in the feature database at which the first matching is terminated at the second time.
3. The method of claim 2, wherein the second matching result comprises a song name of a song corresponding to the audio stream to be identified, and an offset position of the second portion of the audio to be identified in the song, and further comprising, after obtaining the second matching result:
obtaining the song from a song database corresponding to the feature database based on the song name;
transmitting the offset location and the song to the user device to cause the user device to play the song from the offset location; or
transmitting a remaining portion of the song after the offset location to the user device.
4. An information processing method comprising:
recording an audio stream to be identified through an audio input device;
while recording the audio stream to be identified, sending the audio stream to be identified to a server, so that the server, while receiving the audio stream to be identified, starts a first matching in a feature database based on a first part of the audio to be identified received by a first time, so as to obtain a first matching result; wherein the first time is before the time of completing recording the audio stream to be identified;
acquiring a second part of the audio to be identified received after the first time and before a second time while receiving the audio stream to be identified, wherein the second time is after the first time;
obtaining a second matching result based on the second part of the audio to be recognized and the first matching result, wherein the obtaining comprises:
starting to perform second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized so as to obtain a third matching result; matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result; and determining the matching result with the matching degree meeting the preset condition as the second matching result from the third matching result and the fourth matching result.
5. A server, comprising:
a receiver, configured to receive an audio stream to be identified sent by user equipment;
a processor, configured to start performing a first matching in a feature database based on a first portion of the to-be-identified audio that has been received by a first time while the receiver receives the to-be-identified audio stream, to obtain a first matching result; wherein the first time is before the time of completing receiving the audio stream to be identified;
the processor is further configured to obtain a second portion of the audio to be identified that is received after the first time and before a second time while the receiver receives the audio stream to be identified, wherein the second time is after the first time;
starting to perform second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized so as to obtain a third matching result; matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result; and determining the matching result with the matching degree meeting the preset condition as a second matching result from the third matching result and the fourth matching result.
6. The server of claim 5, wherein the processor is configured to determine whether the first match has traversed the feature database at the second time; terminate the first match when the first match has not traversed the feature database; and perform the second matching from the first position of the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized to obtain the third matching result; wherein the first position is the position in the feature database at which the first matching is terminated at the second time.
7. The server according to claim 6, wherein the second matching result includes a song name of a song corresponding to the audio stream to be identified and an offset position of the second portion of the audio to be identified in the song, and the processor is further configured to obtain the song from a song database corresponding to the feature database based on the song name after obtaining the second matching result;
the server further comprises:
a transmitter for transmitting the offset location and the song to the user device to cause the user device to play the song from the offset location; or transmitting a remainder of the song after the offset location to the user device.
8. A user equipment, comprising:
an audio input device, configured to record an audio stream to be identified;
a transmitter, configured to transmit the audio stream to be identified to a server while the audio stream to be identified is being recorded, so that the server, while receiving the audio stream to be identified, starts a first matching in a feature database based on a first part of the audio to be identified received by a first time, to obtain a first matching result; wherein the first time is before the time of completing recording the audio stream to be identified; acquires a second part of the audio to be identified received after the first time and before a second time while receiving the audio stream to be identified, wherein the second time is after the first time; and obtains a second matching result based on the second part of the audio to be recognized and the first matching result, wherein the obtaining comprises:
starting to perform second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized so as to obtain a third matching result; matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result; and determining the matching result with the matching degree meeting the preset condition as the second matching result from the third matching result and the fourth matching result.
9. A server, comprising:
a receiving unit, configured to receive an audio stream to be identified sent by user equipment;
a first matching unit, configured to start, while receiving the audio stream to be identified, a first matching in a feature database based on a first part of the audio to be identified received by a first time, so as to obtain a first matching result; wherein the first time is before the time of completing receiving the audio stream to be identified;
an obtaining unit, configured to obtain, while receiving the audio stream to be identified, a second part of the audio to be identified received after the first time and before a second time, wherein the second time is after the first time;
a second matching unit, configured to obtain a second matching result based on the second part of the audio to be recognized and the first matching result, wherein the obtaining comprises:
starting to perform second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized so as to obtain a third matching result; matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result; and determining the matching result with the matching degree meeting the preset condition as the second matching result from the third matching result and the fourth matching result.
10. A user equipment, comprising:
a recording unit, configured to record an audio stream to be identified through an audio input device;
a second sending unit, configured to send the audio stream to be identified to a server while the audio stream to be identified is being recorded, so that the server, while receiving the audio stream to be identified, starts a first matching in a feature database based on a first part of the audio to be identified received by a first time, to obtain a first matching result; wherein the first time is before the time of completing recording the audio stream to be identified; acquires a second part of the audio to be identified received after the first time and before a second time while receiving the audio stream to be identified, wherein the second time is after the first time; and obtains a second matching result based on the second part of the audio to be recognized and the first matching result, wherein the obtaining comprises:
starting to perform second matching in the feature database based on the first part of the audio to be recognized and the second part of the audio to be recognized so as to obtain a third matching result; matching in the first matching result based on the second part of the audio to be recognized so as to screen out a fourth matching result matched with the second part of the audio to be recognized from the first matching result; and determining the matching result with the matching degree meeting the preset condition as the second matching result from the third matching result and the fourth matching result.
CN201510408843.XA 2015-07-13 2015-07-13 Information processing method, server and user equipment Active CN104991946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510408843.XA CN104991946B (en) 2015-07-13 2015-07-13 Information processing method, server and user equipment

Publications (2)

Publication Number Publication Date
CN104991946A CN104991946A (en) 2015-10-21
CN104991946B true CN104991946B (en) 2021-04-13

Family

ID=54303761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510408843.XA Active CN104991946B (en) 2015-07-13 2015-07-13 Information processing method, server and user equipment

Country Status (1)

Country Link
CN (1) CN104991946B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106653031A (en) * 2016-10-17 2017-05-10 海信集团有限公司 Voice wake-up method and voice interaction device
CN110418159A (en) * 2018-10-11 2019-11-05 彩云之端文化传媒(北京)有限公司 A method of television content is intercepted across screen based on Application on Voiceprint Recognition

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101681381A (en) * 2007-06-06 2010-03-24 杜比实验室特许公司 Improving audio/video fingerprint search accuracy using multiple search combining
CN103116629A (en) * 2013-02-01 2013-05-22 腾讯科技(深圳)有限公司 Matching method and matching system of audio frequency content

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9529907B2 (en) * 2012-12-31 2016-12-27 Google Inc. Hold back and real time ranking of results in a streaming matching system
CN104091596B (en) * 2014-01-20 2016-05-04 腾讯科技(深圳)有限公司 A kind of melody recognition methods, system and device
CN103903623B (en) * 2014-03-31 2017-09-29 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN104282322B (en) * 2014-10-29 2019-07-19 努比亚技术有限公司 A kind of mobile terminal and its method and apparatus for identifying song climax parts
CN104484426A (en) * 2014-12-18 2015-04-01 天津讯飞信息科技有限公司 Multi-mode music searching method and system

Also Published As

Publication number Publication date
CN104991946A (en) 2015-10-21

Similar Documents

Publication Publication Date Title
US20170034263A1 (en) Synchronized Playback of Streamed Audio Content by Multiple Internet-Capable Portable Devices
CN105448312B (en) Audio sync playback method, apparatus and system
CN106658226B (en) Playing method and device
KR20160106075A (en) Method and device for identifying a piece of music in an audio stream
CN104468840A (en) Audio pushing method, device and system
CN105828254B (en) A kind of voice frequency regulating method and device
WO2015183466A1 (en) Method for reducing pre-fetching of multimedia streaming data with minimal impact on playback user experience
US9384752B2 (en) Audio device and storage medium
CN104991946B (en) Information processing method, server and user equipment
CN104464743B (en) Method for playing background music in voice chat room and mobile terminal
CN105611400B (en) Content processing apparatus and method for transmitting variable-size segments
CN108289232B (en) Control method of playing device, terminal device and storage medium
CN112104909A (en) Interactive video playing method and device, computer equipment and readable storage medium
CN107613356A (en) Media and vibrations synchronous broadcast method and device, electronic equipment and storage medium
CN105812905B (en) Control method for playing back and device in a kind of audio-video frequency playing system
CN112053669B (en) Method, device, equipment and medium for eliminating human voice
CN107799138B (en) Audio recording method and device
US8374712B2 (en) Gapless audio playback
US10346472B2 (en) Method and a portable electronic device for automatically rating a music track
WO2016155139A1 (en) Ringtone playing method and apparatus, terminal and storage medium
EP3203468B1 (en) Acoustic system, communication device, and program
CN109510907B (en) Ring tone setting method and device
TW201511548A (en) Music playing system, device and method
CN112182327A (en) Data processing method, device, equipment and medium
CN115631738A (en) Audio data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant