CN113241098B - Target recommendation method based on audio recording - Google Patents

Target recommendation method based on audio recording

Info

Publication number
CN113241098B
Authority
CN
China
Prior art keywords
audio data
recording
supplementary
target recommendation
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110612785.8A
Other languages
Chinese (zh)
Other versions
CN113241098A (en)
Inventor
李照飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yeelion Online Network Technology Beijing Co Ltd
Original Assignee
Yeelion Online Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yeelion Online Network Technology Beijing Co Ltd filed Critical Yeelion Online Network Technology Beijing Co Ltd
Priority to CN202110612785.8A priority Critical patent/CN113241098B/en
Publication of CN113241098A publication Critical patent/CN113241098A/en
Application granted granted Critical
Publication of CN113241098B publication Critical patent/CN113241098B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B 27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/63 Querying
    • G06F 16/635 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C 7/16 Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters

Abstract

The invention relates to a target recommendation method based on audio recording, which comprises: starting a recording module and recording the current environmental sound; transmitting the recorded first audio data to a remote end; continuing to record while transmitting, thereby obtaining supplementary audio data; after the remote end receives the first audio data, performing analysis and matching operations and attempting to obtain default or specified target recommendation information; and after the remote end further receives the supplementary audio data, continuing the analysis and matching operations and attempting to obtain default or specified target recommendation update information. The invention improves and optimizes the recording strategy, raises the efficiency of audio recording, provides a solution for situations far from the sound source, makes use more convenient for the user, improves analysis efficiency and accuracy to a certain extent, enriches the information content of the target recommendation, meets personalized requirements, and expands the application scenarios of the song listening and song recognition function.

Description

Target recommendation method based on audio recording
Technical Field
The invention relates to the technical field of acquisition, analysis (identification), and recommendation of data segments, and in particular to a target recommendation method based on audio recording. The data segments here are specifically segments obtained through audio recording.
Background
In some application scenarios, only certain data segments can be obtained, and only indirectly; the complete data containing those segments cannot be obtained directly. For example:
sometimes we hear a song we really like, but do not know its title, so the complete data of the song cannot be obtained directly;
or we remember only the melody of a song but have long since forgotten its title, so the complete data of the song cannot be obtained directly.
In such cases there is nothing we can do, however much it bothers us.
Driven by these user needs, technologies for acquiring, analyzing (identifying), and recommending data segments have received attention, research, and development, and a song listening and song recognition function has been built into many apps to meet users' need to quickly identify and find songs.
Existing song listening and song recognition functions require an environment that is as quiet as possible and a position as close to the sound source as possible, and they recognize one song at a time. The processing flow is roughly as follows:
the client records the audio to be identified,
the client extracts the audio features of the recording,
the client sends the audio features to the server,
the server receives the audio features, matches them against the songs it holds, and returns the matched song information (such as the song name, song title, and the like) to the client.
Existing song listening and song recognition functions depend on recording a long, complete stretch of the audio to be identified, which makes the function slow; they are also unsuitable for use far from the sound source, where song information either cannot be matched or is matched with a very high error rate.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a target recommendation method based on audio recording that improves and optimizes the recording strategy, raises the efficiency of audio recording, provides a solution for situations far from the sound source, is more convenient for the user, improves analysis efficiency and accuracy to a certain extent, enriches the information content of the target recommendation, meets personalized requirements, and expands the application scenarios of the song listening and song recognition function.
In order to achieve the above purpose, the technical solution adopted by the invention is as follows:
a target recommendation method based on audio recording is characterized in that,
starting a recording module to record the current environmental sound;
transmitting the recorded first audio data to one or more designated remote ends;
continuing to record the current environmental sound while transmitting the first audio data, and obtaining at least one piece of supplementary audio data during this continued recording;
transmitting the recorded supplementary audio data sequentially, in time order, to one or more designated remote ends that are the same as or different from those mentioned above;
after receiving the first audio data, the remote end performs analysis and matching operations and attempts to obtain default or specified target recommendation information;
after the remote end further receives the supplementary audio data, it continues to perform the same or a different analysis and matching operation and attempts to obtain default or specified target recommendation update information.
On the basis of the above technical solution, the recording module is provided locally on a client device, the client device being the device currently used by the user;
or the recording module is provided locally on a remote device, the remote device being a device that the user cannot use directly but whose recording module the user can remotely request and control through instructions.
On the basis of the above technical solution, recording the current environmental sound means: collecting the current environmental sound through a sound pickup device, the current environmental sound being in an externally played state;
or,
recording the current environmental sound means: collecting the current environmental sound through the internal circuitry of the device, the current environmental sound being in an externally or internally played state.
On the basis of the above technical solution, a recording duration parameter is set, with a default value of 1-3 seconds, and the start and stop times of the first audio data and the supplementary audio data are controlled according to the recording duration;
the recording duration of the first audio data is the same as or different from that of the supplementary audio data; when they differ, the recording duration of the supplementary audio data is greater than that of the first audio data.
On the basis of the above technical solution, a recording duration increment parameter is set, with a default value of 1 second; the start and stop times of the first audio data are controlled according to the recording duration, and the recording duration of each successive piece of supplementary audio data is increased by the recording duration increment.
On the basis of the above technical solution, a recording segmentation parameter is set, and the number of pieces of supplementary audio data obtained by recording is controlled by the recording segmentation parameter.
On the basis of the above technical solution, during recording the recorded data is cached in a buffer, and the first audio data or the supplementary audio data is obtained by intercepting and transferring data out of the buffer;
transmitting the transferred first audio data or supplementary audio data does not affect the data in the buffer, so the current environmental sound continues to be recorded while transmission takes place.
On the basis of the above technical solution, when the remote end receives the first audio data, it directly performs the analysis and matching operation on the first audio data;
when the remote end receives the supplementary audio data after receiving the first audio data, it combines the first audio data and the supplementary audio data to form second audio data, and analyzes and matches the second audio data;
when the remote end receives only the supplementary audio data, it directly analyzes and matches the supplementary audio data;
and when the remote end receives new supplementary audio data after receiving the supplementary audio data, it combines the supplementary audio data and the new supplementary audio data to form third audio data, and analyzes and matches the third audio data.
On the basis of the technical scheme, each remote end is preset with at least two different analysis and matching algorithms,
the same remote end adopts the same or different analysis and matching algorithms aiming at different analysis objects;
different remote ends adopt different analysis and matching algorithms aiming at the same analysis object,
different remote ends adopt different analysis and matching algorithms aiming at different analysis objects.
On the basis of the technical scheme, at least one designated target recommendation information is preset in the analysis and matching algorithm, wherein the at least one target recommendation information is used for the song listening and song recognition function.
The target recommendation method based on audio recording has the following beneficial effects:
1. the recording strategy is improved and optimized, the efficiency of audio recording is raised, the acquisition of data segments is more flexible, and recording continues uninterrupted while data segments are transmitted to the server;
2. a solution is provided for situations far from the sound source, making use more convenient for the user and avoiding, as far as possible, the adverse effect that distance from the sound source has on the song listening and song recognition function;
3. both efficiency and data volume are taken into account when transmitting data segments to the server, improving analysis efficiency to a certain extent and improving analysis accuracy;
4. after the data segments are analyzed, the amount of information returned is larger and can be customized, expanding the application scenarios of the song listening and song recognition function, increasing the richness of the target recommendation information, and meeting personalized requirements.
Drawings
The invention has the following drawings:
the drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
fig. 1 is a flowchart of a first embodiment of a target recommendation method based on audio recording according to the present invention.
FIG. 2 is a schematic diagram of an embodiment of a sound recording module.
Fig. 3 is a schematic diagram of application scenario one.
Fig. 4 is a schematic diagram of one case described in application scenario two.
Fig. 5 is a first schematic diagram of another case described in application scenario two.
Fig. 6 is a second schematic diagram of another case described in application scenario two.
Fig. 7 is a schematic diagram of yet another case described in application scenario two.
Fig. 8 is a schematic diagram of application scenario three.
Fig. 9 is a schematic diagram of recording the current environmental sound.
Fig. 10 is a diagram illustrating control of start and stop times of recording.
Fig. 11 is a diagram illustrating control of start and stop times of recording.
FIG. 12 is a first diagram illustrating a parsing and matching operation.
FIG. 13 is a second diagram illustrating the parsing and matching operations.
FIG. 14 is a third schematic diagram illustrating a parsing and matching operation.
FIG. 15 is a fourth schematic diagram illustrating the parsing and matching operations.
FIG. 16 is a schematic diagram of a parsing and matching algorithm.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings. The detailed description, while indicating exemplary embodiments of the invention, is given by way of illustration only, in which various details of embodiments of the invention are included to assist understanding. Accordingly, it will be appreciated by those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, the target recommendation method based on audio recording according to the present invention includes the following steps (an illustrative sketch follows the list of steps):
starting a recording module to record the current environmental sound;
transmitting the recorded first audio data to one or more designated remote ends;
continuing to record the current environmental sound while transmitting the first audio data, and obtaining at least one piece of supplementary audio data during this continued recording;
transmitting the recorded supplementary audio data sequentially, in time order, to one or more designated remote ends that are the same as or different from those mentioned above;
after receiving the first audio data, the remote end performs analysis and matching operations and attempts to obtain default or specified target recommendation information;
after the remote end further receives the supplementary audio data, it continues to perform the same or a different analysis and matching operation and attempts to obtain default or specified target recommendation update information.
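For readers who prefer code to prose, the following is a minimal, purely illustrative client-side sketch of these steps in Python. The capture and send_to_remote helpers, the segment lengths, and the remote-end names are all assumptions made for the example; in practice, recording and transmission would run concurrently rather than sequentially as written here.

RECORD_SECONDS = 2          # recording duration parameter (default 1-3 s per the description)
SUPPLEMENT_COUNT = 3        # recording segmentation parameter: number of supplementary segments
REMOTE_ENDS = ["remote-A"]  # designated remote end(s); placeholder names

def capture(seconds):
    """Hypothetical stand-in for the recording module: returns `seconds` of audio bytes.
    A real implementation would block while the microphone (or internal circuit) records."""
    return b"\x00" * int(seconds * 16000 * 2)  # 16 kHz, 16-bit mono silence as placeholder data

def send_to_remote(remote, label, data):
    """Hypothetical stand-in for transmitting a recorded segment to a designated remote end."""
    print(f"sent {label} ({len(data)} bytes) to {remote}")

# 1. Record the first audio data and transmit it.
first_audio = capture(RECORD_SECONDS)
for remote in REMOTE_ENDS:
    send_to_remote(remote, "first audio data", first_audio)

# 2. Keep recording while (conceptually) transmitting, producing supplementary segments
#    that are sent in time order to the same or different remote ends.
for i in range(1, SUPPLEMENT_COUNT + 1):
    supplement = capture(RECORD_SECONDS)
    for remote in REMOTE_ENDS:
        send_to_remote(remote, f"supplementary audio data #{i}", supplement)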
On the basis of the above technical solution, as shown in fig. 2, the recording module is locally disposed on a client device, where the client device refers to a device currently used by a user, and for example, the client device may be: a mobile phone, a tablet computer, an MP3, a desktop computer, a notebook computer, an intelligent sound box, etc. which are currently used by a user;
or, the recording module is locally disposed on a remote device, where the remote device is a device that the user cannot use directly but whose recording module the user can remotely request and control through instructions. "Cannot use" here means the user cannot operate the device directly, for example because the device is owned by someone else, or because the device is owned by the user but the user is currently in a different location from the device. The remote device may be, for example: a mobile phone, tablet computer, MP3 player, desktop computer, notebook computer, smart speaker, or the like owned by another person, or one owned by the user but currently in a different location from the user.
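A minimal sketch of what "remotely request and control through an instruction" could look like as a message exchange is given below. The command fields, the authorization check, and the handler are assumptions made for illustration only; the patent does not specify a protocol.

import json

AUTHORIZED_REQUESTERS = {"user-A"}   # e.g. user C has authorized user A (illustrative)

def build_record_command(requester, seconds):
    """Command that a client device could send to a remote device's recording module."""
    return json.dumps({"type": "start_recording", "requester": requester, "seconds": seconds})

def handle_command(raw_command):
    """Remote-device side: check authorization, then start the local recording module."""
    command = json.loads(raw_command)
    if command["requester"] not in AUTHORIZED_REQUESTERS:
        return {"status": "rejected"}
    # A real device would start its recording module here and later stream the segments back.
    return {"status": "recording", "seconds": command["seconds"]}

print(handle_command(build_record_command("user-A", 2)))   # accepted
print(handle_command(build_record_command("user-B", 2)))   # rejected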
The method of the invention distinguishes between a recording module local to the client device and a recording module local to a remote device because it is intended to suit the following application scenarios:
In application scenario one, as shown in fig. 3, a client device owned and currently used by the user is provided with a recording module, i.e., the recording module is local to the client device. When the user hears some sound (e.g., a song) that interests him and can get within a certain distance of its sound source, the current environmental sound can be recorded through the recording module local to the client device;
Application scenario two differs from application scenario one in that the user cannot get within a certain distance of the sound source, so recording the current environmental sound through the recording module local to the client device would give a poor result;
One case is: as shown in fig. 4, the user may still attempt to record the current environmental sound through the recording module local to the client device, but the recording may turn out poorly because the distance from the client device to the sound source cannot be changed (whether because changing it is genuinely difficult, or simply because the user cannot be bothered to change it);
Another case is: the user may attempt to record the current environmental sound through a recording module local to a remote device, the remote device being a device closer to the sound source. For example, as shown in fig. 5, at a concert, user A is in an area far from the stage and user C is in an area near the stage; this is a case in which it is genuinely difficult to change the distance from the client device to the sound source. User A and user C are friends, and user C has authorized user A to remotely request and control, through instructions, the recording module in the device C1 that user C is currently using; device C1 is the remote device, and user A can attempt to record the current environmental sound through the recording module local to that remote device. As another example, as shown in fig. 6, in an office, user A hears a colleague, user C, playing songs several workstations away and does not want to leave his seat to walk over; this is a case in which the user simply cannot be bothered to change the distance. User A and user C are colleagues, user C has authorized user A to remotely request and control, through instructions, the recording module in the device C1 that user C is currently using; device C1 is the remote device, and user A can attempt to record the current environmental sound through the recording module local to that remote device;
Yet another case is: as shown in fig. 7, the user may attempt to record the current environmental sound through a recording module local to a remote device that is itself the sound source. For example, the sound source is a smart speaker mounted at some height; user A can remotely request and control the smart speaker through instructions, the smart speaker is provided with a recording module, and the smart speaker at that height is the remote device, so user A can attempt to record the current environmental sound through the recording module local to that remote device;
Application scenario three differs from application scenario one in that the user does not hear the sound (e.g., a song) directly, but notices that some person or some device is in the state of playing a sound (e.g., a song). For example, as shown in fig. 8, user A and user C are friends, and user A sees from user C's current state that user C is listening to a song, i.e., a situation in which the sound itself is not heard directly. As another example, user A and user C are colleagues, and user A notices that user C is wearing earphones and is evidently listening to a song (listening in the internal playback state), again a situation in which the sound is not heard directly. User C has authorized user A to remotely request and control, through instructions, the recording module in the device C1 that user C is currently using; device C1 is the remote device, and user A can attempt to record the current environmental sound through the recording module local to that remote device.
On the basis of the above technical solution, as shown in fig. 9, recording the current environmental sound means: collecting the current environmental sound through a sound pickup device, the current environmental sound being in an externally played state (the sound spreads through the air and can be heard directly by the human ear);
or,
recording the current environmental sound means: collecting the current environmental sound through the internal circuitry of the device, the current environmental sound being in an externally played or internally played state (in the internal state, the sound does not spread through the air and cannot be heard directly by the human ear).
The recording module works together with the sound pickup device to complete the audio recording,
or the recording module works together with the internal circuitry to complete the audio recording.
Generally, mobile phones, tablet computers, MP3 players, desktop computers, notebook computers, smart speakers, and the like already pair the recording module with a sound pickup device, or with the internal circuitry, when they are designed and manufactured; both pairings can be implemented with the prior art and are not described in detail here.
As described above, in this embodiment the recording module plus sound pickup device is suitable for application scenarios one and two, while the recording module plus internal circuitry is suitable for application scenario three.
On the basis of the above technical solution, as shown in fig. 10, a recording duration parameter is set, with a default value of 1-3 seconds, and the start and stop times of the first audio data and the supplementary audio data are controlled according to the recording duration;
the recording duration of the first audio data is the same as or different from that of the supplementary audio data; when they differ, the recording duration of the supplementary audio data is greater than that of the first audio data.
For example, with the recording duration parameter ranging over 1-3 seconds, the recording durations of the first audio data and the supplementary audio data may both be 2 seconds; with the parameter ranging over 1-4 seconds, the recording durations of the first audio data and the supplementary audio data may be 1 second and 4 seconds respectively.
As an alternative embodiment, as shown in fig. 11, a recording duration increment parameter is set, with a default value of 1 second; the start and stop times of the first audio data are controlled according to the recording duration, and the recording duration of each successive piece of supplementary audio data is increased by the recording duration increment.
For example: the recording duration of the first audio data is 2 seconds, the recording duration of the first supplemental audio data is 3 seconds, the recording duration of the second supplemental audio data is 4 seconds, the recording duration of the third supplemental audio data is 5 seconds, and so on.
On the basis of the above technical solution, a recording segmentation parameter is set, and the number of pieces of supplementary audio data obtained by recording is controlled by the recording segmentation parameter.
For example, with the recording segmentation parameter set to 5, the following data segments are finally obtained: one piece of first audio data and five pieces of supplementary audio data (the first through fifth supplementary audio data).
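The relationship between the duration, increment, and segmentation parameters can be sketched as follows. This is only an illustration under the assumption that the first segment uses the recording duration, each supplementary segment adds the increment, and the segmentation parameter fixes the number of supplementary segments; the function and parameter names are invented for the example.

def segment_schedule(first_duration=2, increment=1, segments=5):
    """Return the recording durations: the first audio data followed by `segments`
    supplementary segments, each lengthened by `increment` seconds."""
    durations = [first_duration]
    for i in range(1, segments + 1):
        durations.append(first_duration + i * increment)
    return durations

# Reproduces the example above: first = 2 s, supplements = 3, 4, 5, ... seconds.
print(segment_schedule(first_duration=2, increment=1, segments=5))  # [2, 3, 4, 5, 6, 7]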
On the basis of the above technical solution, during recording the recorded data is cached in a buffer, and the first audio data or the supplementary audio data is obtained by intercepting and transferring data out of the buffer;
transmitting the transferred first audio data or supplementary audio data does not affect the data in the buffer, so the current environmental sound continues to be recorded while transmission takes place.
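A minimal sketch of one way this buffer-and-transfer scheme could be realized is shown below; recording is simulated by appending placeholder bytes, and the byte rate and helper names are assumptions made for illustration.

BYTES_PER_SECOND = 16000 * 2   # assumed 16 kHz, 16-bit mono
buffer = bytearray()           # recording buffer that keeps filling during capture
read_offset = 0                # how much of the buffer has already been transferred out

def record_into_buffer(seconds):
    """Simulated recording: append `seconds` of placeholder audio to the buffer."""
    buffer.extend(b"\x00" * int(seconds * BYTES_PER_SECOND))

def transfer_next_segment(seconds):
    """Copy (not remove) the next segment out of the buffer for transmission,
    so transmission does not disturb the data still accumulating in the buffer."""
    global read_offset
    end = read_offset + int(seconds * BYTES_PER_SECOND)
    segment = bytes(buffer[read_offset:end])   # interception and transfer as a copy
    read_offset = end
    return segment

record_into_buffer(2)
first_audio = transfer_next_segment(2)         # first audio data
record_into_buffer(3)                          # recording continues while "transmitting"
supplement_1 = transfer_next_segment(3)        # first supplementary audio data
print(len(first_audio), len(supplement_1), len(buffer))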
On the basis of the above technical solution, as shown in fig. 12, when the far end receives the first audio data, the parsing and matching operation is directly performed on the first audio data,
as shown in fig. 13, when the remote end receives the supplemental audio data after receiving the first audio data, the first audio data and the supplemental audio data are combined to form second audio data, and the second audio data is analyzed and matched;
as shown in fig. 14, when the remote end receives only the supplementary audio data, the parsing and matching operation is directly performed on the supplementary audio data,
as shown in fig. 15, when the far-end receives the supplemental audio data and then receives new supplemental audio data, the supplemental audio data and the new supplemental audio data are merged to form third audio data, and the third audio data is analyzed and matched.
The merging should be performed sequentially, in the time order of the audio data.
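As a purely illustrative sketch of that merge step, each segment below carries a recording timestamp and later-arriving supplements are concatenated in time order before the parsing and matching operation is run again; the pair layout and field names are assumptions, not the patent's data format.

def merge_segments(segments):
    """Merge audio segments into one clip for re-analysis, ordered by recording time.
    Each segment is a (start_time_seconds, audio_bytes) pair; names are illustrative."""
    ordered = sorted(segments, key=lambda seg: seg[0])
    return b"".join(audio for _, audio in ordered)

first_audio = (0.0, b"\x01" * 10)
supplement_1 = (2.0, b"\x02" * 10)
supplement_2 = (5.0, b"\x03" * 10)

second_audio = merge_segments([first_audio, supplement_1])   # first + supplementary data
third_audio = merge_segments([supplement_1, supplement_2])   # supplementary + new supplementary data
print(len(second_audio), len(third_audio))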
On the basis of the above technical solution, as shown in fig. 16, each remote end presets at least two different parsing and matching algorithms,
the same remote end adopts the same or different analysis and matching algorithms aiming at different analysis objects;
different remote ends adopt different analysis and matching algorithms aiming at the same analysis object,
different remote ends adopt different analysis and matching algorithms aiming at different analysis objects.
In order to improve analysis efficiency and analysis accuracy, when multiple remote ends work in coordination, different remote ends apply different analysis and matching algorithms to the same analysis object so as to obtain matching results faster and more broadly. For example, when remote end A, remote end B, and remote end C each receive the first audio data and directly analyze and match it, three different analysis and matching algorithms are used, and the client finally receives and aggregates three matching results. This improves the analysis efficiency for the first audio data, because whichever remote end finishes its analysis first can return data to the client immediately, reducing the client's waiting time; it also improves analysis accuracy, because comparing and combining the three matching results completes the match better and more precisely. This is what is meant by: different remote ends adopt different analysis and matching algorithms for the same analysis object;
adopting different analysis and matching algorithms for different analysis objects is based on the same consideration.
As an alternative embodiment, the matching result obtained first is displayed first; matching results obtained later are aggregated with it, their confidence levels are compared, and the display is then updated. During the update, low-confidence matching results are annotated with a marker to remind the user that they are for reference only.
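A toy sketch of this fan-out and confidence-based update follows; the remote ends, algorithms, and confidence values are fabricated stand-ins, and the 0.5 threshold for flagging low confidence is an assumption made only to illustrate the idea.

def fake_remote(name, algorithm):
    """Hypothetical remote end: returns a (song, confidence) guess for a given clip."""
    def analyse(clip):
        # A real remote end would run `algorithm` over the clip; here we return canned results.
        return {"remote": name, "song": f"match-from-{name}", "confidence": algorithm(clip)}
    return analyse

remotes = [
    fake_remote("remote-A", lambda clip: 0.62),
    fake_remote("remote-B", lambda clip: 0.90),
    fake_remote("remote-C", lambda clip: 0.45),
]

clip = b"\x00" * 100
results = [remote(clip) for remote in remotes]   # same clip, different algorithm per remote end

# Show the first result immediately, then update the display as higher-confidence
# results arrive; low-confidence results are flagged for the user's reference.
shown = results[0]
for result in results[1:]:
    if result["confidence"] > shown["confidence"]:
        shown = result
low_confidence = [r for r in results if r["confidence"] < 0.5]
print("displayed:", shown)
print("flagged as low confidence:", low_confidence)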
As an alternative embodiment, the matching result is displayed in the client device and/or the remote device.
On the basis of the above technical solution, as shown in fig. 16, at least one designated target recommendation information is preset in the parsing and matching algorithm, wherein the at least one target recommendation information is used by the song listening and song recognition function.
The method can realize the song listening and song recognition function, but that is only the most basic realization. To enrich the target recommendation information, the target recommendation information preset in the analysis and matching algorithm includes any one, some, or all of the following (a sketch of the secondary-matching lookup follows this list):
the target recommendation information is a song name or song title; a corresponding song name or song title is obtained through matching based on the analysis of any one of the first audio data, the supplementary audio data, the second audio data, and the third audio data;
the target recommendation information is a movie name or an MTV name; obtaining a corresponding movie name or MTV name through matching based on the analysis of any one of the first audio data, the supplementary audio data, the second audio data and the third audio data;
the target recommendation information is the name of the singer; obtaining a corresponding singer name through matching based on the analysis of any one of the first audio data, the supplementary audio data, the second audio data and the third audio data;
the target recommendation information is the play count or popularity; a corresponding play count or popularity is obtained through matching based on the analysis of any one of the first audio data, the supplementary audio data, the second audio data, and the third audio data; the play count or popularity may be obtained by secondary matching based on the song name, song title, movie name, or MTV name;
the target recommendation information is text; the text may be lyrics or text converted from the audio; corresponding lyrics are obtained through matching based on the analysis of any one of the first audio data, the supplementary audio data, the second audio data, and the third audio data, or the text is obtained by recognizing the audio and converting it;
the target recommendation information is a sharing link; obtaining a corresponding sharing link through matching based on the analysis of any one of the first audio data, the supplementary audio data, the second audio data and the third audio data; the sharing link is obtained by secondary matching based on the song name, movie name or MTV name;
the target recommendation information is a download link; obtaining a corresponding download link through matching based on the analysis of any one of the first audio data, the supplementary audio data, the second audio data and the third audio data; obtaining the download link, wherein the download link can be obtained by secondary matching based on the song name, movie name or MTV name;
the target recommendation information is a ring tone setting instruction; obtaining a corresponding ring tone setting instruction through matching based on the analysis of any one of the first audio data, the supplementary audio data, the second audio data and the third audio data; the ring tone setting instruction can be obtained by secondary matching based on the song name, movie name or MTV name, or by secondary matching based on the download link;
the target recommendation information is a silent alarm instruction or an automatic contact-notification instruction; a corresponding silent alarm instruction or automatic contact-notification instruction is obtained through matching based on the analysis of any one of the first audio data, the supplementary audio data, the second audio data, and the third audio data; the silent alarm instruction or automatic contact-notification instruction may be obtained by secondary matching based on audio-to-text conversion.
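The sketch referred to above illustrates the secondary-matching idea: once a primary match has produced a song name, the richer recommendation information (play count, share link, download link, and so on) is looked up from that name. The catalogue, field names, and URLs are invented placeholders, not part of the patent.

# Invented catalogue keyed by matched song name; in practice this lookup would
# run on the remote end against its own music database.
CATALOGUE = {
    "Example Song": {
        "artist": "Example Artist",
        "play_count": 1_234_567,
        "share_link": "https://example.invalid/share/example-song",
        "download_link": "https://example.invalid/download/example-song",
    }
}

def secondary_match(song_name, requested_fields):
    """Return the requested target recommendation info for an already-matched song name."""
    record = CATALOGUE.get(song_name, {})
    return {field: record.get(field) for field in requested_fields}

# After the primary match returns a song name, enrich the recommendation:
info = secondary_match("Example Song", ["play_count", "share_link", "download_link"])
print(info)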
Those not described in detail in this specification are within the skill of the art.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiment, but equivalent modifications or changes made by those skilled in the art according to the present disclosure should be included in the scope of the present invention as set forth in the appended claims.

Claims (8)

1. A target recommendation method based on audio recording is characterized in that,
starting a recording module to record the current environmental sound;
transmitting the recorded first audio data to one or more designated remote ends;
continuing to record the current environmental sound while transmitting the first audio data, and obtaining at least one piece of supplementary audio data during this continued recording;
sequentially transmitting the recorded supplementary audio data, in time order, to one or more remote ends that are the same as or different from the designated remote ends;
after receiving the first audio data, the remote end performs analysis and matching operations and attempts to obtain default or specified target recommendation information;
after the remote end further receives the supplementary audio data, it continues to perform the same or a different analysis and matching operation and attempts to obtain default or specified target recommendation update information;
when the remote end receives the first audio data, it directly performs the analysis and matching operation on the first audio data;
when the remote end receives the supplementary audio data after receiving the first audio data, it combines the first audio data and the supplementary audio data to form second audio data, and analyzes and matches the second audio data;
when the remote end receives only the supplementary audio data, it directly analyzes and matches the supplementary audio data;
when the remote end receives new supplementary audio data after receiving the supplementary audio data, it combines the supplementary audio data and the new supplementary audio data to form third audio data, and analyzes and matches the third audio data;
each remote end presets at least two different analysis and matching algorithms,
the same remote end adopts the same or different analysis and matching algorithms aiming at different analysis objects;
different remote ends adopt different analysis and matching algorithms aiming at the same analysis object,
different remote ends adopt different analysis and matching algorithms aiming at different analysis objects.
2. The audio recording-based target recommendation method of claim 1, wherein the recording module is locally installed on a client device, and the client device refers to a device currently used by a user;
or the recording module is provided locally on a remote device, the remote device being a device that the user cannot use directly but whose recording module the user can remotely request and control through instructions.
3. The audio recording-based target recommendation method of claim 1, wherein recording the current environmental sound means: collecting the current environmental sound through a sound pickup device, the current environmental sound being in an externally played state;
or,
recording the current environmental sound means: collecting the current environmental sound through the internal circuitry of the device, the current environmental sound being in an externally or internally played state.
4. The method of claim 1, wherein a recording duration parameter is set, with a default value of 1-3 seconds, and the start and stop times of the first audio data and the supplementary audio data are controlled according to the recording duration;
the recording duration of the first audio data is the same as or different from the recording duration of the supplementary audio data, and when the recording duration of the supplementary audio data is different from the recording duration of the first audio data, the recording duration of the supplementary audio data is greater than the recording duration of the first audio data.
5. The method of claim 4, wherein a recording duration increment parameter is set, with a default value of 1 second; the start and stop times of the first audio data are controlled according to the recording duration, and the recording duration of the supplementary audio data is increased incrementally by the recording duration increment.
6. The audio recording-based target recommendation method of claim 1, wherein a recording segmentation parameter is set, and the number of pieces of supplementary audio data obtained by recording is controlled by the recording segmentation parameter.
7. The audio recording-based target recommendation method of claim 1, wherein during recording the recorded data is cached in a buffer, and the first audio data or the supplementary audio data is obtained by intercepting and transferring data out of the buffer;
transmitting the transferred first audio data or supplementary audio data does not affect the data in the buffer, so the current environmental sound continues to be recorded while transmission takes place.
8. The method as claimed in claim 1, wherein at least one specific target recommendation information is preset in the parsing and matching algorithm, wherein the at least one target recommendation information is used by a song listening and song recognition function.
CN202110612785.8A 2021-06-02 2021-06-02 Target recommendation method based on audio recording Active CN113241098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110612785.8A CN113241098B (en) 2021-06-02 2021-06-02 Target recommendation method based on audio recording

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110612785.8A CN113241098B (en) 2021-06-02 2021-06-02 Target recommendation method based on audio recording

Publications (2)

Publication Number Publication Date
CN113241098A CN113241098A (en) 2021-08-10
CN113241098B true CN113241098B (en) 2022-04-26

Family

ID=77136600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110612785.8A Active CN113241098B (en) 2021-06-02 2021-06-02 Target recommendation method based on audio recording

Country Status (1)

Country Link
CN (1) CN113241098B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447199A (en) * 2015-12-29 2016-03-30 小米科技有限责任公司 Audio information acquisition method and device
CN105847878A (en) * 2016-03-23 2016-08-10 乐视网信息技术(北京)股份有限公司 Data recommendation method and device
CN108012173A (en) * 2017-11-16 2018-05-08 百度在线网络技术(北京)有限公司 A kind of content identification method, device, equipment and computer-readable storage medium
CN108922537A (en) * 2018-05-28 2018-11-30 Oppo广东移动通信有限公司 Audio identification methods, device, terminal, earphone and readable storage medium storing program for executing
CN111161758A (en) * 2019-12-04 2020-05-15 厦门快商通科技股份有限公司 Song listening and song recognition method and system based on audio fingerprint and audio equipment
CN112435688A (en) * 2020-11-20 2021-03-02 腾讯音乐娱乐科技(深圳)有限公司 Audio recognition method, server and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104661095A (en) * 2015-02-28 2015-05-27 深圳市中兴移动通信有限公司 Audio and video data recommendation method and system
US10853411B2 (en) * 2018-04-06 2020-12-01 Rovi Guides, Inc. Systems and methods for identifying a media asset from an ambiguous audio indicator

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447199A (en) * 2015-12-29 2016-03-30 小米科技有限责任公司 Audio information acquisition method and device
CN105847878A (en) * 2016-03-23 2016-08-10 乐视网信息技术(北京)股份有限公司 Data recommendation method and device
CN108012173A (en) * 2017-11-16 2018-05-08 百度在线网络技术(北京)有限公司 A kind of content identification method, device, equipment and computer-readable storage medium
CN108922537A (en) * 2018-05-28 2018-11-30 Oppo广东移动通信有限公司 Audio identification methods, device, terminal, earphone and readable storage medium storing program for executing
CN111161758A (en) * 2019-12-04 2020-05-15 厦门快商通科技股份有限公司 Song listening and song recognition method and system based on audio fingerprint and audio equipment
CN112435688A (en) * 2020-11-20 2021-03-02 腾讯音乐娱乐科技(深圳)有限公司 Audio recognition method, server and storage medium

Also Published As

Publication number Publication date
CN113241098A (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN107918653B (en) Intelligent playing method and device based on preference feedback
US10381016B2 (en) Methods and apparatus for altering audio output signals
US9824150B2 (en) Systems and methods for providing information discovery and retrieval
CN104394491B (en) A kind of intelligent earphone, Cloud Server and volume adjusting method and system
KR101977072B1 (en) Method for displaying text associated with audio file and electronic device
US8438485B2 (en) System, method, and apparatus for generating, customizing, distributing, and presenting an interactive audio publication
US8116746B2 (en) Technologies for finding ringtones that match a user's hummed rendition
WO2016165325A1 (en) Audio information recognition method and apparatus
US20080057922A1 (en) Methods of Searching Using Captured Portions of Digital Audio Content and Additional Information Separate Therefrom and Related Systems and Computer Program Products
US7697731B2 (en) Information-processing apparatus, information-processing methods, and programs
US20190147863A1 (en) Method and apparatus for playing multimedia
US20150193199A1 (en) Tracking music in audio stream
CN108804070B (en) Music playing method and device, storage medium and electronic equipment
CN104252464A (en) Information processing method and information processing device
CN111640434A (en) Method and apparatus for controlling voice device
US11334618B1 (en) Device, system, and method of capturing the moment in audio discussions and recordings
US8010345B2 (en) Providing speech recognition data to a speech enabled device when providing a new entry that is selectable via a speech recognition interface of the device
CN110415703A (en) Voice memos information processing method and device
CN113241098B (en) Target recommendation method based on audio recording
WO2019228140A1 (en) Instruction execution method and apparatus, storage medium, and electronic device
EP3557461A1 (en) Association via voice
CN116343771A (en) Music on-demand voice instruction recognition method and device based on knowledge graph
CN110970027B (en) Voice recognition method, device, computer storage medium and system
KR20140086853A (en) Apparatus and Method Managing Contents Based on Speaker Using Voice Data Analysis
CN112148754A (en) Song identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant