CN102236686A

CN102236686A - Voice sectional song search method

Info

Publication number: CN102236686A
Application number: CN2010101692232A
Authority: CN
Inventors: 李霄寒; 黄伟; 蔡洪滨
Original assignee: Shengle Information Technolpogy Shanghai Co Ltd
Current assignee: Shengle Information Technolpogy Shanghai Co Ltd
Priority date: 2010-05-07
Filing date: 2010-05-07
Publication date: 2011-11-09

Abstract

The invention discloses a segmented voice song search method. The method is implemented through interaction between a client and a server, and a song database is established on the server. During a search, the client prompts the user to speak the information about the song to be searched in segments and, once the user has finished speaking, transmits the recorded speech to the server. After receiving the voice signal, the server automatically recognizes the text that corresponds to it, searches the song database layer by layer, and transmits the search result to the client, which presents it to the user. Because the client uses a carefully designed voice interaction flow, the server can decompose a very large song search space into a combination of several smaller song search spaces. This improves search efficiency and automatic speech recognition accuracy over a very large song search space, improves the user experience, and saves hardware cost for service providers.

Description

Segmented voice song retrieval method

Technical field

The present invention relates to a segmented voice song retrieval method.

Background art

Song downloading is currently a very active line of business in the mobile and Internet sectors. In this business, the server searches its song library for songs that match the search conditions the user submits through the client, and offers those songs to the user for download. The most common case is that the user enters the name of a singer or a song on the terminal, and the server finds the song or series of songs corresponding to that singer or song title and returns them to the user.

On a portable terminal, the limited physical size of the device usually makes text input slow, so typing a complete song title on a keyboard or touch screen can take the user a long time. In addition, because pinyin input dominates among mobile input methods, users often enter homophones written with the wrong characters; for example, Wang Fei's "Legend" may be entered as a homophone that literally means "wearing strange". Such errors can directly prevent a given server-side search algorithm from finding the corresponding song. A reasonable solution to these problems is speech recognition: the user simply says the name of the song, the user terminal or the server uses a speech recognition algorithm to find the target song or songs in the song title database, and the actual songs are then fetched from the song library and returned to the user. This speeds up user input and avoids the homophone problem, which improves the user experience to some extent and makes users more willing to use the song download service.

However, speech recognition has an inherent weakness. Much as with the human ear, it is not perfectly accurate, and the accuracy of machine speech recognition falls as the database grows. For example, as the song title database grows, so does the number of titles with similar pronunciations, such as Pan Weibai's "Favourable Turn" and Wang Fei's "Legend", and this seriously reduces recognition accuracy. To cover the needs of most users, even a medium-sized song database typically contains hundreds of thousands of distinct songs, and performing speech recognition over such a large range is a major challenge for recognition accuracy. At the same time, speech recognition algorithms are usually computation-intensive, and their computational load grows rapidly with the size of the database. This places heavy demands on the server's processing capacity: to handle concurrent requests from many users in a timely manner, the service provider often has to provision more hardware, which increases cost.

Summary of the invention

The technical problem to be solved by the present invention is to provide a segmented voice song retrieval method that improves search efficiency and automatic speech recognition accuracy while saving hardware cost.

To solve the above technical problem, the segmented voice song retrieval method of the present invention is implemented through interaction between a client and a server, where the server has a song database. The method comprises the following steps:

(1) the client prompts the user to speak, segment by segment, the information about the song to be retrieved, and sends the user's speech to the server;

(2) the server receives the voice signal transmitted by the client;

(3) the server performs automatic speech recognition on the received voice signal to identify the text corresponding to it;

(4) the server starts a search engine and searches the song database layer by layer for the song specified by the user;

(5) the server sends the song search result to the client;

(6) the client receives the search result transmitted by the server;

(7) the client presents the song search result to the user.
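The server side of steps (2) to (5) can be pictured as a single request handler. The following is a minimal Python sketch under stated assumptions, not the patent's implementation: the speech recognizer and the layered search engine are passed in as hypothetical callables, and each voice segment is recognized on its own so that it can later be matched against its own, smaller search space.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces: the patent does not specify the ASR engine or the
# search engine, so both are injected as plain callables.
Recognizer = Callable[[bytes], str]                    # one voice segment -> recognized text
LayeredSearch = Callable[[Sequence[str]], List[str]]   # one text per layer -> song results


def handle_search_request(segments: Sequence[bytes],
                          recognize: Recognizer,
                          search: LayeredSearch) -> List[str]:
    """Server side of steps (2)-(5) for one search request."""
    # Step (2): `segments` holds the voice signals received from the client,
    # in the order they were spoken (e.g. singer name, then song title).
    # Step (3): recognize each segment separately rather than as one utterance.
    texts = [recognize(segment) for segment in segments]
    # Step (4): search the song database layer by layer, one layer per segment.
    results = search(texts)
    # Step (5): the caller sends this list back to the client.
    return results
```

Keeping recognition and search as injected callables makes the ordering of the steps explicit without committing to any particular speech recognition product.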

Step (1) comprises the following sub-steps:

(1) the client prompts the user to speak the first voice segment;

(2) the client records while simultaneously performing voice endpoint detection;

(3) based on the endpoint detection result, the client determines whether the user has finished speaking; if so, it proceeds to sub-step (4); if not, it continues with sub-step (2);

(4) the client prompts the user to speak the second voice segment;

(5) the client records while simultaneously performing voice endpoint detection;

(6) based on the endpoint detection result, the client determines whether the user has finished speaking; if so, it proceeds to sub-step (7); if not, it continues with sub-step (5);

(7) the client sends both voice segments to the server together.
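Sub-steps (1) to (7) of this first variant can be sketched as follows. This is a minimal Python sketch that assumes the microphone is modeled as an iterator of audio frames and that a hypothetical per-frame `is_speech` classifier is available; the endpoint rule used here (a fixed number of consecutive non-speech frames after speech has started) is an illustrative assumption, not a rule stated in the patent.

```python
from typing import Callable, Iterator, List, Sequence

Frame = Sequence[float]  # one short block of audio samples


def record_until_endpoint(frames: Iterator[Frame],
                          is_speech: Callable[[Frame], bool],
                          trailing_silence: int = 3) -> List[Frame]:
    """Sub-steps (2)-(3) / (5)-(6): record until the speaker has been silent
    for `trailing_silence` consecutive frames after speech started."""
    recorded: List[Frame] = []
    started = False
    silent_run = 0
    for frame in frames:
        recorded.append(frame)
        if is_speech(frame):
            started = True
            silent_run = 0
        elif started:
            silent_run += 1
            if silent_run >= trailing_silence:
                break  # endpoint detected: the user has finished this segment
    return recorded


def capture_two_segments_then_send(mic: Iterator[Frame],
                                   prompt: Callable[[str], None],
                                   is_speech: Callable[[Frame], bool],
                                   send: Callable[[List[List[Frame]]], None]) -> None:
    """First variant of step (1): record both segments, then send them together."""
    prompt("Please say the first segment")          # sub-step (1)
    first = record_until_endpoint(mic, is_speech)   # sub-steps (2)-(3)
    prompt("Please say the second segment")         # sub-step (4)
    second = record_until_endpoint(mic, is_speech)  # sub-steps (5)-(6)
    send([first, second])                           # sub-step (7)
```

Because `mic` is an iterator, the second call to `record_until_endpoint` resumes from the frame after the first endpoint, mirroring a continuously running microphone.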

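Alternatively, step (1) may comprise the following sub-steps:

(1) the client prompts the user to speak the first voice segment;

(2) the client records while simultaneously performing voice endpoint detection;

(3) based on the endpoint detection result, the client determines whether the user has finished speaking; if so, it proceeds to sub-step (4); if not, it continues with sub-step (2);

(4) the client prompts the user to speak the second voice segment and, at the same time, sends the first voice segment to the server;

(5) the client records while simultaneously performing voice endpoint detection;

(6) based on the endpoint detection result, the client determines whether the user has finished speaking; if so, it proceeds to sub-step (7); if not, it continues with sub-step (5);

(7) the client sends the second voice segment to the server.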
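In this second variant, uploading the first segment overlaps with recording the second. Below is a minimal Python sketch of that overlap using a background thread from the standard `concurrent.futures` module. `record_segment` (assumed to block until endpoint detection decides the user has finished) and `send` (assumed to upload one segment and return an acknowledgement) are hypothetical callables.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


def capture_pipelined(record_segment: Callable[[], bytes],
                      prompt: Callable[[str], None],
                      send: Callable[[bytes], str]) -> List[str]:
    """Second variant of step (1): upload segment 1 while recording segment 2."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        prompt("Please say the first segment")     # sub-step (1)
        first = record_segment()                    # sub-steps (2)-(3)
        prompt("Please say the second segment")    # sub-step (4) ...
        pending: Future = pool.submit(send, first)  # ... while segment 1 uploads
        second = record_segment()                   # sub-steps (5)-(6)
        ack_first = pending.result()                # wait for the background upload
        ack_second = send(second)                   # sub-step (7)
    return [ack_first, ack_second]
```

Overlapping the upload with the second recording lets the server begin recognizing the first segment while the user is still speaking; waiting for the first upload before sending the second keeps the segments in order.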

The song retrieval method of the present invention uses a well-designed voice interaction flow on the client that lets the user input speech in segments. This allows the server to perform a hierarchical search, decomposing one very large song search space into a combination of several smaller song search spaces and thereby narrowing the search range. Compared with existing song retrieval methods, the method can therefore effectively shorten retrieval time, improve automatic speech recognition accuracy, and improve the user experience, while also reducing server load and saving hardware cost for the service provider.
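The narrowing of the search range can be made concrete by counting how many entries must be matched against the user's speech. This is an illustrative model under assumptions, not a result stated in the patent. Let $D$ be the song database, $A$ the set of singer names, $D_a \subseteq D$ the songs of singer $a$ (so that $D = \bigcup_{a \in A} D_a$), and $C \subseteq A$ the candidate list of $k$ singers kept after the first layer. Then

$$N_{\text{flat}} = |D|, \qquad N_{\text{seg}} = |A| + \sum_{a \in C} |D_a|.$$

If the candidate singers have roughly the average catalogue size $|D|/|A|$, then

$$N_{\text{seg}} \approx |A| + k\,\frac{|D|}{|A|},$$

which is far smaller than $N_{\text{flat}} = |D|$ whenever $|A| \ll |D|$ and $k \ll |A|$. Each recognition step also faces a much smaller set of competing entries than a single flat search over all song titles.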

Description of drawings

The present invention is described in further detail below with reference to the accompanying drawings and an embodiment:

Fig. 1 is a flowchart of the embodiment of the invention;

Fig. 2 is a schematic diagram of the embodiment of the invention.

Embodiment

To give a more concrete understanding of the technical content, features, and effects of the present invention, the illustrated embodiment is described in detail below:

Figs. 1 and 2 are, respectively, a process flowchart and a schematic diagram of the embodiment of the invention. In this embodiment, songs are searched using the singer name and the song title, and the server has a singer name database and a song database organized by singer name. As shown in Figs. 1 and 2, when the user wants to find the song titled "Legend" sung by the singer Wang Fei, the following steps are carried out:

First, the client prompts the user to say the singer of the song to be retrieved, i.e., the singer's name. The user says the singer name "Wang Fei". While recording, the client uses a voice activity detection algorithm to detect the end point of the voice signal in real time, in order to determine whether the user has finished this voice segment. Most common voice activity detection algorithms today identify the end points of a voice signal from time-domain features such as signal energy and short-time zero-crossing rate. This decision does not depend on any information from the server and is therefore very fast. If the client determines that the user has not yet finished, it continues recording and detecting the endpoint; once it determines that the user has finished saying the singer name, it prompts the user to say the song title and, at the same time, sends the singer name voice signal to the server over the network.
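The energy and zero-crossing-rate approach to endpoint detection mentioned above can be sketched in a few lines. This is a minimal Python sketch of one common heuristic, not the specific detector used in the patent: samples are assumed to be floats normalized to [-1.0, 1.0], and every threshold and the `hangover` length are illustrative defaults chosen for this example.

```python
from typing import Optional, Sequence

# Assumption: samples are floats normalised to [-1.0, 1.0]; all thresholds
# below are illustrative defaults, not values taken from the patent.


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_speech_frame(frame: Sequence[float],
                    energy_thr: float = 0.01,
                    low_energy_thr: float = 0.001,
                    zcr_thr: float = 0.3) -> bool:
    """High energy marks voiced speech; moderate energy plus a high
    zero-crossing rate catches unvoiced sounds such as fricatives."""
    energy = short_time_energy(frame)
    if energy > energy_thr:
        return True
    return energy > low_energy_thr and zero_crossing_rate(frame) > zcr_thr


def find_end_of_speech(frames: Sequence[Sequence[float]],
                       hangover: int = 10) -> Optional[int]:
    """Return the index of the first frame of the trailing silence once
    `hangover` consecutive non-speech frames follow speech; return None
    if no endpoint has been reached yet (i.e. keep recording)."""
    started = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if is_speech_frame(frame):
            started = True
            silent_run = 0
        elif started:
            silent_run += 1
            if silent_run >= hangover:
                return i - hangover + 1
    return None
```

A `None` result corresponds to the "not finished, keep recording" branch; in a live client the check would be repeated (or made incremental) as new frames arrive. Because only local time-domain features are used, the decision needs no round trip to the server.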

After receiving the singer name voice signal sent by the client, the server first uses a speech recognition algorithm to identify the corresponding text. It then starts the singer name search engine and searches the singer name database for names close to the singer name sent by the client, "Wang Fei". Because the singer database is much smaller than the song database, a higher recognition rate can be expected. In this round of search, the system selects the several results closest to the user's speech (three in this embodiment: "Wang Fei", "Dou Wei", and "Wang Feng") and puts them into a singer name candidate list.
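Selecting the several closest singer names can be sketched as a top-k ranking. This Python sketch rests on a loudly stated assumption: the patent does not say how closeness is measured, so plain string similarity from the standard `difflib.SequenceMatcher` on romanized names stands in for whatever acoustic or phonetic score a real system would use. The function name `singer_candidates` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def singer_candidates(recognized: str,
                      singer_names: Sequence[str],
                      k: int = 3) -> List[str]:
    """First search layer: return the k singer names most similar to the
    recognized text, best first."""
    def similarity(name: str) -> float:
        # Assumption: string similarity on romanized names stands in for
        # the unspecified acoustic/phonetic distance of a real system.
        return SequenceMatcher(None, recognized, name).ratio()

    return sorted(singer_names, key=similarity, reverse=True)[:k]
```

Keeping several candidates rather than only the best match gives the second layer some tolerance to misrecognition of the singer name.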

Meanwhile, the client prompts the user to say the song title, and the user says "Legend". While recording, the client uses the voice activity detection algorithm to judge whether the user has finished. If the client determines that the user has not yet finished, it continues recording and detecting the endpoint; once it determines that the user has finished saying the song title, it sends the song title voice signal to the server over the network.

After receiving the song title voice signal sent by the client, the server first uses a speech recognition algorithm to identify the corresponding text and then starts the song title search engine. Within the song database organized by singer name, it searches only the sub-databases of the three singers in the singer name candidate list ("Wang Fei", "Dou Wei", and "Wang Feng") for the several songs whose titles are closest to "Legend", for example "Wang Fei/Legend", "Dou Wei/Outside the Window", and "Wang Feng/Loyalty", and sends this search result to the client over the network.
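The second search layer can be sketched in the same spirit. In this hedged Python sketch, the song database organized by singer is modeled as a hypothetical dictionary from singer name to song titles, `difflib` string similarity again stands in for an unspecified matching score, and, as in the embodiment's example result, one best-matching title is returned per candidate singer. The function name `search_songs_by_candidates` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple


def search_songs_by_candidates(title_text: str,
                               candidate_singers: Sequence[str],
                               songs_by_singer: Dict[str, List[str]]) -> List[str]:
    """Second search layer: search only the candidate singers' sub-databases
    and return each singer's best-matching title as "singer/title", ranked
    by similarity to the recognized title."""
    def similarity(title: str) -> float:
        # Assumption: string similarity stands in for the real matching score.
        return SequenceMatcher(None, title_text, title).ratio()

    scored: List[Tuple[float, str]] = []
    for singer in candidate_singers:
        titles = songs_by_singer.get(singer, [])
        if not titles:
            continue  # candidate singer with no songs in the database
        best = max(titles, key=similarity)
        scored.append((similarity(best), f"{singer}/{best}"))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored]
```

Songs of singers outside the candidate list are never examined, which is where the reduction in search space and computation comes from.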

The client receives the song search result from the server and presents it to the user as a song list. The user can then select a song from the list to download as needed, for example "Wang Fei/Legend".

In summary, through the client-server interaction described above, the segmented voice song retrieval method of the present invention solves the problem of the automatic speech recognition rate dropping quickly as the search space grows, while also improving search efficiency, saving hardware cost, and ensuring a good user experience.

Claims

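1. A segmented voice song retrieval method, implemented through interaction between a client and a server, the server having a song database, characterized in that it comprises the following steps:

(1) the client prompts the user to speak, segment by segment, the information about the song to be retrieved, and sends the user's speech to the server;

(2) the server receives the voice signal transmitted by the client;

(3) the server performs automatic speech recognition on the received voice signal to identify the text corresponding to it;

(4) the server starts a search engine and searches the song database layer by layer for the song specified by the user;

(5) the server sends the song search result to the client;

(6) the client receives the search result transmitted by the server;

(7) the client presents the song search result to the user.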

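2. The segmented voice song retrieval method according to claim 1, characterized in that step (1) comprises the following sub-steps:

(1) the client prompts the user to speak the first voice segment;

(2) the client records while simultaneously performing voice endpoint detection;

(3) based on the endpoint detection result, the client determines whether the user has finished speaking; if so, it proceeds to sub-step (4); if not, it continues with sub-step (2);

(4) the client prompts the user to speak the second voice segment;

(5) the client records while simultaneously performing voice endpoint detection;

(6) based on the endpoint detection result, the client determines whether the user has finished speaking; if so, it proceeds to sub-step (7); if not, it continues with sub-step (5);

(7) the client sends both voice segments to the server together.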

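3. The segmented voice song retrieval method according to claim 1, characterized in that step (1) comprises the following sub-steps:

(1) the client prompts the user to speak the first voice segment;

(2) the client records while simultaneously performing voice endpoint detection;

(3) based on the endpoint detection result, the client determines whether the user has finished speaking; if so, it proceeds to sub-step (4); if not, it continues with sub-step (2);

(4) the client prompts the user to speak the second voice segment and, at the same time, sends the first voice segment to the server;

(5) the client records while simultaneously performing voice endpoint detection;

(6) based on the endpoint detection result, the client determines whether the user has finished speaking; if so, it proceeds to sub-step (7); if not, it continues with sub-step (5);

(7) the client sends the second voice segment to the server.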