CN111259170A - Voice search method and device, electronic equipment and storage medium - Google Patents


Publication number
CN111259170A
CN111259170A
Authority
CN
China
Prior art keywords
text
candidate search
library
voice recognition
pinyin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811458192.5A
Other languages
Chinese (zh)
Inventor
薄琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201811458192.5A priority Critical patent/CN111259170A/en
Publication of CN111259170A publication Critical patent/CN111259170A/en
Pending legal-status Critical Current

Abstract

The application relates to the technical field of voice search, and in particular to a voice search method comprising the following steps: acquiring a reference text set; for each reference text in the reference text set, processing the reference text based on the language type corresponding to the reference text to obtain a candidate search text library; and, after the voice recognition text corresponding to the target voice is determined, searching, from all candidate search text libraries, the candidate search text library matched with the language type corresponding to the voice recognition text, and determining the reference text corresponding to the target voice according to the searched candidate search text library. By processing the voice recognition text based on its language type, the method improves tolerance to noise in target voice recognition; by determining the search result based on matching degree, it further improves the accuracy of the target voice search, thereby improving the user experience. The application also provides a voice search device, an electronic device and a storage medium.

Description

Voice search method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of voice search technologies, and in particular, to a voice search method, an apparatus, an electronic device, and a storage medium.
Background
With the rapid development of mobile terminal technology, the functions of mobile terminals have become increasingly rich; for example, the voice search function is now widely used on mobile phones and other mobile terminals. Through the voice search function, a user can search various types of data, including weather, maps, music, contacts, and so on.
Existing voice search methods generally operate on the recognition result of the user's voice content; that is, after the user's voice content is recognized, a search engine provides the user with search content matching that voice content. Taking music search as an example, when a user says to a mobile terminal, "I want to listen to song xx," the expectation is that the song can be identified and played quickly.
However, when recognizing voice content with existing voice recognition technology, noise introduced by various acoustic environments is often unavoidable, resulting in poor recognition accuracy. This in turn makes the returned search content inaccurate, or prevents the corresponding search content from being returned at all, and degrades the user experience.
Therefore, a technical solution is needed that can accurately return the content a user intends to search for.
Disclosure of Invention
In view of this, an object of the embodiments of the present application is to provide a voice search method, an apparatus, an electronic device, and a storage medium, which can improve accuracy of voice search and improve user experience.
Mainly comprises the following aspects:
in a first aspect, an embodiment of the present application provides a voice search method, where the method includes:
acquiring a reference text set;
processing each reference text in the reference text set based on the language type corresponding to the reference text to obtain a candidate search text library;
after the voice recognition text corresponding to the target voice is determined, searching a candidate search text library matched with the language type corresponding to the voice recognition text from all candidate search text libraries, and determining a reference text corresponding to the target voice according to the searched candidate search text library.
In some embodiments, the reference text comprises at least one Chinese character, and the language category corresponding to the at least one Chinese character is a Chinese category; processing the reference text based on the language type corresponding to the reference text includes:
extracting each Chinese character from the reference text in sequence;
sequentially combining the extracted multiple Chinese characters to obtain each Chinese character group;
and taking each Chinese character and each Chinese character group as a candidate search text after processing.
In some embodiments, the reference text includes at least one pinyin syllable, and the language category corresponding to the at least one pinyin syllable is a Chinese category; processing the reference text based on the language type corresponding to the reference text includes:
sequentially extracting all pinyin from the reference text, and determining an initial part and a final part corresponding to each pinyin;
sequentially combining any multiple extracted pinyin to obtain each pinyin group;
for any two pinyins, determining an initial group formed by combining an initial part of one pinyin and an initial part of the other pinyin and determining a final group formed by combining a final part of one pinyin and a final part of the other pinyin;
and taking each pinyin, the pinyin group, the initial consonant group and the final group as a candidate search text after processing.
In some embodiments, the reference text includes at least one letter, and the language category corresponding to the at least one letter is an English category; processing the reference text based on the language type corresponding to the reference text includes:
extracting each letter from the reference text in sequence;
sequentially combining any plurality of extracted letters to obtain each letter group;
and taking each letter group as a candidate search text after processing.
In one embodiment, before searching for a candidate search text library matching the language type corresponding to the speech recognition text from all the candidate search text libraries, the method further includes:
processing the voice recognition text based on the language type corresponding to the voice recognition text to obtain a processed voice recognition text;
searching candidate search texts matched with the language type corresponding to the voice recognition text from all candidate search text libraries, wherein the candidate search texts comprise:
and searching a candidate search text library matched with the processed voice recognition text from all candidate search text libraries.
In another embodiment, searching all the candidate search text libraries for a candidate search text library matching the language type corresponding to the speech recognition text includes:
for each candidate search text library, determining the matching degree between the processed voice recognition text and the candidate search text in the candidate search text library;
ranking all the candidate search text libraries in descending order of matching degree;
and taking the candidate search text libraries ranked within a preset position as the candidate search text library matched with the processed speech recognition text.
In some embodiments, the library of candidate search texts comprises a plurality of candidate search texts; the determining the matching degree between the processed speech recognition text and the candidate search text in the candidate search text library includes:
for each processed speech recognition text, determining whether the speech recognition text is consistent with any candidate search text in the candidate search text library;
if yes, determining that the voice recognition text is matched with the candidate search text;
counting the number of voice recognition texts matched with the candidate search texts in any candidate search text library;
and taking the counted number as the matching degree between the processed speech recognition text and the candidate search texts in the candidate search text library.
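The counting rule described above can be sketched in a few lines. This is a minimal illustrative sketch, not the patent's implementation; the function and variable names are assumptions:

```python
def matching_degree(recognition_units, candidate_library):
    """Count how many processed speech-recognition text units
    appear in a candidate search text library (a set of strings)."""
    return sum(1 for unit in recognition_units if unit in candidate_library)

# Hypothetical example: units derived from a recognized query,
# matched against one reference text's candidate library.
library = {"chen", "yi", "xun", "chenyi", "yixun"}
units = ["chen", "yi", "xu"]            # "xu" is a noisy fragment
print(matching_degree(units, library))  # counts "chen" and "yi" -> 2
```

The matching degree of each library can then be compared to rank the libraries, as described above.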
In yet another embodiment, before the determining the matching degree between the processed speech recognition text and the candidate search text in the candidate search text library, the method further includes:
assigning corresponding importance information to each processed speech recognition text;
the determining the matching degree between the processed speech recognition text and the candidate search text in the candidate search text library includes:
and determining the matching degree between the speech recognition texts assigned the importance information and the candidate search texts in the candidate search text library.
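With importance information assigned, the matching degree becomes a weighted sum rather than a plain count. A minimal sketch, assuming importance is expressed as a numeric weight per unit (the representation and names are illustrative assumptions):

```python
def weighted_matching_degree(weighted_units, candidate_library):
    """Sum the importance weights of recognition-text units
    that are found in the candidate search text library."""
    return sum(w for unit, w in weighted_units if unit in candidate_library)

# Hypothetical weights: longer groups considered more important.
weighted = [("chenyi", 2.0), ("chen", 1.0), ("zz", 1.0)]
library = {"chen", "yi", "chenyi"}
print(weighted_matching_degree(weighted, library))  # 2.0 + 1.0 = 3.0
```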
In another embodiment, before processing the reference text based on the language type corresponding to the reference text, the method further includes:
judging whether the reference text has an extended reference text;
processing the reference text based on the language type corresponding to the reference text, including:
and when the extended reference text exists in the reference text, processing the reference text based on the language type corresponding to the reference text, and processing the extended reference text based on the language type corresponding to the extended reference text.
In another embodiment, after obtaining the reference text set, the method further includes:
and performing text conversion on each reference text in the reference text set to obtain a converted reference text.
In a second aspect, an embodiment of the present application further provides a speech search apparatus, where the apparatus includes:
the acquisition module is used for acquiring a reference text set;
the first processing module is used for processing each reference text in the reference text set based on the language type corresponding to the reference text to obtain a candidate search text library;
and the searching module is used for searching a candidate searching text library matched with the language type corresponding to the voice recognition text from all candidate searching text libraries after the voice recognition text corresponding to the target voice is determined, and determining a reference text corresponding to the target voice according to the searched candidate searching text library.
In some embodiments, the reference text comprises at least one Chinese character, and the language category corresponding to the at least one Chinese character is a Chinese category; the first processing module is specifically configured to:
extracting each Chinese character from the reference text in sequence;
sequentially combining the extracted multiple Chinese characters to obtain each Chinese character group;
and taking each Chinese character and each Chinese character group as a candidate search text after processing.
In some embodiments, the reference text includes at least one pinyin syllable, and the language category corresponding to the at least one pinyin syllable is a Chinese category; the first processing module is specifically configured to:
sequentially extracting all pinyin from the reference text, and determining an initial part and a final part corresponding to each pinyin;
sequentially combining any multiple extracted pinyin to obtain each pinyin group;
for any two pinyins, determining an initial group formed by combining an initial part of one pinyin and an initial part of the other pinyin and determining a final group formed by combining a final part of one pinyin and a final part of the other pinyin;
and taking each pinyin, the pinyin group, the initial consonant group and the final group as a candidate search text after processing.
In some embodiments, the reference text includes at least one letter, and the language category corresponding to the at least one letter is an English category; the first processing module is specifically configured to:
extracting each letter from the reference text in sequence;
sequentially combining any plurality of extracted letters to obtain each letter group;
and taking each letter group as a candidate search text after processing.
In one embodiment, the method further comprises:
the second processing module is used for processing the voice recognition text based on the language type corresponding to the voice recognition text to obtain a processed voice recognition text;
the search module is specifically configured to search a candidate search text library matched with the processed speech recognition text from all candidate search text libraries.
In some embodiments, the search module is specifically configured to:
for each candidate search text library, determining the matching degree between the processed voice recognition text and the candidate search text in the candidate search text library;
ranking all the candidate search text libraries in descending order of matching degree;
and taking the candidate search text libraries ranked within a preset position as the candidate search text library matched with the processed speech recognition text.
In some embodiments, the search module is specifically configured to:
for each processed speech recognition text, determining whether the speech recognition text is consistent with any candidate search text in the candidate search text library;
if yes, determining that the voice recognition text is matched with the candidate search text;
counting the number of voice recognition texts matched with the candidate search texts in any candidate search text library;
and taking the counted number as the matching degree between the processed speech recognition text and the candidate search texts in the candidate search text library.
In another embodiment, the method further comprises:
the giving module is used for giving corresponding importance information to each processed voice recognition text;
the search module is specifically configured to:
and determining the matching degree between the speech recognition texts assigned the importance information and the candidate search texts in the candidate search text library.
In yet another embodiment, the method further comprises:
the judging module is used for judging whether the reference text has the extended reference text;
the first processing module is specifically configured to:
and when the extended reference text exists in the reference text, processing the reference text based on the language type corresponding to the reference text, and processing the extended reference text based on the language type corresponding to the extended reference text.
In yet another embodiment, the method further comprises:
and the conversion module is used for performing text conversion on each reference text in the reference text set to obtain a converted reference text.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the voice search method according to the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the voice search method according to the first aspect.
By adopting the above scheme, after the reference text set is obtained, each reference text in the set is processed based on its corresponding language type to obtain a candidate search text library. After the voice recognition text corresponding to the target voice is determined, the candidate search text library matched with the language type corresponding to the voice recognition text is searched from all the candidate search text libraries, and the reference text corresponding to the target voice is determined according to the searched candidate search text library. Because the voice recognition text is processed based on its language type, the accuracy of target voice recognition is improved, and the accuracy of the search result can be improved when searching based on the recognized text.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flow chart illustrating a voice search method according to an embodiment of the present application;
FIG. 2 is a flow chart of a voice search method provided in the second embodiment of the present application;
FIG. 3 is a flow chart of another speech searching method provided in the second embodiment of the present application;
FIG. 4 is a flowchart illustrating a voice search method according to a third embodiment of the present application;
FIG. 5 is a flow chart of a voice search method according to the fourth embodiment of the present application;
FIG. 6 is a flow chart of a voice search method provided in the fifth embodiment of the present application;
fig. 7 is a schematic structural diagram illustrating a speech search apparatus according to a seventh embodiment of the present application;
fig. 8 shows a schematic structural diagram of an electronic device according to an eighth embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Existing voice recognition technology recognizes voice content with poor accuracy, so the returned search content is inaccurate, or the corresponding search content cannot be returned at all, and the user experience is poor. In view of this, the embodiments of the present application provide a voice search method, which may be applied to the technical fields of music search, weather search, and others. This is illustrated through the following examples.
Example one
As shown in fig. 1, which is a schematic flow chart of a voice search method provided in an embodiment of the present application, the execution body of the voice search method may be an electronic device, such as a private device, an operator network device (e.g., a base station device), or a station (Station) deployed by an industry organization, group, or individual. Specifically, the execution body may include, but is not limited to, a mobile station, a mobile terminal, a mobile phone, user equipment, a portable device, a vehicle, and the like. For example, the terminal device may be a mobile phone (or "cellular" phone) or a computer with a wireless communication function, and may also be a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile apparatus; the embodiment of the present application is not limited thereto. The method specifically comprises the following steps:
s101, acquiring a reference text set.
Here, first, a reference text set needs to be acquired. The set of reference texts may be a pre-collected set of correct reference texts. The reference text in the embodiment of the application can be Chinese characters, pinyin, letters and other texts.
S102, aiming at each reference text in the reference text set, processing the reference text based on the language type corresponding to the reference text to obtain a candidate search text library.
Here, after the language type corresponding to the reference text is determined, the reference text may be processed according to that language type. In the embodiment of the present application, reference texts of different language types are processed accordingly to obtain corresponding candidate search text libraries. The main consideration is that existing voice recognition systems are sensitive to the environment: targeted voice training is usually required for each scene, and the recognized voice recognition text is usually mixed with various kinds of noise, such as missing characters, homophones written with different characters, similar initials, similar finals, and English transliterations mixed with the English original. When the recognized voice recognition text is mixed with such noise, existing schemes that perform voice search using a word-based inverted index return inaccurate search content, or cannot return the corresponding search content at all, and the user experience is poor. To solve these problems of the existing voice search scheme, the present application performs text processing by comprehensively considering the language type corresponding to the voice recognition text, and then matches search results according to the processed voice recognition text. Correspondingly, in determining the candidate search text libraries, the embodiment of the present application may also proceed based on language type.
S103, after the voice recognition text corresponding to the target voice is determined, searching a candidate search text library matched with the language type corresponding to the voice recognition text from all candidate search text libraries, and determining a reference text corresponding to the target voice according to the searched candidate search text library.
Here, the reference text set is analyzed and processed under multiple language types (including at least a Chinese type and an English type) to obtain multiple corresponding candidate search text libraries. The embodiment of the present application may establish in advance an index relation between each reference text and its candidate search text library, so that, once the number of matches between the voice recognition text and the candidate search texts in a candidate search text library is determined, the corresponding search result can be determined based on the established index relation. That is, after the speech recognition text corresponding to the target speech is determined, the embodiment of the present application may search all the candidate search text libraries for the candidate search text library matching the speech recognition text, and then determine the reference text corresponding to the target speech based on the index relation between the reference text and the candidate search text library.
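The index relation between candidate search texts and reference texts resembles an inverted index. The following is a minimal sketch under that assumption; the function names, the toy libraries, and the count-and-take-maximum ranking are illustrative, not the patent's implementation:

```python
from collections import defaultdict

def build_index(libraries):
    """libraries: {reference_text: iterable of candidate search texts}.
    Returns an inverted index {candidate_text: set of reference_texts}."""
    index = defaultdict(set)
    for ref, candidates in libraries.items():
        for c in candidates:
            index[c].add(ref)
    return index

def best_reference(recognition_units, index):
    """Rank reference texts by how many recognition units hit their library,
    and return the best-scoring one (None if nothing matches)."""
    scores = defaultdict(int)
    for unit in recognition_units:
        for ref in index.get(unit, ()):
            scores[ref] += 1
    return max(scores, key=scores.get) if scores else None

# Hypothetical candidate libraries for two reference texts.
libs = {"chenyixun": {"chen", "yi", "xun", "chenyi", "yixun"},
        "forecast":  {"fo", "re", "cast"}}
idx = build_index(libs)
print(best_reference(["chen", "yixun"], idx))  # prints "chenyixun"
```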
The embodiment of the present application can perform corresponding text processing for different language types; the following second and third embodiments focus on the processing procedure for reference texts of the Chinese type and the English type, respectively.
Example two
When the reference text includes at least one Chinese character, the language type corresponding to the reference text is a Chinese language type; accordingly, the reference text is processed based on the language type corresponding to the reference text, as shown in fig. 2, which specifically includes the following steps:
s201, extracting each Chinese character from the reference text in sequence;
s202, sequentially combining a plurality of extracted Chinese characters to obtain each Chinese character group;
s203, taking each Chinese character and each Chinese character group as a candidate search text after processing.
First, each Chinese character is extracted from the reference text in sequence; then any multiple Chinese characters are combined in sequence to obtain each Chinese character group; finally, each Chinese character and each Chinese character group are used as processed candidate search texts. Taking the reference text "Chen Yixun" as an example, the three characters "Chen", "Yi" and "Xun" can be extracted, and sequential combination yields character groups such as "ChenYi" and "YiXun", so that "Chen", "Yi", "Xun", "ChenYi" and "YiXun" can each be used as a processed candidate search text.
It is worth noting that, in the embodiment of the present application, not only can each Chinese character and each Chinese character group be used directly as processed candidate search texts; the characters and character groups may also be screened first, with the screened characters and character groups used as the processed candidate search texts, so as to improve search efficiency while ensuring the accuracy of the voice search.
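The character-group construction of steps S201–S203 amounts to enumerating contiguous character n-grams. A minimal sketch follows; the function name `candidate_texts` and the choice to keep all group lengths by default are illustrative assumptions, not part of the patent:

```python
def candidate_texts(reference, min_n=1, max_n=None):
    """Generate every character and every contiguous character group
    (n-gram) of a reference text as candidate search texts."""
    chars = list(reference)
    max_n = max_n or len(chars)
    out = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(chars) - n + 1):
            out.add("".join(chars[i:i + n]))
    return out

# For a three-character reference text this yields the three characters,
# the two adjacent two-character groups, and the full text itself.
print(sorted(candidate_texts("abc")))  # ['a', 'ab', 'abc', 'b', 'bc', 'c']
```

The same routine applies unchanged to the letter groups of the English-type processing in the third embodiment.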
For a reference text including at least one Chinese character, the embodiment of the present application can generate the corresponding pinyin based on the Chinese characters included in the reference text, so that each Chinese character corresponds to one pinyin syllable and one reference text corresponds to at least one pinyin syllable. The pinyin syllables, together with the initials and finals they contain, can then be indexed to improve the tolerance of voice recognition. When pinyin is used as the reference text, the reference text is processed based on the language type corresponding to the reference text, as shown in fig. 3, which specifically includes the following steps:
s301, extracting all pinyin from the reference text in sequence, and determining an initial part and a final part corresponding to each pinyin;
s302, sequentially combining the extracted multiple pinyin to obtain each pinyin group;
s303, aiming at any two pinyins, determining an initial group formed by combining the initial part of one pinyin and the initial part of the other pinyin and determining a final group formed by combining the final part of one pinyin and the final part of the other pinyin;
s304, taking each pinyin, the pinyin groups, the initial consonant groups and the final sound groups as a candidate search text after processing.
The pinyin, the pinyin groups, the initial consonant groups and the final group are respectively used as a candidate search text after processing.
Taking the reference text "chenyixun" as an example, the three pinyin syllables chen, yi and xun can be extracted, along with the initial and final of each (the initial of chen is ch and its final is en; the initial of yi is y and its final is i; the initial of xun is x and its final is un). Sequentially combining multiple pinyin syllables then yields pinyin groups such as chenyi and yixun. In addition, for any two pinyin syllables, the corresponding initials can be combined into initial groups, such as chy and yx, and the corresponding finals can be combined into final groups, such as eni and iun. Thus chen, yi, xun, chenyi, yixun, chy, yx, eni, iun, etc. can all be used as processed candidate search texts.
Similarly, not only can each pinyin syllable, pinyin group, initial group and final group be used directly as processed candidate search texts; they may also be screened first, with the screened pinyin syllables, pinyin groups, initial groups and final groups used as the processed candidate search texts, so as to improve search efficiency while ensuring the accuracy of the voice search.
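The initial/final decomposition of S301–S304 can be sketched as follows. This is a simplified illustration: the `INITIALS` table uses longest-prefix matching (digraphs zh/ch/sh listed before their single-letter counterparts) and treats syllables with no listed initial, such as "an", as zero-initial; a production pinyin segmenter would need a fuller rule set:

```python
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_pinyin(syllable):
    """Split a pinyin syllable into (initial, final) by longest-prefix match."""
    for ini in INITIALS:  # two-letter initials are listed first
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return "", syllable   # zero-initial syllables like "an"

pinyins = ["chen", "yi", "xun"]
parts = [split_pinyin(p) for p in pinyins]
print(parts)  # [('ch', 'en'), ('y', 'i'), ('x', 'un')]

# Adjacent pairs yield pinyin groups, initial groups and final groups.
for (i1, f1), (i2, f2), (p1, p2) in zip(parts, parts[1:],
                                        zip(pinyins, pinyins[1:])):
    print(p1 + p2, i1 + i2, f1 + f2)  # chenyi chy eni / yixun yx iun
```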
EXAMPLE III
When the reference text includes at least one letter, the corresponding language type is an english type, so that the reference text is processed based on the language type corresponding to the reference text, as shown in fig. 4, specifically including the following steps:
s401, extracting each letter from the reference text in sequence;
s402, sequentially combining any plurality of extracted letters to obtain each letter group;
and S403, taking each letter group as a candidate search text after processing.
Here, each letter is extracted from the reference text in sequence, then any plurality of letters are combined in sequence to obtain each letter group, and finally each letter group is used as a candidate search text after processing. Taking eason (i.e., the English name of Chen Yixun) as an example, each letter can be extracted, and letter groups such as eas, aso and son can be obtained after sequential combination and used as candidate search texts after processing.
It is worth noting that each letter group can be used directly as a processed candidate search text, or the letter groups can be screened first and only the screened letter groups used as processed candidate search texts, so that search efficiency is improved while the accuracy of voice search is ensured.
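One possible reading of steps S401 to S403 is a fixed-length sliding window over the letters; the window length 3 below matches the eas/aso/son example, although the description's "any plurality of letters" may also allow other lengths. The function name is an assumption:

```python
def expand_letters(text, n=3):
    """Slide a window of n letters over the text; 'eason' -> eas, aso, son."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}
```

For the reference text eason, this yields exactly the three letter groups named in the example.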
According to the voice search method provided by the embodiment of the application, after the target voice is obtained, voice recognition can first be performed to obtain the corresponding speech recognition text. In the embodiment of the application, a speech recognition system can be used to recognize the target voice: the feature parameter extraction unit analyzes the target voice, removes redundant information from the rich speech signal, and retains the information useful for speech recognition; the pattern matching and model training unit then recognizes the retained information to obtain the speech recognition text.
Here, similar to the reference text, the speech recognition text in the embodiment of the present application may be Chinese characters, pinyin, letters or other text. After the language type corresponding to the speech recognition text is determined, the speech recognition text can be processed accordingly. In practical speech recognition applications, different language types are affected by the recognition environment to different degrees; the embodiment of the application can process texts of the Chinese category (Chinese characters, pinyin and the like), texts of the English category (letters), and texts of other language types, which are not described again here.
There may be one or more speech recognition texts processed for different language types. In this way, the candidate search text library matching the current speech recognition text can be determined by the matching degree between the speech recognition text after the statistical processing and the candidate search texts in each candidate search text library, which is described in detail in the following embodiment four.
Example four
As shown in fig. 5, an embodiment of the present application provides a method for searching a candidate search text library, which specifically includes:
s501, aiming at each candidate search text library, determining the matching degree between the processed voice recognition text and the candidate search text in the candidate search text library;
s502, ranking all candidate search text libraries according to the sequence of the matching degree from high to low;
and S503, taking the candidate search text library that meets the preset ranking as a candidate search text library matched with the processed speech recognition text.
Here, the voice search method provided by the embodiment of the present application may rank all candidate search text libraries in order of matching degree from high to low, and then take the candidate search text libraries that meet the preset ranking as the candidate search text libraries matched with the processed speech recognition text, so as to determine the reference text corresponding to the target voice according to the searched candidate search text library.
Here, the preset ranking may be adaptively adjusted for different application scenarios. For voice search scenarios with low requirements on the accuracy of the search result, such as fuzzy search applications (e.g., map navigation), the reference texts corresponding to the candidate search text libraries within the preset ranking (e.g., the top 5) may be recommended to the user as the search results corresponding to the target voice. For voice search scenarios with high accuracy requirements, such as precise search applications (e.g., music recommendation by a voice robot), the reference text corresponding to the top-ranked candidate search text library may be recommended to the user as the search result corresponding to the target voice, further improving the user experience.
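The counting, ranking and truncation of steps S501 to S503 can be sketched as follows; this is a minimal illustration under the assumption that a library is represented as a set of candidate search texts, and the names `rank_libraries` and `preset_ranking` are introduced here, not taken from the patent:

```python
def rank_libraries(processed_texts, libraries, preset_ranking=5):
    """libraries: dict mapping each reference text to its set of candidate search texts."""
    scored = []
    for reference, candidates in libraries.items():
        # S501: matching degree = number of processed texts found in the library
        degree = sum(1 for text in processed_texts if text in candidates)
        scored.append((reference, degree))
    # S502: rank all libraries from high to low matching degree
    scored.sort(key=lambda item: item[1], reverse=True)
    # S503: keep only the libraries within the preset ranking
    return scored[:preset_ranking]
```

For a fuzzy search scenario `preset_ranking` could stay at 5, while a precise search scenario would use `preset_ranking=1`, mirroring the adjustment described above.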
It should be noted that the speech recognition text in the embodiment of the present application may include any combination of Chinese characters, pinyin and letters. For speech recognition texts such as 'Anji ba by' and 'love ATM', the embodiment of the application can search separately by language type, that is, search the Chinese part and the English part separately and add the matching degree corresponding to the Chinese characters to the matching degree corresponding to the letters to obtain the final matching degree of the speech recognition text. This avoids the higher search error rate caused by language mixing and ensures search accuracy.
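The mixed-language handling can be sketched as below: split the query into Chinese-character runs and Latin-letter runs, score each part against its own candidate set, and add the two degrees. The function names, the use of regular expressions, and the simplified set-membership scoring are assumptions for illustration only:

```python
import re

def split_by_language(text):
    """Separate the Chinese-character runs and the Latin-letter runs of a mixed query."""
    chinese_runs = re.findall(r"[\u4e00-\u9fff]+", text)
    letter_runs = re.findall(r"[A-Za-z]+", text)
    return chinese_runs, letter_runs

def mixed_matching_degree(text, chinese_candidates, letter_candidates):
    chinese_runs, letter_runs = split_by_language(text)
    chinese_degree = sum(1 for run in chinese_runs if run in chinese_candidates)
    letter_degree = sum(1 for run in letter_runs if run in letter_candidates)
    # final matching degree = Chinese-character degree + letter degree
    return chinese_degree + letter_degree
```

Searching the Chinese and English parts against separate candidate sets is what keeps language mixing from inflating the error rate.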
The above matching degree determination process is a key step in the embodiments of the present application, and is explained by the following embodiment five.
EXAMPLE five
As shown in fig. 6, the process of determining the matching degree is specifically implemented by the following steps:
s601, for each processed speech recognition text, determining whether the speech recognition text is consistent with any candidate search text in the candidate search text library;
s602, if so, determining that the speech recognition text is matched with the candidate search text;
s603, for any candidate search text library, counting the number of speech recognition texts matched with the candidate search texts in the library;
s604, taking the counted number as the matching degree between the processed speech recognition text and the candidate search texts in the candidate search text library.
Here, for each processed speech recognition text, it may first be determined whether the speech recognition text is consistent with any candidate search text in the candidate search text library; if so, the speech recognition text matches that candidate search text. Then, for any candidate search text library, the number of speech recognition texts matching its candidate search texts is counted, and the counted number is taken as the matching degree: the larger the count, the higher the matching degree.
The above-described process of determining the matching degree can be described with a specific example. Suppose that, during speech recognition, a three-character query is mistranscribed so that the recognized text differs from the intended reference text by one character. For the three recognized Chinese characters, the single characters and the sequential character groups can each be used as a processed speech recognition text, while the single characters and character groups of the reference text serve as the candidate search texts in its candidate search text library. If four of the processed speech recognition texts are identical to candidate search texts in the library and two are different, a matching degree between the candidate search texts in the library and the target speech can still be determined, and the search result corresponding to the target speech can then be determined from the candidate search text libraries based on the matching degree. That is, even if noise interference exists at the speech recognition stage, the voice search method provided by the embodiment of the application can still perform an accurate voice search based on the matching degree.
In addition, different types of speech recognition texts influence the matching degree differently; for example, a pinyin-type speech recognition text such as 'yixun' often influences the matching degree less than the corresponding Chinese-character text. Therefore, in the embodiment of the present application, each processed speech recognition text may be given corresponding importance information, and the corresponding matching degree may be determined based on the importance information and the speech recognition text.
Here, the importance information may be preset, and may also be adaptively adjusted to meet the requirements of different application scenarios.
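An importance-weighted variant of the matching degree can be sketched as follows; the function name, the pair representation and the concrete weights are assumptions, a minimal sketch of the idea rather than the claimed implementation:

```python
def weighted_matching_degree(weighted_texts, candidate_library):
    """weighted_texts: (processed_text, importance) pairs. Importance can be
    preset higher for Chinese-character texts than for pinyin texts, and may
    be adjusted per application scenario as the description notes."""
    return sum(weight for text, weight in weighted_texts if text in candidate_library)
```

With uniform weights of 1 this reduces to the plain count of steps S601 to S604.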
EXAMPLE six
In order to facilitate better processing of the reference text, the embodiment of the present application may perform preprocessing on the reference text. The preprocessing may include a text conversion process, a text extension process, a text filtering process, and the like.
For text extension processing, when it is determined that an extended reference text exists for a reference text, the reference text is processed based on its corresponding language type, and the extended reference text is processed based on the language type corresponding to the extended reference text.
In the embodiment of the application, a singer's alias can be added as a synonym when the database is established; for example, an alias of the singer Zhou Jieren can be used as an extended reference text, so that the correct singer is matched when the user searches for the alias by voice. Likewise, a Chinese name can be added for an English singer name as an extended reference text to support Chinese search; for example, the Chinese name Jiastin for the English name Justin is used as an extended reference text, so that the correct English-named singer is matched when the user voice-searches the Chinese name Jiastin.
For text conversion processing, considering the variety of ways numbers appear in speech recognition, the embodiment of the application can convert Arabic numerals in the speech recognition text into Chinese; for example, the song name 'love 36' is converted into 'love thirty-six'. This unifies the current voice search environment with the database and further improves the search effect.
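A minimal sketch of such digit conversion is shown below; it only covers 0 to 99 (enough for the '36' example, rendered in Chinese as 三十六) and the function names are assumptions:

```python
import re

CHINESE_DIGITS = "零一二三四五六七八九"

def number_to_chinese(n):
    """Convert an integer in 0-99 to Chinese numerals, e.g. 36 -> 三十六."""
    if n < 10:
        return CHINESE_DIGITS[n]
    tens, ones = divmod(n, 10)
    result = (CHINESE_DIGITS[tens] if tens > 1 else "") + "十"
    return result + (CHINESE_DIGITS[ones] if ones else "")

def convert_digits(text):
    """Replace every digit run in the text with its Chinese rendering."""
    return re.sub(r"\d+", lambda m: number_to_chinese(int(m.group())), text)
```

A production converter would also need larger magnitudes (hundreds, thousands) and context-dependent readings, which this sketch deliberately omits.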
Based on the above embodiments, the present application further provides a voice search apparatus. The implementation of the following apparatuses may refer to the implementation of the above method, and repeated details are omitted.
EXAMPLE seven
As shown in fig. 7, a speech search apparatus provided in the seventh embodiment of the present application includes:
an obtaining module 701, configured to obtain a reference text set;
a first processing module 702, configured to, for each reference text in the reference text set, process the reference text based on a language type corresponding to the reference text, so as to obtain a candidate search text library; the candidate search text library comprises a plurality of candidate search texts;
the searching module 703 is configured to, after determining the speech recognition text corresponding to the target speech, search a candidate search text library matching the language type corresponding to the speech recognition text from all candidate search text libraries, and determine the reference text corresponding to the target speech according to the searched candidate search text library.
In some embodiments, the reference text comprises at least one chinese character, the language category to which the at least one chinese character corresponds being a chinese category; the first processing module 702 is specifically configured to:
extracting each Chinese character from the reference text in sequence;
sequentially combining the extracted multiple Chinese characters to obtain each Chinese character group;
and taking each Chinese character and each Chinese character group as a candidate search text after processing.
In some embodiments, the reference text includes at least one pinyin, and the language category corresponding to the at least one pinyin is a chinese category; the first processing module 702 is specifically configured to:
sequentially extracting all pinyin from the reference text, and determining an initial part and a final part corresponding to each pinyin;
sequentially combining any multiple extracted pinyin to obtain each pinyin group;
for any two pinyins, determining an initial group formed by combining an initial part of one pinyin and an initial part of the other pinyin and determining a final group formed by combining a final part of one pinyin and a final part of the other pinyin;
and taking each pinyin, the pinyin group, the initial consonant group and the final group as a candidate search text after processing.
In some embodiments, the reference text includes at least one letter, the language category to which the at least one letter corresponds is an english category; the first processing module 702 is specifically configured to:
extracting each letter from the reference text in sequence;
sequentially combining any plurality of extracted letters to obtain each letter group;
and taking each letter group as a candidate search text after processing.
In one embodiment, the apparatus further comprises:
a second processing module 704, configured to process the speech recognition text based on a language type corresponding to the speech recognition text, so as to obtain a processed speech recognition text;
the search module 703 is specifically configured to search a candidate search text library matched with the processed speech recognition text from all candidate search text libraries.
In some embodiments, the search module 703 is specifically configured to:
determining, for each candidate search text library, the matching degree between the processed speech recognition text and the candidate search texts in the library;
ranking all candidate search text libraries in order of matching degree from high to low;
and taking the candidate search text libraries that meet the preset ranking as the candidate search text libraries matched with the processed speech recognition text.
In some embodiments, the search module 703 is specifically configured to:
for each processed speech recognition text, determining whether the speech recognition text is consistent with any candidate search text in the candidate search text library;
if so, determining that the speech recognition text is matched with the candidate search text;
counting, for any candidate search text library, the number of speech recognition texts matched with the candidate search texts in the library;
and taking the counted number as the matching degree between the processed speech recognition text and the candidate search texts in the candidate search text library.
In another embodiment, the apparatus further comprises:
a giving module 705, configured to give corresponding importance information to each processed speech recognition text;
the search module 703 is specifically configured to:
and determining the matching degree between the voice recognition text endowed with the importance degree information and the candidate search text in the candidate search text library.
In yet another embodiment, the apparatus further comprises:
a judging module 706, configured to judge whether the reference text has an extended reference text;
the first processing module 702 is specifically configured to:
and when the extended reference text exists in the reference text, processing the reference text based on the language type corresponding to the reference text, and processing the extended reference text based on the language type corresponding to the extended reference text.
In yet another embodiment, the apparatus further comprises:
a conversion module 707, configured to perform text conversion on each reference text in the reference text set to obtain a converted reference text.
Example eight
As shown in fig. 8, a schematic structural diagram of an electronic device according to the eighth embodiment of the present application includes: a processor 801, a storage medium 802 and a bus 803, wherein the storage medium 802 stores machine-readable instructions executable by the processor 801, the processor 801 and the storage medium 802 communicate with each other through the bus 803 when the electronic device runs, and the machine-readable instructions are executed by the processor 801 to execute the voice search method provided by any one of the above embodiments.
Example nine
The ninth embodiment of the present application further provides a computer-readable storage medium 802, where a computer program is stored on the computer-readable storage medium 802, and when the computer program is executed by the processor 801, the steps of the voice search method corresponding to the foregoing embodiment are executed.
Specifically, the storage medium can be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the voice search method can be executed, which solves the current problem of low voice search accuracy, improves the accuracy of voice search, and improves the user experience.
Based on the same technical concept, embodiments of the present application further provide a computer program product, which includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the steps of the voice search method, and specific implementation may refer to the above method embodiments, and will not be described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (22)

1. A method for voice searching, the method comprising:
acquiring a reference text set;
processing each reference text in the reference text set based on the language type corresponding to the reference text to obtain a candidate search text library;
after the voice recognition text corresponding to the target voice is determined, searching a candidate search text library matched with the language type corresponding to the voice recognition text from all candidate search text libraries, and determining a reference text corresponding to the target voice according to the searched candidate search text library.
2. The method of claim 1, wherein the reference text comprises at least one chinese character, and the language category corresponding to the at least one chinese character is a chinese category; processing the reference text based on the language type corresponding to the reference text, including:
extracting each Chinese character from the reference text in sequence;
sequentially combining the extracted multiple Chinese characters to obtain each Chinese character group;
and taking each Chinese character and each Chinese character group as a candidate search text after processing.
3. The method of claim 1, wherein the reference text comprises at least one pinyin, and the language category corresponding to the at least one pinyin is a chinese language category; processing the reference text based on the language type corresponding to the reference text, including:
sequentially extracting all pinyin from the reference text, and determining an initial part and a final part corresponding to each pinyin;
sequentially combining any multiple extracted pinyin to obtain each pinyin group;
for any two pinyins, determining an initial group formed by combining an initial part of one pinyin and an initial part of the other pinyin and determining a final group formed by combining a final part of one pinyin and a final part of the other pinyin;
and taking each pinyin, the pinyin group, the initial consonant group and the final group as a candidate search text after processing.
4. The method of claim 1, wherein the reference text comprises at least one letter, and wherein the language category to which the at least one letter corresponds is an english category; processing the reference text based on the language type corresponding to the reference text, including:
extracting each letter from the reference text in sequence;
sequentially combining any plurality of extracted letters to obtain each letter group;
and taking each letter group as a candidate search text after processing.
5. The method according to claim 1, further comprising, before searching all the candidate search text libraries for a candidate search text library matching the language type corresponding to the speech recognition text:
processing the voice recognition text based on the language type corresponding to the voice recognition text to obtain a processed voice recognition text;
searching a candidate search text library matched with the language type corresponding to the voice recognition text from all the candidate search text libraries, wherein the candidate search text library comprises:
and searching a candidate search text library matched with the processed voice recognition text from all candidate search text libraries.
6. The method of claim 5, wherein searching all the candidate search text libraries for a candidate search text library matching the language type corresponding to the speech recognition text comprises:
for each candidate search text library, determining the matching degree between the processed voice recognition text and the candidate search text in the candidate search text library;
ranking all candidate search text libraries according to the sequence of high matching degree to low matching degree;
and taking the candidate search text library that meets the preset ranking as a candidate search text library matched with the processed speech recognition text.
7. The method of claim 6, wherein the candidate search text library comprises a plurality of candidate search texts; the determining the matching degree between the processed speech recognition text and the candidate search text in the candidate search text library includes:
for each processed speech recognition text, determining whether the speech recognition text is consistent with any candidate search text in the candidate search text library;
if yes, determining that the voice recognition text is matched with the candidate search text;
counting the number of voice recognition texts matched with the candidate search texts in any candidate search text library;
and taking the counted number as the matching degree between the processed speech recognition text and the candidate search text in the candidate search text library.
8. The method of claim 6, further comprising, prior to determining a degree of match between the processed speech recognized text and candidate search text in the library of candidate search text:
aiming at each processed voice recognition text, corresponding importance information is given to the voice recognition text;
the determining the matching degree between the processed speech recognition text and the candidate search text in the candidate search text library includes:
and determining the matching degree between the voice recognition text endowed with the importance degree information and the candidate search text in the candidate search text library.
9. The method of claim 1, further comprising, before processing the reference text based on the language type corresponding to the reference text:
judging whether the reference text has an extended reference text;
processing the reference text based on the language type corresponding to the reference text, including:
and when the extended reference text exists in the reference text, processing the reference text based on the language type corresponding to the reference text, and processing the extended reference text based on the language type corresponding to the extended reference text.
10. The method of claim 1, after obtaining the reference text set, further comprising:
and performing text conversion on each reference text in the reference text set to obtain a converted reference text.
11. A speech searching apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a reference text set;
the first processing module is used for processing each reference text in the reference text set based on the language type corresponding to the reference text to obtain a candidate search text library;
and the searching module is used for searching a candidate searching text library matched with the language type corresponding to the voice recognition text from all candidate searching text libraries after the voice recognition text corresponding to the target voice is determined, and determining a reference text corresponding to the target voice according to the searched candidate searching text library.
12. The apparatus of claim 11, wherein the reference text comprises at least one chinese character, and the language category corresponding to the at least one chinese character is a chinese category; the first processing module is specifically configured to:
extracting each Chinese character from the reference text in sequence;
sequentially combining the extracted multiple Chinese characters to obtain each Chinese character group;
and taking each Chinese character and each Chinese character group as a candidate search text after processing.
13. The apparatus of claim 11, wherein the reference text comprises at least one pinyin, and the language category corresponding to the at least one pinyin is a chinese language category; the first processing module is specifically configured to:
sequentially extracting all pinyin from the reference text, and determining an initial part and a final part corresponding to each pinyin;
sequentially combining any multiple extracted pinyin to obtain each pinyin group;
for any two pinyins, determining an initial group formed by combining an initial part of one pinyin and an initial part of the other pinyin and determining a final group formed by combining a final part of one pinyin and a final part of the other pinyin;
and taking each pinyin, the pinyin group, the initial consonant group and the final group as a candidate search text after processing.
14. The apparatus of claim 11, wherein the reference text comprises at least one letter, and wherein the language category corresponding to the at least one letter is an english category; the first processing module is specifically configured to:
extracting each letter from the reference text in sequence;
sequentially combining any plurality of extracted letters to obtain each letter group;
and taking each letter group as a candidate search text after processing.
15. The apparatus of claim 11, further comprising:
the second processing module is used for processing the voice recognition text based on the language type corresponding to the voice recognition text to obtain a processed voice recognition text;
the search module is specifically configured to search a candidate search text library matched with the processed speech recognition text from all candidate search text libraries.
16. The apparatus of claim 15, wherein the search module is specifically configured to:
for each candidate search text library, determining the matching degree between the processed voice recognition text and the candidate search text in the candidate search text library;
ranking all candidate search text libraries according to the sequence of high matching degree to low matching degree;
and taking the candidate search text library that meets the preset ranking as a candidate search text library matched with the processed speech recognition text.
17. The apparatus of claim 16, wherein the candidate search text library comprises a plurality of candidate search texts; the search module is specifically configured to:
for each processed speech recognition text, determining whether the speech recognition text is consistent with any candidate search text in the candidate search text library;
if yes, determining that the voice recognition text is matched with the candidate search text;
counting the number of voice recognition texts matched with the candidate search texts in any candidate search text library;
and taking the counted number as the matching degree between the processed speech recognition text and the candidate search text in the candidate search text library.
18. The apparatus of claim 16, further comprising:
the assigning module is configured to assign corresponding importance information to each processed speech recognition text;
the search module is specifically configured to:
determine the matching degree between the speech recognition texts carrying the importance information and the candidate search texts in the candidate search text library.
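The importance-weighted variant of claim 18 can be sketched as a small extension of the counting scheme. The weighting rule here is an assumption: the claim only says importance information is assigned, so summing the weights of matched texts (rather than counting them) is one plausible reading, and the function name is hypothetical.

```python
def weighted_matching_degree(texts_with_weights, library):
    """Claim 18 sketch: each processed speech recognition text carries
    an importance weight; the matching degree is the sum of the weights
    of the texts that match a candidate search text in the library.
    (The summing rule is an assumption, not stated in the claim.)"""
    lib = set(library)
    return sum(weight for text, weight in texts_with_weights if text in lib)

# A full-pinyin match might be weighted higher than an initials-only match.
pairs = [("beijing", 2.0), ("bj", 1.0)]
print(weighted_matching_degree(pairs, ["beijing", "tianjin"]))
```

Weighting lets stronger evidence (e.g. a complete pinyin transcription) dominate weaker evidence (e.g. an abbreviated letter group) when ranking libraries.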
19. The apparatus of claim 11, further comprising:
the judging module is configured to judge whether an extended reference text exists for the reference text;
the first processing module is specifically configured to:
when an extended reference text exists for the reference text, process the reference text based on the language type corresponding to the reference text, and process the extended reference text based on the language type corresponding to the extended reference text.
20. The apparatus of claim 11, further comprising:
and the conversion module is used for performing text conversion on each reference text in the reference text set to obtain a converted reference text.
21. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the voice search method according to any one of claims 1 to 10.
22. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the voice search method according to any one of claims 1 to 10.
CN201811458192.5A 2018-11-30 2018-11-30 Voice search method and device, electronic equipment and storage medium Pending CN111259170A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811458192.5A CN111259170A (en) 2018-11-30 2018-11-30 Voice search method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111259170A true CN111259170A (en) 2020-06-09

Family

ID=70946670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811458192.5A Pending CN111259170A (en) 2018-11-30 2018-11-30 Voice search method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111259170A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100076763A1 (en) * 2008-09-22 2010-03-25 Kabushiki Kaisha Toshiba Voice recognition search apparatus and voice recognition search method
CN103699530A (en) * 2012-09-27 2014-04-02 百度在线网络技术(北京)有限公司 Method and equipment for inputting texts in target application according to voice input information
CN104216906A (en) * 2013-05-31 2014-12-17 大陆汽车投资(上海)有限公司 Voice searching method and device
CN106601256A (en) * 2016-12-29 2017-04-26 广东欧珀移动通信有限公司 Voice recognition method and mobile terminal
CN107526826A (en) * 2017-08-31 2017-12-29 百度在线网络技术(北京)有限公司 Phonetic search processing method, device and server


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LOU Yan, YANG Weihua, XU Dongyu: "Introduction to Intelligent Medicine" (《智能医学概论》), China Railway Publishing House, 30 November 2018, pages 178-179 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112071304A (en) * 2020-09-08 2020-12-11 深圳市天维大数据技术有限公司 Semantic analysis method and device
CN112071304B (en) * 2020-09-08 2024-03-15 深圳市天维大数据技术有限公司 Semantic analysis method and device
CN112153206A (en) * 2020-09-23 2020-12-29 北京百度网讯科技有限公司 Contact person matching method and device, electronic equipment and storage medium
CN112153206B (en) * 2020-09-23 2022-08-09 阿波罗智联(北京)科技有限公司 Contact person matching method and device, electronic equipment and storage medium
CN112651854A (en) * 2020-12-23 2021-04-13 讯飞智元信息科技有限公司 Voice scheduling method and device, electronic equipment and storage medium
CN113204669A (en) * 2021-06-08 2021-08-03 武汉亿融信科科技有限公司 Short video search recommendation method and system based on voice recognition and computer storage medium
CN113722447A (en) * 2021-11-03 2021-11-30 南京云问网络技术有限公司 Voice search method based on multi-strategy matching
CN113722447B (en) * 2021-11-03 2022-02-08 南京云问网络技术有限公司 Voice search method based on multi-strategy matching

Similar Documents

Publication Publication Date Title
CN111259170A (en) Voice search method and device, electronic equipment and storage medium
US10210154B2 (en) Input method editor having a secondary language mode
US8543375B2 (en) Multi-mode input method editor
CN111198936B (en) Voice search method and device, electronic equipment and storage medium
CN111292752B (en) User intention recognition method and device, electronic equipment and storage medium
US20080294982A1 (en) Providing relevant text auto-completions
CN106776763B (en) Destination searching method and device
CN111292751B (en) Semantic analysis method and device, voice interaction method and device, and electronic equipment
US11630825B2 (en) Method and system for enhanced search term suggestion
CN108351876A (en) System and method for point of interest identification
CN111611372A (en) Search result sorting method and device and music searching method and device
CN104281275B (en) The input method of a kind of English and device
CN101405693A (en) Personal synergic filtering of multimodal inputs
CN109635125B (en) Vocabulary atlas building method and electronic equipment
CN107679122B (en) Fuzzy search method and terminal
US20140022180A1 (en) Method for Inputting and Searching Chinese Characters with Easy-Strokes
CN109545223B (en) Voice recognition method applied to user terminal and terminal equipment
CN108286985A (en) Device and method for the searching interest point in navigation equipment
CN109508390B (en) Input prediction method and device based on knowledge graph and electronic equipment
CN111611793A (en) Data processing method, device, equipment and storage medium
CN104731918A (en) Voice search method and device
CN109727591B (en) Voice search method and device
CN111160044A (en) Text-to-speech conversion method and device, terminal and computer readable storage medium
CN112749258A (en) Data searching method and device, electronic equipment and storage medium
CN110647537A (en) Data searching method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination