CN110956958A - Searching method, searching device, terminal equipment and storage medium - Google Patents


Info

Publication number: CN110956958A
Application number: CN201911228496.7A
Authority: CN (China)
Prior art keywords: voice data, user, search, voice
Inventor: 卢甜恬
Applicant and current assignee: Shenzhen Zhuiyi Technology Co Ltd
Other languages: Chinese (zh)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)

Classifications

    • G10L 15/08: Speech recognition; speech classification or search
    • G10L 15/063: Speech recognition; creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/1822: Speech classification or search using natural language modelling; parsing for meaning understanding
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech to text systems
    • G10L 17/02: Speaker identification or verification; preprocessing operations, e.g. segment selection; pattern representation or modelling; feature selection or extraction
    • G10L 25/54: Speech or voice analysis techniques specially adapted for comparison or discrimination, for retrieval

Abstract

An embodiment of the present application provides a searching method, a searching apparatus, a terminal device and a storage medium. The method comprises: acquiring voice data of a user during an interaction; if the voice data is used for an information search, judging through a preset algorithm model whether the voice data satisfies a target condition; if it does not, performing correction processing on the voice data once the network state is detected to satisfy a preset state, to obtain corrected voice data; and searching for a search result matching the corrected voice data and outputting the search result. In this way, when the user's voice data does not satisfy the target condition, the voice data is corrected before the search, so the search result matches the corrected voice data, the search is more accurate, and the user experience is improved.

Description

Searching method, searching device, terminal equipment and storage medium
Technical Field
The present application relates to the field of search technologies, and in particular to a searching method, a searching apparatus, a terminal device and a storage medium.
Background
With the continuous development of search engine technology, voice search has gradually been applied to various terminal devices. In one approach, the search speech input by the user undergoes speech recognition and is converted into text; keywords are extracted from the text; a matching search result is retrieved according to the keywords, or a corresponding question-answer result is queried in the database of a question-answer system; and the result is presented to the user as speech, a web page, text and so on. However, when searching by voice, non-standard speech content often produces erroneous search results, making accurate search difficult.
Disclosure of Invention
In view of the above problems, the present application provides a searching method, apparatus, terminal device and storage medium to address them.
In a first aspect, an embodiment of the present application provides a search method, the method comprising: acquiring voice data of a user during an interaction; if the voice data is used for an information search, judging through a preset algorithm model whether the voice data satisfies a target condition; if the voice data does not satisfy the target condition, performing correction processing on the voice data when the network state is detected to satisfy a preset state, to obtain corrected voice data; and searching for a search result matching the corrected voice data and outputting the search result.
Further, performing the correction processing on the voice data includes: acquiring a preset target voice model, where the target voice model is obtained by training on voice feature data of the user or historical voice feature data of the user; and performing the correction processing on the voice data based on the target voice model.
Further, searching for a search result matching the corrected voice data and outputting the search result includes: converting the corrected voice data into text data; and searching for a search result matching the text data and outputting the search result.
Further, performing the correction processing on the voice data further includes: converting the corrected voice data into text data; and correcting the text data by using a target text model to obtain text data with complete semantics.
Further, searching for a search result matching the corrected voice data and outputting the search result includes: searching for a search result matching the corrected text data and outputting the search result.
Further, before performing the correction processing on the voice data, the method includes: acquiring a voiceprint feature of the voice data; judging whether the voiceprint feature is a preset voiceprint feature; and if so, converting the voice data into target voice data.
Further, performing the correction processing on the voice data includes: performing the correction processing on the target voice data.
Further, the target condition is used to represent that the user's search intention can be completely recognized from the voice data.
In a second aspect, an embodiment of the present application provides a search apparatus, including: an acquisition module, configured to acquire voice data of a user during an interaction; a judging module, configured to judge, through a preset algorithm model, whether the voice data satisfies a target condition if the voice data is used for an information search; a processing module, configured to, if the voice data does not satisfy the target condition, perform correction processing on the voice data when detecting that the network state satisfies a preset state, to obtain corrected voice data; and a searching module, configured to search for a search result matching the corrected voice data and output the search result.
Further, the processing module may be specifically configured to acquire a preset target voice model, where the target voice model is obtained by training on voice feature data of the user or historical voice feature data of the user, and to perform the correction processing on the voice data based on the target voice model.
Further, the searching module may be specifically configured to convert the corrected voice data into text data, and to search for a search result matching the text data and output the search result.
Further, the apparatus further includes: a conversion module, configured to convert the corrected voice data into text data; and a correction processing module, configured to correct the text data by using a target text model to obtain text data with complete semantics.
Further, the searching module may be specifically configured to search for a search result matching the corrected text data and output the search result.
Further, the apparatus further includes: a voiceprint feature acquisition module, configured to acquire a voiceprint feature of the voice data; a judging unit, configured to judge whether the voiceprint feature is a preset voiceprint feature; and a processing unit, configured to convert the voice data into target voice data if the voiceprint feature is the preset voiceprint feature.
Further, the processing module may be configured to perform the correction processing on the target voice data.
Further, the target condition is used to represent that the user's search intention can be completely recognized from the voice data.
In a third aspect, an embodiment of the present application provides a terminal device, including: a memory; one or more processors coupled with the memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which program code is stored, and the program code can be called by a processor to execute the method according to the first aspect.
The embodiments of the present application thus provide a searching method, apparatus, terminal device and storage medium in which voice data of a user during an interaction is acquired; if the voice data is used for an information search, whether it satisfies a target condition is judged through a preset algorithm model; if it does not, the voice data is corrected once the network state is detected to satisfy a preset state, yielding corrected voice data; and a search result matching the corrected voice data is searched for and output. In this way, when the user's voice data does not satisfy the target condition, it is corrected before the search, so the search is more accurate and the user experience is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described here show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 shows a schematic diagram of an application environment suitable for the embodiment of the present application.
Fig. 2 shows a flowchart of a search method according to an embodiment of the present application.
Fig. 3 shows a flowchart of a search method according to another embodiment of the present application.
Fig. 4 shows a flowchart of a search method according to yet another embodiment of the present application.
Fig. 5 shows a flowchart of a search method according to still another embodiment of the present application.
Fig. 6 shows a block diagram of a search apparatus according to an embodiment of the present application.
Fig. 7 shows a block diagram of a terminal device for executing a search method according to an embodiment of the present application.
Fig. 8 illustrates a storage unit for storing or carrying program codes for implementing a search method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. The described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments derived by a person of ordinary skill in the art from the embodiments given here without creative effort shall fall within the protection scope of the present application.
In recent years, with accelerated breakthroughs in and the wide application of technologies such as the mobile internet, big data, cloud computing and sensors, the development of artificial intelligence has entered a brand-new stage. Intelligent voice search, a key link in the artificial intelligence (AI) industry chain, is among the most mature AI technologies and is developing rapidly in fields such as marketing customer service, smart home, smart in-vehicle systems, smart wearables and smart search, the smartphone intelligent assistant being one example.
As one mode, the smartphone intelligent assistant can recognize the voice input by the user, search for content matching the recognized voice data, and display that content to the user through the phone interface. However, if the user speaks too fast or pronounces unclearly, the phone may be unable to accurately identify the user's search intention, which degrades the user experience.
The inventors found in research that, by combining the user's speaking habits, a customized voice correction strategy can be provided for the user from the user's historical voice data, so that complete voice data is obtained and the search is performed on that basis, improving the accuracy of voice search and the user experience. The searching method, searching apparatus, terminal device and storage medium of the embodiments of the present application are proposed accordingly.
In order to better understand the searching method, the searching apparatus, the terminal device, and the storage medium provided in the embodiments of the present application, an application environment suitable for the embodiments of the present application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment suitable for the embodiment of the present application. The search method provided by the embodiment of the present application can be applied to the polymorphic interaction system 100 shown in fig. 1. The polymorphic interaction system 100 includes a terminal device 101 and a server 102, the server 102 being communicatively coupled to the terminal device 101. The server 102 may be a conventional server or a cloud server, and is not limited herein.
The terminal device 101 may be any of various electronic devices that have a display screen and support data input, including but not limited to a smartphone, tablet computer, laptop, desktop computer, wearable electronic device and the like. Specifically, the data input may be voice input based on a voice module provided on the terminal device 101, character input based on a character input module, and so on. The terminal device 101 is provided with a camera, which may be arranged on the side of the terminal device 101 bearing the display screen or, optionally, on the side facing away from it. It should be noted that image data of the user, including posture information, can be collected through the camera to assist in accurately identifying the user's search intention.
A client application may be installed on the terminal device 101, and the user can communicate with the server 102 through it (e.g., an APP or WeChat applet). Specifically, the server 102 runs a corresponding server-side application; the user may register a user account with the server 102 through the client application and communicate with the server 102 on the basis of that account, for example by logging into the account in the client application and inputting text information, voice data, image data and the like. After receiving the information input by the user, the client application sends it to the server 102, so that the server 102 can receive, process and store it; the server 102 may also return corresponding output information to the terminal device 101.
In some embodiments, the means for processing the information input by the user may also be disposed on the terminal device 101, so that the terminal device 101 can interact with the user without relying on establishing communication with the server 102, and in this case, the polymorphic interaction system 100 may only include the terminal device 101.
The above application environments are only examples for facilitating understanding, and it is to be understood that the embodiments of the present application are not limited to the above application environments.
The search method, apparatus, terminal device and storage medium provided by the embodiments of the present application will be described in detail below with specific embodiments.
As shown in fig. 2, a flowchart of a search method provided in an embodiment of the present application is shown. The searching method provided by this embodiment can be applied to a terminal device having a display screen or another image output device; the terminal device may be an electronic device such as a smartphone, tablet computer or wearable smart terminal.
In a specific embodiment, the search method can be applied to the search apparatus 500 shown in fig. 6 and the terminal device 101 shown in fig. 7. The flow shown in fig. 2 will be described in detail below. The above search method may specifically include the steps of:
step S110: and acquiring voice data of the user in the interactive process.
It should be noted that, in the embodiments of the present application, the voice data of the user carries the user's voice features. These may include, for example, the timbre of the user's voice (male and female timbres differ; optionally, the user's gender can be inferred from the timbre), the volume, the pitch, the fundamental frequency, the dialect to which the speech belongs (for example, Mandarin, Sichuan dialect, Henan dialect, Shandong dialect, Shanghai dialect, Cantonese, etc.) and the language (for example, English, German, French, Russian, Korean, Japanese, etc.). The voice data differs between users.
As one way, the voice data in the embodiment of the present application may be voice data input by the user through the voice input function of the terminal device on the human-computer interaction interface; for example, it may be collected through a voice assistant, a voice SDK (Software Development Kit) or a speech recognition engine application installed on the terminal device. The voice data of the user during the interaction may, for example, be that of a user currently interacting with the terminal device through its human-computer interaction interface. Optionally, the voice data may be acquired while the user is making a call through the terminal device.
Alternatively, the voice data may be voice recording information of the user stored in advance. Optionally, the voice data of the same user may include voice data of the user at the same time or different times, or may be voice data of different users, and the like, which is not limited herein.
As one mode, the voice data of the user in the interactive process can be obtained by extracting the features of the voice data and then decoding the extracted voice features by using the acoustic model and the language model obtained by pre-training.
Optionally, the voice data acquired by the terminal device may be stored locally, or sent to the server for storage. Storing the data on the server avoids slowing the terminal device down with redundant stored data.
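By way of illustration only, a minimal sketch of acquiring such voice data on the terminal side, assuming the third-party `speech_recognition` package; the function name `capture_voice_data` and the timeout value are illustrative assumptions, not part of the disclosure.

```python
import speech_recognition as sr

def capture_voice_data(timeout_s: float = 5.0) -> sr.AudioData:
    """Record one utterance from the default microphone."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate against ambient noise so quiet speech is not lost.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        # Blocks until speech is detected, then returns the raw audio.
        return recognizer.listen(source, timeout=timeout_s)
```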
Step S120: and judging whether the voice data is used for information search.
It can be understood that not all acquired voice data is used for searching, for example voice data collected while the user is on a call through the terminal device. If the terminal device searched in such a case, resources would be consumed, such as power and operating memory. To avoid this, the acquired voice data can be judged first. As one way, it is judged whether the voice data is used for an information search; if so, subsequent processing is performed; if not, the voice data is discarded directly, or is not added to the list to be searched.
As one approach, it may be detected whether an intelligent-search application or window is open while the voice data is being acquired. Optionally, if one is open, the acquired voice data may be used as voice data for searching; if not, the acquired voice data may be ignored (i.e., discarded).
For example, in a specific application scenario, when a user picks up the phone and speaks, as one implementation the terminal device may read the listening events of applications and detect whether a search-class application is open. If so, it determines that the device is in the search state and recognizes the user's voice to carry out the voice search. If no listening event of a search-class application is captured, it may determine that the device is not in the search state, and the speech is discarded.
As another embodiment, for some search-class applications, when a user speaks in order to search, a reminder or instruction from the application is usually received. Optionally, the terminal device may judge whether a prompt instruction exists within a period around the utterance (before speaking, while speaking, and within a short time after speaking, for example 5, 10 or 20 seconds). If so, it may determine that the device is in the search state; for example, when the user searches a map of a place by voice, a voice prompt may pop up asking the user to input the name of the place (the destination) to be searched. If not, the terminal device may determine that it is not in the search state; for example, during a call the user would not receive such a prompt instruction.
Judging whether the acquired voice data is intended for searching prevents searches on voice data that is not, reducing power consumption and prolonging the standby time of the terminal device.
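A minimal sketch of this search-state check, assuming the terminal exposes the foreground application and the timestamp of the last search prompt; the application names and the 10-second window are illustrative assumptions rather than an API defined by the patent.

```python
from typing import Optional

SEARCH_APPS = {"voice.assistant", "map.search", "web.browser"}  # illustrative
PROMPT_WINDOW_S = 10.0  # e.g. 5-20 s around the utterance, per the text above

def is_for_information_search(foreground_app: str,
                              last_prompt_ts: Optional[float],
                              utterance_ts: float) -> bool:
    """Decide whether an utterance should enter the search pipeline."""
    # A search-class application being open marks the search state.
    if foreground_app in SEARCH_APPS:
        return True
    # A search prompt shortly before/after speaking also marks it.
    return (last_prompt_ts is not None
            and abs(utterance_ts - last_prompt_ts) <= PROMPT_WINDOW_S)
```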
Step S130: and if so, judging whether the voice data meets the target condition.
As one mode, if it is determined that the voice data is for voice search, it may be further determined whether the voice data satisfies the target condition. The target condition can be used for representing the search intention of the user which can be completely recognized according to the voice data.
It can be understood that, when users search through voice data, individual differences mean that some users speak fast or pronounce unclearly, so the spoken voice data is unclear; that is, the user's search intention cannot be completely recognized from the voice data. In this case, if the search is still performed on the voice data, the search result may not match the user's expectation, and power consumption may increase to some extent. As a way of improving this, to increase the reliability of the search result and reduce power consumption, it can be judged whether the voice data used for the search satisfies the target condition. Optionally, if the target condition is satisfied, the voice data may be used in the subsequent voice search; if not, it may not be.
As one way, whether the voice data satisfies the target condition may be judged through a preset algorithm model. The preset algorithm model may be a neural network model trained on sample voice data from a large number of users, for example an RNN (Recurrent Neural Network) or LSTM (Long Short-Term Memory) network; the examples are not exhaustive here. Optionally, if the voice data satisfies the target condition, the user's search intention can be completely recognized from it; if it does not, the search intention cannot be completely recognized.
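For illustration, such a preset algorithm model might be sketched as an LSTM binary classifier in PyTorch; the feature dimensions, the 0.5 threshold and the training procedure are assumptions, since the patent does not fix them.

```python
import torch
import torch.nn as nn

class TargetConditionModel(nn.Module):
    """Scores whether the search intention is fully recognizable."""

    def __init__(self, n_features: int = 40, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, n_features) acoustic feature sequence
        _, (h_n, _) = self.lstm(x)
        return torch.sigmoid(self.head(h_n[-1]))  # probability in (0, 1)

def satisfies_target_condition(model: TargetConditionModel,
                               features: torch.Tensor,
                               threshold: float = 0.5) -> bool:
    """features: a single utterance shaped (1, frames, n_features)."""
    with torch.no_grad():
        return model(features).item() >= threshold
```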
Alternatively, if it is determined that the speech data is not for a voice search, the search process ends. This avoids the power consumption of redundant searches.
Step S140: and if the target condition is met, searching a search result matched with the voice data and outputting the search result.
It is understood that if the voice data satisfies the target condition, a search result matching the voice data may be directly searched and output.
In the embodiments of the present application, the search result matching the voice data includes, but is not limited to, pictures, text, video, audio, animation and any combination of these. Optionally, the search result may be output as a picture, text, voice, ring tone, animation pop-up or other multimedia presentation, or as a combination of different output modes, for example picture plus text, or animation plus ring tone; the combinations are not exhaustively illustrated or limited here.
Optionally, the search result may be displayed and output by a terminal device currently used for searching, or may also be displayed and output by another terminal device, for example, the search result is displayed remotely, which is not limited herein.
Step S150: and if the target condition is not met, detecting whether the network state meets a preset state or not.
It can be understood that if, while the user is searching by voice, the search is interrupted by a poor network signal, the user experience is greatly reduced and the user may have to retry the search repeatedly. To ensure the reliability of the search process and provide a user-friendly experience, when the voice data is judged not to satisfy the target condition it can further be detected whether the current network state of the terminal device satisfies a preset state; this prevents the search from being interrupted or terminated by a sudden network abnormality and saves the power of the terminal device.
The preset state may be the strength of the network signal. As one mode, a network signal threshold may be set, and whether the network state satisfies the preset state may be detected by comparing the network signal of the terminal device with the signal threshold. Optionally, when the current network signal is greater than the threshold, determining that the network state meets a preset state; and if the current signal is not larger than the threshold value, judging that the network state does not meet the preset state.
As an embodiment, on the basis that the network signal is greater than the signal threshold, it may be determined whether the power of the terminal device is sufficient, for example, whether the power reaches a set threshold, and if so, it may be determined that the network status satisfies the preset status; if not, it can be determined that the network status does not satisfy the preset status.
Alternatively, the preset state may be a variation trend of the network signal. As an implementation mode, the variation trend of the network signal intensity can be counted through the terminal equipment. If the network signal is weaker and weaker, the terminal device may be in a network disconnection state soon, and further abnormal interruption of the voice search process may be caused, and in this case, it may be determined that the network state does not satisfy the preset state. Optionally, if the network signal is stronger, it may be determined that the network state satisfies the preset state.
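A sketch combining the three checks described above (signal threshold, battery level, signal trend); the dBm and percentage thresholds are illustrative assumptions, not values fixed by the patent.

```python
def network_state_ok(signal_dbm: float,
                     recent_signals_dbm: list[float],
                     battery_pct: float,
                     signal_floor_dbm: float = -100.0,
                     battery_floor_pct: float = 10.0) -> bool:
    """Return True if a voice search is unlikely to be interrupted."""
    if signal_dbm <= signal_floor_dbm:
        return False  # signal not above the threshold
    if battery_pct < battery_floor_pct:
        return False  # insufficient power, per the embodiment above
    # A strictly weakening signal suggests imminent disconnection.
    weakening = (len(recent_signals_dbm) >= 3 and
                 all(a > b for a, b in
                     zip(recent_signals_dbm, recent_signals_dbm[1:])))
    return not weakening
```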
It should be noted that if the content the user wants to search by voice is stored locally, for example a previously downloaded song being looked up by voice, the search can be performed directly with voice data that satisfies the target condition, without detecting whether the network state satisfies the preset state.
Step S160: and if the preset state is met, correcting the voice data to obtain corrected voice data.
The correction processing refers to correcting voice data from which the user's search intention cannot be completely recognized, and includes semantic correction, semantic filling and the like, as described in the following embodiments; its output is the corrected voice data. Alternatively, the corrected voice data may be understood as voice data from which the user's search intention can be recognized more completely.
For example, in a specific application scenario, suppose a user says "how to zha a tire", intending to search for the tire-removal process. Because the utterance contains the indistinct syllable "zha", the user's search requirement cannot be accurately recognized. Through correction processing, the sentence can be corrected to "how to remove a tire", the search is performed on the corrected sentence, and the user's search intention is recognized more completely.
By performing correction processing on the voice data when the preset state is satisfied, the reliability and accuracy of the search can be increased.
Step S170: searching for a search result matching the corrected voice data and outputting the search result.
As one mode, after correcting the voice data, a search result matching the obtained corrected voice data may be directly searched and output. The form and the output form of the search result may refer to the corresponding description in the step S140, and are not described herein again.
Step S180: and if the preset state is not met, discarding the voice data.
It can be understood that if the network state does not satisfy the preset state, the acquired voice data can be discarded directly, avoiding search failures caused by network interruption and improving the user experience. Optionally, the terminal device may send a prompt asking the user to search again once the network state is normal, so as to avoid wasted power.
In the searching method provided by this embodiment, the voice data of the user during the interaction is acquired; if the voice data is used for an information search, whether it satisfies the target condition is judged through a preset algorithm model; if it does not, the voice data is corrected once the network state is detected to satisfy the preset state, yielding corrected voice data; a search result matching the corrected voice data is then searched for and output. In this way, when the user's voice data does not satisfy the target condition, it is corrected before searching, so the search is more accurate and the user experience is improved.
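Read as pseudocode, the flow of fig. 2 can be sketched as the following orchestration; the helper callables are placeholders for the steps described above, not APIs defined by the patent.

```python
from typing import Callable, Optional

def voice_search(voice_data: bytes,
                 is_search: Callable[[bytes], bool],
                 meets_target: Callable[[bytes], bool],
                 network_ok: Callable[[], bool],
                 correct: Callable[[bytes], bytes],
                 search: Callable[[bytes], str]) -> Optional[str]:
    """Orchestrate steps S120-S180 on already-acquired voice data (S110)."""
    if not is_search(voice_data):     # step S120: not a search, discard
        return None
    if meets_target(voice_data):      # step S130 -> S140: search directly
        return search(voice_data)
    if not network_ok():              # step S150 -> S180: discard
        return None
    corrected = correct(voice_data)   # step S160: correction processing
    return search(corrected)          # step S170: search corrected data
```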
As shown in fig. 3, a flowchart of a method of searching provided in another embodiment of the present application is shown, where the method includes:
step S210: and acquiring voice data of the user in the interactive process.
Step S220: and judging whether the voice data is used for information search.
Step S230: and if so, judging whether the voice data meets the target condition.
Optionally, if the voice data is not for an information search, the search process may be ended.
Step S240: and if the target condition is met, searching a search result matched with the voice data and outputting the search result.
Step S250: and if the target condition is not met, detecting whether the network state meets a preset state or not.
Step S261: and if the preset state is met, acquiring a preset target voice model.
The target voice model in this embodiment of the application is a model pre-trained on the user's voice feature data or the user's historical voice feature data. As one mode, the user's historical voice data may be acquired and input into a machine learning model to obtain the target voice model. Optionally, the machine may learn a personalized target voice model suited to the characteristics of the user's voice data, such as dialect and speaking habits. Optionally, different users correspond to different target voice models.
As one way, in the case that the network state is determined to satisfy the preset state, the preset target voice model may be acquired so that the user's voice data can be corrected on the basis of that model.
Step S262: and correcting the voice data based on the target voice model to obtain corrected voice data.
As one way, after the target voice model is obtained, the voice data may be corrected based on it to obtain corrected voice data. Optionally, the target voice model encodes a large amount of the user's voice data, i.e., voice data matching the user's speaking habits. For example, if a user speaks too fast and often says "electrogram" for "circuit diagram", the target voice model corresponding to that user can automatically correct every "electrogram" the user says into "circuit diagram". Correcting voice data through a pre-customized target voice model matched to the user's voice characteristics speeds up the search while improving its accuracy.
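The habitual-substitution behavior in the "electrogram" example can be illustrated with a toy per-user corrector; a real target voice model would be a trained network, so this lookup is only a stand-in, and all names in it are assumptions.

```python
from collections import Counter, defaultdict

class UserSpeechCorrector:
    """Toy stand-in for the per-user target voice model."""

    def __init__(self) -> None:
        self.counts: dict[str, Counter] = defaultdict(Counter)

    def observe(self, spoken: str, intended: str) -> None:
        """Record one (habitual utterance -> intended phrase) pair."""
        self.counts[spoken][intended] += 1

    def correct(self, spoken: str) -> str:
        """Replace a known habitual utterance with its usual intent."""
        seen = self.counts.get(spoken)
        return seen.most_common(1)[0][0] if seen else spoken

corrector = UserSpeechCorrector()
corrector.observe("electrogram", "circuit diagram")
corrector.observe("electrogram", "circuit diagram")
print(corrector.correct("electrogram"))  # -> circuit diagram
```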
Step S263: the corrected voice data is converted into text data.
Alternatively, the corrected speech data may be converted into text data using an existing speech recognition technique to facilitate searching based on the converted text data.
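A sketch of this conversion step, assuming the `speech_recognition` package and its built-in Google Web Speech backend; the patent leaves the recognition engine open, and the Mandarin language tag is an illustrative choice.

```python
import speech_recognition as sr

def speech_to_text(audio: sr.AudioData) -> str:
    """Convert corrected voice data into text data."""
    recognizer = sr.Recognizer()
    # Any STT engine works here; this backend needs network access.
    return recognizer.recognize_google(audio, language="zh-CN")
```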
Step S264: and searching a search result matched with the text data and outputting the search result.
Optionally, the search result matched with the text data converted from the corrected voice data is searched, so that the search requirement of the user can be met, and the search result meeting the expectation of the user is obtained. For the style of the search result and the form of outputting the search result, reference may be made to the corresponding description in step S140 in the foregoing embodiment, which is not described herein again.
Step S270: and if the preset state is not met, discarding the voice data.
In the searching method provided by this embodiment, the voice data of the user during the interaction is acquired; if the voice data is used for an information search, whether it satisfies the target condition is judged through a preset algorithm model; if it does not, a preset target voice model is acquired once the network state is detected to satisfy the preset state, and the voice data is corrected on the basis of that model to obtain corrected voice data; the corrected voice data is then converted into text data, and a search result matching the text data is searched for and output. In this way, when the user's voice data does not satisfy the target condition, it is corrected through the target voice model and converted into text data, so the search result matching that text is found and output and the search accuracy is improved.
As shown in fig. 4, a flowchart of a method of searching provided in another embodiment of the present application is shown, where the method includes:
step S310: and acquiring voice data of the user in the interactive process.
Step S320: and judging whether the voice data is used for information search.
Step S330: and if so, judging whether the voice data meets the target condition.
Optionally, if the voice data is not for an information search, the search process may be ended.
Step S340: and if the target condition is met, searching a search result matched with the voice data and outputting the search result.
Step S350: and if the target condition is not met, detecting whether the network state meets a preset state or not.
Step S361: and if the preset state is met, acquiring a preset target voice model.
Step S362: and correcting the voice data based on the target voice model to obtain corrected voice data.
Step S363: the corrected voice data is converted into text data.
Step S364: and correcting the text data by adopting a target text model to obtain text data with complete semantics.
It can be understood that correcting the user's voice data can supplement it, for example filling in speech lost because the user spoke too fast or pronounced unclearly, so that the resulting voice data is complete. However, some user voice data may not be recorded in the target voice model, for example utterances the user does not usually say; after such voice data is spoken, grammatical errors and the like may remain. To further improve the accuracy of the search result, the target text model can be used to correct the text data into text with complete semantics. For the target text model, existing speech and language processing techniques may be consulted; details are not repeated here.
For example, in a specific application scenario, suppose the user's voice data is "electrogram, good? search one". The voice correction may turn it into "circuit diagram, good? search one". It can be understood that the corrected speech may still be semantically broken, which can make the search inaccurate. The text data converted from the corrected voice data can therefore be corrected further, optionally into "please search for a circuit diagram for me", and the search performed on this semantically complete text, improving the accuracy of the search.
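A sketch of this text-correction step, assuming Hugging Face `transformers` with a public grammar-correction checkpoint; the checkpoint name and its "grammar:" prompt prefix are assumptions about that third-party model, not the patent's target text model.

```python
from transformers import pipeline

# Illustrative checkpoint; any seq2seq correction model could stand in.
corrector = pipeline("text2text-generation",
                     model="vennify/t5-base-grammar-correction")

def correct_text(text: str) -> str:
    """Rewrite recognized text into a semantically complete query."""
    out = corrector("grammar: " + text, max_new_tokens=64)
    return out[0]["generated_text"]

print(correct_text("circuit diagram good search one"))
```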
Step S365: searching for a search result matched with the corrected text data and outputting the search result.
Referring to the above description, searching for a search result matching the corrected text data conforms to the user's search intention and makes the result more accurate. Optionally, the specific form of the matching search result is not limited: pictures, text, voice, video, ring tones, advertisements, etc., or any combination of these; see the descriptions in the foregoing embodiments. Optionally, the output form of the search result may also refer to the foregoing embodiments and is not repeated here.
Step S370: and if the preset state is not met, discarding the voice data.
In the searching method provided by this embodiment, when the user's voice data does not satisfy the target condition, the voice data is corrected to obtain corrected voice data, the corrected voice data is converted into text data, and the text data is then corrected in turn, so that a search result matching the corrected text data is searched for and output. This further improves the accuracy and reliability of the search and the user experience.
As shown in fig. 5, a flowchart of a method of searching provided by another embodiment of the present application is shown, where the method includes:
step S410: and acquiring voice data of the user in the interactive process.
Step S420: and judging whether the voice data is used for information search.
Step S430: and if so, judging whether the voice data meets the target condition.
Step S440: and if the target condition is met, searching a search result matched with the voice data and outputting the search result.
Step S450: and if the target condition is not met, detecting whether the network state meets a preset state or not.
Step S460: and if the preset state is met, acquiring the voiceprint characteristics of the voice data.
As one approach, suppose the user speaks several languages or dialects; switching to another one may cause recognition errors in the voice search. For example, a Sichuan speaker using Mandarin, or imitating the Northeastern dialect, may produce inaccurate speech recognition because of imperfect pronunciation. It should be noted that the inaccuracy is not that the user cannot pronounce words, but that the user has not mastered the expressions of the dialect being attempted; for example, a Sichuan speaker imitating the Northeastern dialect without a native command of it can easily cause voice-search errors.
To reduce such errors, when it is detected that the user's voice data is for an information search and the network state satisfies the preset state, a voiceprint feature of the user's voice data may be acquired. Optionally, the voiceprint feature may include the frequency and intensity of the sound wave corresponding to the user's voice data and their variation over time, or the intensity and frequency characteristics of the sound wave within a certain period. As one implementation, the voiceprint feature may be obtained by analyzing the user's voice data with a filter or the like, or by any other method of obtaining voiceprint features, which is not limited here.
By acquiring the voiceprint feature of the user's voice data, a search result matching the user's voice characteristics can be identified from the voiceprint even when the user speaks in a different language or dialect, improving the user experience.
Step S470: and judging whether the voiceprint features are preset voiceprint features.
Optionally, the voiceprint feature of the user's voice when searching with the original voice data (i.e., the voice the user habitually uses, or the user's initial voice data) may be obtained and used as the preset voiceprint feature. When the voiceprint feature of voice data spoken in a different language or dialect is obtained, it can be compared with the preset voiceprint feature: if they are the same, the voiceprint feature can be judged to be the preset voiceprint feature; if not, it can be judged not to be.
Judging whether the voiceprint feature is the preset voiceprint feature avoids running voice searches on voice data that does not carry the preset voiceprint, saving resources.
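For illustration, a crude voiceprint check using a time-averaged MFCC vector and cosine similarity; production systems use trained speaker-embedding networks, and the 0.85 threshold is an assumption.

```python
import numpy as np
import librosa

def voiceprint_embedding(wav: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Crude voiceprint: the time-averaged MFCC vector of an utterance."""
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

def is_preset_voiceprint(embedding: np.ndarray,
                         enrolled: np.ndarray,
                         threshold: float = 0.85) -> bool:
    """Cosine similarity against the enrolled (preset) voiceprint."""
    cos = float(np.dot(embedding, enrolled) /
                (np.linalg.norm(embedding) * np.linalg.norm(enrolled) + 1e-9))
    return cos >= threshold
```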
Step S471: And if the voiceprint feature is the preset voiceprint feature, converting the voice data into target voice data.
The target voice data may be understood as the user's original voice data; that is, no matter what language or dialect the user uses for the voice search, a piece of target voice data suited to each user can be adapted. Optionally, if a user searches using different voice data, accurate voice search can be achieved simply by converting whatever the user says that differs from the corresponding target voice data into that target voice data.
For example, in one particular application scenario, assume the user's query "where to play on the weekend?" has many different expressions: the user may say "weekend duhukui" in Sichuan dialect, "medium weekend group kneading" in Lanzhou dialect, or "this weekend dry-hakui" in Xinjiang dialect. Optionally, whatever language or dialect the user uses for the voice search, if the voiceprint feature of the voice data is the preset voiceprint feature, the voice data can be converted into the target voice data; here, the user's target voice data may be "where to play on the weekend".
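In toy form, the dialect-to-target conversion in the weekend example reduces to a per-user lookup from dialect variants to the canonical target utterance; the table below reuses the (machine-translated) variants quoted above and is purely illustrative.

```python
# Illustrative per-user mapping; a deployed system would learn this.
DIALECT_TO_TARGET = {
    "weekend duhukui": "where to play on the weekend",                # Sichuan
    "medium weekend group kneading": "where to play on the weekend",  # Lanzhou
    "this weekend dry-hakui": "where to play on the weekend",         # Xinjiang
}

def to_target_voice_data(utterance: str) -> str:
    """Normalize a dialect variant to the user's target voice data."""
    return DIALECT_TO_TARGET.get(utterance, utterance)
```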
Step S472: and carrying out correction processing on the target voice data to obtain corrected voice data.
The specific implementation of the correction processing on the target voice data may refer to the description in the foregoing embodiments, and is not described herein again.
Step S473: searching for a search result matching the corrected voice data and outputting the search result.
Step S474: And if the voiceprint feature is not the preset voiceprint feature, discarding the voice data.
It can be understood that if the voiceprint feature is not the preset voiceprint feature, the voice data may belong to another user or be counterfeit voice data from a bad actor; to avoid false recognition, the voice data can be discarded, improving the security of the voice search.
Step S480: and if the preset state is not met, discarding the voice data.
In the searching method provided by this embodiment, when the user's voice data does not satisfy the target condition and the network state is detected to satisfy the preset state, the voiceprint feature of the voice data is acquired; if the voiceprint feature is the preset voiceprint feature, the voice data is converted into target voice data, and the target voice data is then corrected to obtain corrected voice data, so that a search result matching the corrected voice data is searched for and output. This makes the search more accurate and improves the user experience.
As shown in fig. 6, a block diagram of a searching apparatus 500 provided in this embodiment of the present application is shown. The apparatus 500 runs on a terminal device having a display screen or another audio or image output device; the terminal device may be an electronic device such as a smartphone, tablet computer or wearable smart terminal. The apparatus 500 includes:
an obtaining module 510, configured to obtain voice data of the user during the interaction process.
A determining module 520, configured to determine whether the voice data meets a target condition through a preset algorithm model if the voice data is used for information search.
Wherein the target condition is used for representing the search intention of the user which can be completely recognized according to the voice data.
The processing module 530 is configured to, if the voice data does not meet the target condition, correct the voice data when it is detected that the network state meets a preset state, so as to obtain corrected voice data.
Optionally, the apparatus 500 may further include: a voiceprint feature acquisition module, configured to acquire a voiceprint feature of the voice data; a judging unit, configured to judge whether the voiceprint feature is a preset voiceprint feature; and a processing unit, configured to convert the voice data into target voice data if the voiceprint feature is the preset voiceprint feature.
As one way, the processing module 530 may be specifically configured to perform correction processing on the target voice data.
As a manner, the processing module 530 may be specifically configured to obtain a preset target speech model, where the target speech model is a model obtained through training of speech feature data of a user or historical speech feature data of the user; and carrying out correction processing on the voice data based on the target voice model.
Optionally, the apparatus 500 may further include: a conversion module for converting the corrected voice data into text data; and the correction processing module is used for correcting the text data by adopting a target text model so as to obtain text data with complete semantics.
And a searching module 540, configured to search for a search result matching the corrected voice data and output the search result.
As one way, the search module 540 may be specifically configured to convert the corrected voice data into text data; and searching a search result matched with the text data and outputting the search result.
Optionally, the searching module 540 may be further specifically configured to search for a search result matching the corrected text data and output the search result.
The search apparatus provided in this embodiment acquires the voice data of the user during the interaction; if the voice data is used for an information search, it judges through a preset algorithm model whether the voice data satisfies the target condition; if not, it corrects the voice data once the network state is detected to satisfy the preset state, obtaining corrected voice data, and then searches for and outputs a search result matching the corrected voice data. In this way, when the user's voice data does not satisfy the target condition, it is corrected before the search, so the search is more accurate and the user experience is improved.
The searching device provided by the embodiment of the application is used for realizing the corresponding searching method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again.
It can be clearly understood by those skilled in the art that the search apparatus provided in the embodiment of the present application can implement each process in the foregoing method embodiments, and for convenience and simplicity of description, the specific working processes of the apparatus and the module described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be in an electrical, mechanical or other form.
In addition, each functional module in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Referring to fig. 7, a block diagram of a terminal device 101 according to an embodiment of the present disclosure is shown. The terminal device 101 may be a device capable of running applications, such as a smartphone, tablet computer or e-book reader. The terminal device 101 in the present application may include one or more of the following components: a processor 1012, a memory 1014, and one or more applications, where the one or more applications may be stored in the memory 1014 and configured to be executed by the one or more processors 1012, the one or more programs being configured to perform the methods described in the foregoing method embodiments.
Processor 1012 may include one or more processing cores. The processor 1012 connects the various parts of the terminal device 101 using various interfaces and lines, and performs the functions of the terminal device 101 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 1014 and calling data stored in the memory 1014. Optionally, the processor 1012 may be implemented in hardware using at least one of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA) and a Programmable Logic Array (PLA). The processor 1012 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem and the like. The CPU mainly handles the operating system, user interface, applications and so on; the GPU renders and draws display content; the modem handles wireless communications. The modem may also be implemented as a separate communication chip rather than being integrated into the processor 1012.
The memory 1014 may include Random Access Memory (RAM) or Read-Only Memory (ROM). The memory 1014 may be used to store instructions, programs, code, code sets or instruction sets. The memory 1014 may include a program storage area and a data storage area: the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playing function or an image playing function) and instructions for implementing the method embodiments described herein; the data storage area may store data created by the terminal device 101 during use (such as a phonebook, audio and video data, and chat logs).
Referring to fig. 8, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 700 stores program code that can be called by a processor to execute the methods described in the foregoing method embodiments.
The computer-readable storage medium 700 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 700 includes a non-volatile computer-readable storage medium. The computer-readable storage medium 700 has storage space for program code 710 for performing any of the method steps described above. The program code can be read from or written into one or more computer program products. The program code 710 may, for example, be compressed in a suitable form.
In summary, according to the search method, search apparatus, terminal device, and storage medium provided in the embodiments of the present application, voice data of a user during an interaction process is acquired. If the voice data is used for an information search, whether the voice data satisfies a target condition is determined through a preset algorithm model. If the voice data does not satisfy the target condition, the voice data is corrected to obtain corrected voice data when the network state is detected to satisfy a preset state, and a search result matching the corrected voice data is then retrieved and output. In this way, when the voice data of the user does not satisfy the target condition, the voice data is corrected and a search result matching the corrected voice data is searched for and output, which makes the search more accurate and improves the user experience.
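For illustration only, the overall flow summarized above can be sketched as follows. All helper names here (meets_target_condition, network_satisfies_preset_state, correct_voice, and search) are hypothetical stand-ins for the preset algorithm model, the network-state check, the correction processing, and the search step; they are not APIs from the application.

from typing import Optional

def meets_target_condition(voice_data: str) -> bool:
    # Stand-in for the preset algorithm model: here the target condition
    # is simply that the utterance is non-empty and not truncated.
    return bool(voice_data) and not voice_data.endswith("...")

def network_satisfies_preset_state() -> bool:
    # Stand-in for detecting that the network state meets the preset state.
    return True

def correct_voice(voice_data: str) -> str:
    # Stand-in for the correction processing on the voice data.
    return voice_data.rstrip(".")

def search(query: str) -> str:
    # Stand-in for finding a search result matching the query.
    return "results for: " + query

def handle_voice_search(voice_data: str) -> Optional[str]:
    """One pass of the flow: judge, correct if needed, search, output."""
    if meets_target_condition(voice_data):
        result = search(voice_data)
    elif network_satisfies_preset_state():
        result = search(correct_voice(voice_data))
    else:
        return None  # cannot correct without the required network state
    print(result)  # output the search result
    return result

handle_voice_search("weather in Shenzhen...")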
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method of searching, the method comprising:
acquiring voice data of a user in an interaction process;
if the voice data is used for an information search, judging, through a preset algorithm model, whether the voice data satisfies a target condition;
if the voice data does not satisfy the target condition, performing correction processing on the voice data to obtain corrected voice data when it is detected that the network state satisfies a preset state;
searching for a search result matching the corrected voice data and outputting the search result.
2. The method of claim 1, wherein the step of performing correction processing on the voice data to obtain corrected voice data comprises:
acquiring a preset target voice model, wherein the target voice model is obtained by training on voice characteristic data of the user or historical voice characteristic data of the user;
and correcting the voice data based on the target voice model to obtain corrected voice data.
3. The method according to claim 2, wherein the step of searching for a search result matching the corrected voice data and outputting the search result comprises:
converting the corrected voice data into text data;
and searching for a search result matching the text data and outputting the search result.
4. The method of claim 2, further comprising:
converting the corrected voice data into text data;
and correcting the text data by using a target text model to obtain semantically complete text data.
5. The method according to claim 4, wherein the step of searching for a search result matching the corrected voice data and outputting the search result comprises:
searching for a search result matching the corrected text data and outputting the search result.
6. The method according to any one of claims 1 to 5, wherein before the step of performing correction processing on the voice data, the method further comprises:
acquiring voiceprint characteristics of the voice data;
judging whether the voiceprint features are preset voiceprint features or not;
if yes, converting the voice data into target voice data;
the step of performing correction processing on the voice data to obtain corrected voice data includes:
and carrying out correction processing on the target voice data to obtain corrected voice data.
7. The method according to any one of claims 1-6, wherein the target condition is used to characterize that a search intention of the user can be completely recognized from the voice data.
8. A search apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring voice data of a user in the interaction process;
the judging module is used for judging, through a preset algorithm model, whether the voice data satisfies the target condition if the voice data is used for an information search;
the processing module is used for, if the voice data does not satisfy the target condition, performing correction processing on the voice data to obtain corrected voice data when it is detected that the network state satisfies a preset state;
and the searching module is used for searching for a search result matching the corrected voice data and outputting the search result.
9. A terminal device, comprising:
a memory;
one or more processors coupled with the memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method of any one of claims 1-7.
10. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 7.
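For illustration only (this sketch is not part of the claims), the correction path of claims 2 to 6 can be outlined as follows. TargetVoiceModel, TargetTextModel, extract_voiceprint, convert_to_target_voice_data, and speech_to_text are hypothetical stand-ins, not names from the application.

class TargetVoiceModel:
    """Stand-in for the model of claim 2, obtained by training on the
    user's voice characteristic data or historical voice characteristic
    data."""
    def correct(self, voice_data: str) -> str:
        return voice_data.strip()

class TargetTextModel:
    """Stand-in for the text model of claim 4 that yields semantically
    complete text."""
    def correct(self, text: str) -> str:
        return text if text.endswith("?") else text + "?"

def extract_voiceprint(voice_data: str) -> str:
    # Stand-in for extracting the voiceprint features (claim 6).
    return "user-1"

def convert_to_target_voice_data(voice_data: str) -> str:
    # Stand-in for converting the voice data into target voice data.
    return voice_data.lower()

def speech_to_text(voice_data: str) -> str:
    # Stand-in for converting corrected voice data into text (claim 3).
    return voice_data

def corrected_search(voice_data: str, preset_voiceprint: str = "user-1") -> str:
    # Claim 6: compare the voiceprint features with the preset
    # voiceprint features before correction.
    if extract_voiceprint(voice_data) == preset_voiceprint:
        voice_data = convert_to_target_voice_data(voice_data)
    # Claim 2: correct the (possibly converted) voice data based on the
    # target voice model.
    corrected = TargetVoiceModel().correct(voice_data)
    # Claims 3-5: convert to text, repair the text, then search and output.
    text = TargetTextModel().correct(speech_to_text(corrected))
    result = "results for: " + text
    print(result)  # output the search result
    return result

corrected_search("  what is the weather in shenzhen ")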
CN201911228496.7A 2019-12-04 2019-12-04 Searching method, searching device, terminal equipment and storage medium Pending CN110956958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911228496.7A CN110956958A (en) 2019-12-04 2019-12-04 Searching method, searching device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911228496.7A CN110956958A (en) 2019-12-04 2019-12-04 Searching method, searching device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110956958A true CN110956958A (en) 2020-04-03

Family

ID=69979737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911228496.7A Pending CN110956958A (en) 2019-12-04 2019-12-04 Searching method, searching device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110956958A (en)

Patent Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102523349A (en) * 2011-12-22 2012-06-27 苏州巴米特信息科技有限公司 Special cellphone voice searching method
CN102722539A (en) * 2012-05-23 2012-10-10 华为技术有限公司 Query method and device based on voice recognition
CN103853736A (en) * 2012-11-29 2014-06-11 北京掌城科技有限公司 Traffic information voice query system and voice processing unit thereof
CN103369398A (en) * 2013-07-01 2013-10-23 安徽广电信息网络股份有限公司 Voice searching method and voice searching system based on television EPG (electronic program guide) information
US20150178268A1 (en) * 2013-12-19 2015-06-25 Abbyy Infopoisk Llc Semantic disambiguation using a statistical analysis
CN106663424A (en) * 2014-03-31 2017-05-10 三菱电机株式会社 Device and method for understanding user intent
CN104008132A (en) * 2014-05-04 2014-08-27 深圳市北科瑞声科技有限公司 Voice map searching method and system
US20160063994A1 (en) * 2014-08-29 2016-03-03 Google Inc. Query Rewrite Corrections
CN106095766A (en) * 2015-04-28 2016-11-09 谷歌公司 Use selectivity again to talk and correct speech recognition
US10049655B1 (en) * 2016-01-05 2018-08-14 Google Llc Biasing voice correction suggestions
CN106056207A (en) * 2016-05-09 2016-10-26 武汉科技大学 Natural language-based robot deep interacting and reasoning method and device
CN106328166A (en) * 2016-08-31 2017-01-11 上海交通大学 Man-machine dialogue anomaly detection system and method
CN106571139A (en) * 2016-11-09 2017-04-19 百度在线网络技术(北京)有限公司 Artificial intelligence based voice search result processing method and device
CN106507244A (en) * 2016-12-23 2017-03-15 深圳先进技术研究院 A kind of central control system
CN106782519A (en) * 2016-12-23 2017-05-31 深圳先进技术研究院 A kind of robot
CN107240398A (en) * 2017-07-04 2017-10-10 科大讯飞股份有限公司 Intelligent sound exchange method and device
CN107357875A (en) * 2017-07-04 2017-11-17 北京奇艺世纪科技有限公司 A kind of voice search method, device and electronic equipment
CN107977183A (en) * 2017-11-16 2018-05-01 百度在线网络技术(北京)有限公司 voice interactive method, device and equipment
CN109710055A (en) * 2017-12-15 2019-05-03 蔚来汽车有限公司 The interaction control method of vehicle intelligent interactive system and vehicle-mounted interactive terminal
CN108597495A (en) * 2018-03-15 2018-09-28 维沃移动通信有限公司 A kind of method and device of processing voice data
CN108600219A (en) * 2018-04-23 2018-09-28 海信(广东)空调有限公司 A kind of sound control method and equipment
CN109545184A (en) * 2018-12-17 2019-03-29 广东小天才科技有限公司 It is a kind of that detection method and electronic equipment are recited based on voice calibration
CN110473521A (en) * 2019-02-26 2019-11-19 北京蓦然认知科技有限公司 A kind of training method of task model, device, equipment
CN110136705A (en) * 2019-04-10 2019-08-16 华为技术有限公司 A kind of method and electronic equipment of human-computer interaction
CN110211577A (en) * 2019-07-19 2019-09-06 宁波方太厨具有限公司 Terminal device and its voice interactive method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102831A (en) * 2020-09-15 2020-12-18 海南大学 Cross-data, information and knowledge modal content encoding and decoding method and component
CN112530442A (en) * 2020-11-05 2021-03-19 广东美的厨房电器制造有限公司 Voice interaction method and device
CN112530442B (en) * 2020-11-05 2023-11-17 广东美的厨房电器制造有限公司 Voice interaction method and device
CN112700769A (en) * 2020-12-26 2021-04-23 科大讯飞股份有限公司 Semantic understanding method, device, equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN109493850B (en) Growing type dialogue device
CN113327609B (en) Method and apparatus for speech recognition
US20240021202A1 (en) Method and apparatus for recognizing voice, electronic device and medium
CN110910903B (en) Speech emotion recognition method, device, equipment and computer readable storage medium
CN106407393B (en) information processing method and device for intelligent equipment
CN110956958A (en) Searching method, searching device, terminal equipment and storage medium
CN109492221B (en) Information reply method based on semantic analysis and wearable equipment
CN107844470B (en) Voice data processing method and equipment thereof
WO2020024620A1 (en) Voice information processing method and device, apparatus, and storage medium
CN112232276B (en) Emotion detection method and device based on voice recognition and image recognition
CN110931006A (en) Intelligent question-answering method based on emotion analysis and related equipment
CN108595406B (en) User state reminding method and device, electronic equipment and storage medium
CN112669842A (en) Man-machine conversation control method, device, computer equipment and storage medium
CN110955818A (en) Searching method, searching device, terminal equipment and storage medium
CN112151015A (en) Keyword detection method and device, electronic equipment and storage medium
CN111639529A (en) Speech technology detection method and device based on multi-level logic and computer equipment
CN111210824B (en) Voice information processing method and device, electronic equipment and storage medium
CN111062221A (en) Data processing method, data processing device, electronic equipment and storage medium
CN113611316A (en) Man-machine interaction method, device, equipment and storage medium
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
CN111048068B (en) Voice wake-up method, device and system and electronic equipment
WO2023137920A1 (en) Semantic truncation detection method and apparatus, and device and computer-readable storage medium
CN114299955B (en) Voice interaction method and device, electronic equipment and storage medium
CN114171000A (en) Audio recognition method based on acoustic model and language model
CN111144125B (en) Text information processing method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200403)