CN108153875B - Corpus processing method and device, intelligent sound box and storage medium - Google Patents


Info

Publication number
CN108153875B
CN108153875B (application number CN201711429605.2A)
Authority
CN
China
Prior art keywords
corpus
text
voice
corpus text
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711429605.2A
Other languages
Chinese (zh)
Other versions
CN108153875A (en
Inventor
常哲珲
黄开粤
高铭瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd filed Critical Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201711429605.2A priority Critical patent/CN108153875B/en
Publication of CN108153875A publication Critical patent/CN108153875A/en
Application granted granted Critical
Publication of CN108153875B publication Critical patent/CN108153875B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/635Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention discloses a corpus processing method, a corpus processing device, an intelligent sound box, and a storage medium, wherein the method comprises the following steps: determining a mapping relation between a first corpus text and a first characteristic voice according to the acquired first corpus text and first characteristic voice to form a mapping relation library; converting a received second characteristic voice into a second corpus text, extracting keywords of the second corpus text, and determining a matching rule according to the keywords; and searching the mapping relation library, according to the matching rule, for a response voice matched with the second corpus text. In this way, the user can obtain diversified response voices.

Description

Corpus processing method and device, intelligent sound box and storage medium
Technical Field
The embodiment of the invention relates to a voice and corpus processing technology, in particular to a corpus processing method and device, an intelligent sound box and a storage medium.
Background
Corpus: in statistical natural language processing, the large-scale language instances of the real world cannot actually be observed, so text is simply used in their place, and the context within the text substitutes for the context of language in the real world; a collection of such texts is generally called a corpus.
In general corpus processing, a mapping relationship is usually formed between a corpus and a corresponding voice, and when the voice input by a user is recognized to include the corresponding corpus, the corresponding voice is called. Such voice matching takes a fixed form: the voice response rule is single, the degree of intelligence is low, and the user experience is poor.
Disclosure of Invention
The embodiment of the invention provides a corpus processing method and device, an intelligent sound box and a storage medium, so that a user can obtain diversified response voices.
In a first aspect, an embodiment of the present invention provides a corpus processing method, where the method includes:
determining a mapping relation between the first corpus text and the first characteristic voice according to the acquired first corpus text and the first characteristic voice to form a mapping relation library;
converting the received second characteristic voice into a second corpus text, extracting keywords of the second corpus text, and determining a matching rule according to the keywords;
and searching the response voice matched with the second corpus text in the mapping relation library according to the matching rule.
In a second aspect, an embodiment of the present invention further provides a corpus processing apparatus, where the apparatus includes:
the mapping relation library determining module is used for determining the mapping relation between the first corpus text and the first characteristic voice according to the acquired first corpus text and the first characteristic voice so as to form a mapping relation library;
the matching rule determining module is used for converting the received second characteristic voice into a second corpus text, extracting keywords of the second corpus text and determining a matching rule according to the keywords;
and the response voice determining module is used for searching the response voice matched with the second corpus text in the mapping relation library according to the matching rule.
In a third aspect, an embodiment of the present invention further provides an intelligent sound box, including a memory, a processor, and a computer program that is stored in the memory and can be run on the processor, where when the processor executes the program, the corpus processing method according to any one of the embodiments of the present invention is implemented.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the corpus processing method according to any one of the embodiments of the present invention.
In the embodiment of the invention, a mapping relation between a first corpus text and a first characteristic voice is determined according to the acquired first corpus text and first characteristic voice to form a mapping relation library; a received second characteristic voice is converted into a second corpus text, keywords of the second corpus text are extracted, a matching rule is determined according to the keywords, and a response voice matched with the second corpus text is searched for in the mapping relation library according to the matching rule. By combining keywords with matching rules over the corpus, the user can obtain diversified response voices.
Drawings
FIG. 1a is a flowchart illustrating a corpus processing method according to a first embodiment of the present invention;
FIG. 1b is a diagram illustrating a mapping relation library according to an embodiment of the present invention;
FIG. 1c is a flow chart of a secondary speech extraction process in accordance with one embodiment of the present invention;
FIG. 2a is a flowchart illustrating a corpus processing method according to a second embodiment of the present invention;
FIG. 2b is a diagram illustrating scores of voices corresponding to a corpus according to a second embodiment of the present invention;
FIG. 2c is a diagram illustrating scores of voices corresponding to a keyword corpus according to a second embodiment of the present invention;
FIG. 2d is a diagram of a constructed response network suitable for use in the second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a corpus processing apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an intelligent sound box in the fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1a is a flowchart of a corpus processing method according to an embodiment of the present invention, where the method is applicable to a case of performing a voice response for a voice input by a user, and the method can be executed by a corpus processing apparatus according to an embodiment of the present invention, and the apparatus can be implemented in a software and/or hardware manner. Referring to fig. 1a, the method may specifically include the steps of:
s110, determining a mapping relation between the first corpus text and the first characteristic voice according to the acquired first corpus text and the first characteristic voice to form a mapping relation library.
The mapping relation library stores the matching relation between corpus texts and characteristic voices: in the mapping relation library, the characteristic voice corresponding to a corpus text can be acquired through the corpus text, and similarly, the corpus text corresponding to a characteristic voice can be acquired through the characteristic voice. The corpus texts forming the mapping relation library are called first corpus texts, the characteristic voices forming it are called first characteristic voices, and there are at least two of each. The more first corpus texts and first characteristic voices there are, the more comprehensive the content of the mapping relation library, and the higher the accuracy of the response voice acquired from it.
Specifically, the mapping relation library may be obtained in the following manner, which is only a preferred implementation manner provided in the embodiments of the present invention, and no limitation is made to a specific obtaining manner of the mapping relation library.
(1) Obtaining a self-defined first corpus text; entering a first characteristic voice based on the self-defined first corpus text; and determining the mapping relation between the self-defined first corpus text and the first characteristic voice to form a mapping relation library.
Optionally, the customized first corpus text is preset by a developer of the smart speaker. Specifically, the first corpus text is designed in advance according to the requirements of users; a background research-and-development worker enters the first corpus text into the speaker server, the speaker server classifies the content of the first corpus texts, and a corresponding corpus library is established for each type of first corpus text. Illustratively, a corpus library may be named after the corresponding corpus type. The first characteristic voice includes the original voice entered for the first corpus text; if the first corpus text is a query corpus, the first characteristic voice may be entered only for the response corpus of that query corpus. For example, a person who enters characteristic voice is called an "anchor"; first corpus texts of the same type may be entered by the same anchor. A user may select the preferred anchor by listening to samples of several anchors' voices, and may also make this selection through a specific application installed on the smart speaker or on a terminal device that has a binding relationship with the smart speaker.
In a specific example, when a customized first corpus text "are you a monkey" is received, the first corpus text is added to the query library according to keywords therein such as "you", "are", and "monkey"; the response corpus for this first corpus text is "you are the monkey", and the response corpus is added to the response library corresponding to the query library. A first characteristic voice recorded for the first corpus text is then received, that is, a voice recorded by a specific anchor for the specific response corpus; the voice and the response corpus have a corresponding relation, the aim being to call the corresponding voice when the corresponding corpus information is recognized.
In a specific example, fig. 1b shows a structural diagram of a mapping relation library. The mapping relation library includes a corpus library and a voice library; the corpus library is composed of 6 first corpus texts, the voice library is composed of 6 first characteristic voices, and each first corpus text forms a set of mapping relations with its corresponding first characteristic voice.
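The bidirectional mapping relation library described above can be sketched as follows. This is a minimal illustration only; the class and method names (MappingLibrary, add_pair, voice_for, text_for) and the example data are assumptions, not from the patent.

```python
class MappingLibrary:
    """Bidirectional mapping between corpus texts and characteristic voices."""

    def __init__(self):
        self.text_to_voice = {}
        self.voice_to_text = {}

    def add_pair(self, corpus_text, feature_voice):
        # Each (text, voice) pair forms one set of mapping relations.
        self.text_to_voice[corpus_text] = feature_voice
        self.voice_to_text[feature_voice] = corpus_text

    def voice_for(self, corpus_text):
        # Look up the characteristic voice through the corpus text.
        return self.text_to_voice.get(corpus_text)

    def text_for(self, feature_voice):
        # Look up the corpus text through the characteristic voice.
        return self.voice_to_text.get(feature_voice)


library = MappingLibrary()
library.add_pair("are you a monkey", "voice_clip_01.wav")
```

A lookup in either direction then returns the paired entry, mirroring the two acquisition directions described for the mapping relation library.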
(2) Receiving an input first characteristic voice; recognizing the first characteristic voice as a corresponding first corpus text; and determining the mapping relation between the first corpus text and the first characteristic voice to form a mapping relation library.
Specifically, a first characteristic voice recorded by the anchor is received through a voice recording module of the intelligent sound box; the first characteristic voice is then processed by automatic speech recognition to identify and output the corpus information, and the corpus information is formed into the corresponding first corpus text.
It should be noted that, a forming process of the mapping relation library in the embodiment of the present invention is a forming process of a voice packet, and a user may determine a voice packet required by the user according to aspects of the voice packet, such as the sound quality, the tone color, and the content.
Optionally, the obtaining process of the first characteristic voice may also be implemented by obtaining corpus texts from network resources. Specifically, research personnel of the intelligent sound box obtain currently popular utterances on the network through big-data analysis and processing, and then arrange them into corresponding corpus texts. In a specific example, take the obtained original corpus texts corresponding to currently popular utterances to be A1, A2, A3, and A4, and the response corpora corresponding to these four corpus texts to be B1, B2, B3, B4, B5, B6, and B7; there is no strict one-to-one correspondence between the original corpus texts and the response corpus texts. A specific application scenario may be that A1, A2, A3, and A4 respectively represent netizens' evaluation queries about song X in four aspects such as music style, lyrics, composition, and singing, and B1 through B7 respectively represent the answers of 7 netizens to those evaluation queries. The research and development personnel record the correspondences between A1–A4 and B1–B7 into the intelligent sound box, and when a set requirement exists, the corresponding response corpus among B1–B7 is converted into a first characteristic voice.
S120, converting the received second characteristic voice into a second corpus text, extracting keywords of the second corpus text, and determining a matching rule according to the keywords.
The user's recorded voice is called the second characteristic voice; the specific application scenario is to monitor the second characteristic voice recorded by the user and respond with a first characteristic voice. The received second characteristic voice is converted into a second corpus text; the specific conversion may be realized by speech recognition techniques such as feature extraction, pattern-matching criteria, and model training.
Keywords in the second corpus text are then extracted; the keywords may be subject words, verbs, nouns, modal words, and the like in the second corpus text. In a specific example, keywords representing the user's attitude may be: "no", "not", "good", "that is not what I want", and the like. If the second corpus text is "how do we get to Zhujiang New Town", the keywords may be "we", "go", "Zhujiang New Town", and "how to get there". A matching rule is determined according to the keywords, and the matching rules differ for different second corpus texts; in this specific example, according to the keywords in "how do we get to Zhujiang New Town", the specific matching rules may be: shortest time, least walking, and fewest transfers.
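The keyword-extraction and rule-determination step above can be sketched as follows. The keyword vocabulary, rule table, and substring matching are all simplifying assumptions for illustration; a real system would use segmentation and part-of-speech tagging as the patent implies.

```python
# Hypothetical keyword vocabulary and keyword-to-rule table.
KNOWN_KEYWORDS = {"we", "go", "Zhujiang New Town", "how to get"}

RULES_BY_KEYWORD = {
    "how to get": ["shortest time", "least walking", "fewest transfers"],
}


def extract_keywords(corpus_text):
    # Substring matching stands in for real segmentation + POS tagging.
    return [kw for kw in KNOWN_KEYWORDS if kw in corpus_text]


def matching_rules(keywords):
    # Collect every rule triggered by any extracted keyword.
    rules = []
    for kw in keywords:
        rules.extend(RULES_BY_KEYWORD.get(kw, []))
    return rules
```

For the query "how to get to Zhujiang New Town", this yields the keywords "Zhujiang New Town" and "how to get", and the rule set shortest time / least walking / fewest transfers.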
S130, searching the response voice matched with the second corpus text in the mapping relation library according to the matching rule.
Specifically, based on the determined matching rule, the response voice matched with the second corpus text is searched for in the mapping relation library. In a specific example, when the second corpus text is "how do we get to Zhujiang New Town", three response voices exist in the mapping relation library: "take a taxi", "bus 550 then bus 545", and "subway line 5". According to the matching rule determined from the keywords, namely shortest time, the response voice is determined in the mapping relation library to be "subway line 5".
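The selection in this example can be sketched as picking the candidate that best satisfies the "shortest time" rule. The candidate list and travel times are invented for illustration; the patent does not specify how candidates are scored.

```python
# Hypothetical candidates for "how do we get to Zhujiang New Town".
candidates = [
    {"voice": "take a taxi", "time_minutes": 25},
    {"voice": "bus 550 then bus 545", "time_minutes": 50},
    {"voice": "subway line 5", "time_minutes": 18},
]


def pick_by_shortest_time(options):
    # The "shortest time" matching rule: minimize estimated travel time.
    return min(options, key=lambda c: c["time_minutes"])["voice"]
```

With these assumed times, the rule selects "subway line 5", matching the outcome described in the text.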
In the embodiment of the invention, a mapping relation between a first corpus text and a first characteristic voice is determined according to the acquired first corpus text and first characteristic voice to form a mapping relation library; a received second characteristic voice is converted into a second corpus text, keywords of the second corpus text are extracted, a matching rule is determined according to the keywords, and a response voice matched with the second corpus text is searched for in the mapping relation library according to the matching rule. By combining keywords with matching rules over the corpus, the user can obtain diversified response voices.
On the basis of the above embodiment, after searching the response voice matching with the second corpus text in the mapping relation library according to the matching rule, the method further includes: broadcasting the response voice, and receiving and analyzing feedback information of the user to the response voice within a set time range; and updating the priority of the response corpus corresponding to the response voice according to the words which are used for evaluating the attitude of the user in the user feedback information.
Specifically, after the response voice matched with the second corpus text is searched in the mapping relation library according to the matching rule, the response voice corresponding to the second corpus text is broadcasted to the user, and feedback information of the user to the response voice in a set time range is acquired. In a specific example, the feedback information of the user may include evaluation information for the responding voice, such as: "I am dissatisfied", "I do not like", and "simply too happy", etc. In an actual application scenario, whether the user feeds back the response voice depends on the user, in the scheme of the embodiment of the invention, when the user has feedback information on the response voice, the feedback information is actively acquired, and if the user does not feed back the response voice, the feedback information cannot be acquired after the response voice is broadcasted. Illustratively, the process of obtaining feedback information is referred to as a secondary speech extraction process.
After feedback information is acquired during the secondary voice extraction, the priority of the response corpus corresponding to the response voice is updated according to the words in the user feedback information that evaluate the user's attitude. The priority may represent the utility of the response voice: a high priority represents high utility and a low priority low utility. Illustratively, the priority may be represented by positive and negative feedback values, where a positive feedback value raises the priority of the response voice and a negative feedback value lowers it. Among the words representing the user's attitude, positive words correspond to positive feedback values and negative words to negative feedback values. Updating the priority of the response corpus according to the words representing the user's attitude provides more choices for the user and makes the provided response voice better conform to the user's habits and preferences.
In a specific example, fig. 1c shows a flowchart of the secondary voice extraction; in fig. 1c, the mapping relation library includes the corpus library and the voice library of fig. 1b. The response voice is output by broadcasting it; then, within a set time range, the second characteristic voice is collected a second time, keywords are extracted and matched against the keyword library, and feedback information on the response voice is received and analyzed. The priority of the response corpus corresponding to the response voice is updated according to the words in the feedback information that evaluate the user's attitude; the priority is associated with the positive and negative feedback values, and after the priority of the response corpus is updated, the corpus library and the characteristic voice library are called to output the response voice.
Example two
Fig. 2a is a flowchart of a corpus processing method according to a second embodiment of the present invention, and in this embodiment, "determining a matching rule according to the keyword" is optimized based on the foregoing embodiment. Referring to fig. 2a, the method may specifically include the steps of:
s210, determining a mapping relation between the first corpus text and the first characteristic voice according to the acquired first corpus text and the first characteristic voice to form a mapping relation library.
S220, converting the received second characteristic voice into a second corpus text, and extracting keywords of the second corpus text.
And S230, determining the task request type corresponding to the second corpus text according to the keyword.
The task request types include a food route query type, a chat request type, a geographic position request type, and a weather condition request type; the task request type corresponding to the second corpus text is determined according to the keywords. In a specific example, if the second corpus text is "I want to eat curry hot pot, how do I get there", the keywords are "eat", "curry hot pot", and "how to get there", and the task request type corresponding to the second corpus text is determined to be the food route query type.
And S240, respectively establishing matching rules corresponding to the task request types according to set standards.
Specifically, the set standard may be defined by the user; the set standard is input to the sound box server, and a matching rule corresponding to each task request type is established according to the standard. Illustratively, if the task request type is the food route query type, the matching rule is matching rule A; if the task request type is the chat request type, the matching rule is matching rule B; if the task request type is the geographic position request type, the matching rule is matching rule C; and if the task request type is the weather condition request type, the matching rule is matching rule D.
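Steps S230 and S240 together amount to classifying the request from its keywords and then looking up the rule for that class. The keyword-to-type vocabulary and the default type below are illustrative assumptions; only the four request types and the rule names A–D come from the text.

```python
# Hypothetical keyword vocabularies per task request type.
TYPE_KEYWORDS = {
    "food route query": {"eat", "restaurant", "curry hot pot"},
    "chat request": {"hello", "chat"},
    "geographic position request": {"where", "address"},
    "weather condition request": {"rain", "weather", "sunny"},
}

RULE_FOR_TYPE = {
    "food route query": "matching rule A",
    "chat request": "matching rule B",
    "geographic position request": "matching rule C",
    "weather condition request": "matching rule D",
}


def task_type(keywords):
    # First type whose vocabulary overlaps the extracted keywords wins.
    for req_type, vocab in TYPE_KEYWORDS.items():
        if vocab & set(keywords):
            return req_type
    return "chat request"  # assumed fallback when nothing matches


def rule_for(keywords):
    return RULE_FOR_TYPE[task_type(keywords)]
```

So the keywords "eat" and "curry hot pot" classify the request as a food route query and select matching rule A, while "rain" would select matching rule D.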
In a specific example, after the corresponding response corpus has been matched using a first matching rule and a response given, a user-defined matching rule is established whereby, when the same user voice is received again within a certain period of time, a rule different from the first matching rule is used to call a different response corpus. The application scenario may be: the first matching rule matches a first response corpus, the second matching rule a second response corpus, the third matching rule a third response corpus, the fourth matching rule a fourth response corpus, and so on; the priorities of the four response corpora are then counted according to the user's daily usage records and positive and negative feedback records, and the priorities are updated in real time for the user.
Optionally, a current-user record library and a network-user record library are obtained, where the two libraries are formed by combining the current user's long-term usage records with records of the usage heat of response corpora in the network, and are kept up to date by the speaker server. Optionally, the heat of a response corpus in the network refers to its freshness in the network. The positive and negative feedback values reflect and record the current user's usage habits. In a specific example, the positive and negative feedback values further take into account the usage heat of the corpus in the network, its network freshness, and so on; the user habit, usage heat, freshness, the number of occurrences of the same corpus within the time region, and other factors each correspond to a certain score, such as 40%, 20%, 15%, and 5%. "Other" may refer to, for example, the age score and style score of the response corpus. If the response corpus is missing, the positive and negative feedback value is 0; here the response corpus may be unknown, or the response voice may not have been successfully converted into a corpus. In a specific example, fig. 2b shows a schematic diagram of the scores of the voices corresponding to a corpus, where 260 represents a first characteristic voice in the voice 1 area. The score of each voice is a record of the current user's usage habits; a higher score indicates a higher probability of being called, and the score is updated according to the positive and negative feedback values, that is, it changes in real time. Specifically, the score may be used to characterize the priority of each voice: a high score indicates a high priority and a low score a low priority. In this specific example, each voice is a voice in the voice 1 area.
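The weighted scoring described above can be sketched as a linear combination. The weights 40%, 20%, 15%, and 5% come from the text; the normalization of each factor to a 0–1 value and the additive combination are assumptions, since the patent does not give an explicit formula.

```python
# Weights from the text: user habit, usage heat, freshness, repetition.
WEIGHTS = {"habit": 0.40, "heat": 0.20, "freshness": 0.15, "repetition": 0.05}


def corpus_score(factors):
    """Combine the per-aspect factors (each assumed in 0..1) into one score.

    A missing corpus (empty factors) scores 0, matching the rule that a
    lost response corpus has a positive/negative feedback value of 0.
    """
    if not factors:
        return 0.0
    return sum(WEIGHTS[name] * value for name, value in factors.items())
```

Note that, as the text observes, the weights need not sum to 100%, since the remaining share can go to "other" aspects such as age and style scores.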
Fig. 2c is a schematic diagram illustrating the scores of the voices corresponding to a keyword corpus, where 270 represents a first corpus text in corpus 1 and 280 represents a first characteristic voice in voice 1. The higher the score, the higher the probability that the corpus is called; the score is updated in real time according to the positive and negative feedback values of the corpus corresponding to the received voice. In this specific example, each corpus is a corpus in the corpus 1 area, and each voice is a voice in the voice 1 area.
Fig. 2d shows a constructed response network diagram, where 290 represents the second corpus text corresponding to the second characteristic voice, 291 represents a response corpus in the response corpus 1 area, and 292 represents a response voice in the response voice 1 area. If the second characteristic voice is "how do we get to Zhujiang New Town; what good food is there near Zhujiang New Town; what is fun near Zhujiang New Town; is it raining near Zhujiang New Town", then the corpora matched in the corpus 1 area may be: corpus 1a "how do we get to Zhujiang New Town", corpus 1b "what good food is there near Zhujiang New Town", corpus 1c "what is fun near Zhujiang New Town", and corpus 1d "is it raining near Zhujiang New Town". Taking corpus 1a as an example, keyword 1a is "Zhujiang New Town" and keyword 1b is "how to get there"; a matching rule is then determined according to the keywords and the request type corresponding to the keywords, and the response corpus and response voice are determined according to the matching rule.
It should be noted that, when the scores of the same corpus or of different corpora are counted in different aspects (such as usage heat and freshness), the sets of counted objects may be the same or different; the scores therefore do not necessarily sum to 100%. The scores mentioned in the embodiments of the present invention serve only as a preferred implementation and do not specifically limit the technical scheme of the present invention.
And S250, searching the response voice matched with the second corpus text in the mapping relation library according to the matching rule.
In the embodiment of the invention, the task request type corresponding to the second corpus text is determined according to the keywords, and matching rules corresponding to each task request type are then established according to set standards. The different matching rules provide more choices for the user when obtaining the response voice, making the voice response more intelligent and better suited to the user's requirements.
Optionally, on the basis of the above technical solution, the searching for the response voice matched with the second corpus text in the mapping relationship library according to the matching rule includes: collecting the use frequency of the user to the response corpus in a set time period, and determining the priority of the response corpus according to the use frequency and the matching rule; and establishing a response network for the mapping relation library according to the priority, determining the priority of the second corpus text based on the response network and the priority, and searching for response voice matched with the second corpus text.
A set time period, such as one week, is selected; the frequency with which the user uses the response corpora within that week is collected, and the priority of each response corpus is determined by combining its usage frequency with the determined matching rule. A response network is then established for the mapping relation library according to the priorities, the priority of the second corpus text is determined based on the response network and the priorities, and the response voice matched with the second corpus text is searched for. Because the priority is determined by both the usage frequency of the response corpus and the matching rule, a response voice better conforming to the user's requirements is provided. Optionally, the user may download a favorite voice library from the speaker server to the user's smart speaker, and voices not suitable for the relevant age group may also be handled accordingly.
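The frequency-plus-rule priority can be sketched as follows. The multiplicative combination of usage frequency and rule fit, along with all the data, is an assumption; the patent states only that the priority is determined by both factors.

```python
# Hypothetical per-voice statistics over the set time period (one week).
usage_frequency = {"voice 1a": 12, "voice 1b": 3, "voice 1c": 7}
# Hypothetical 0..1 score for how well each voice fits the matching rule.
rule_score = {"voice 1a": 0.2, "voice 1b": 0.9, "voice 1c": 0.5}


def priority(voice):
    # Assumed combination: usage frequency weighted by rule fit.
    return usage_frequency.get(voice, 0) * rule_score.get(voice, 0.0)


def best_response(voices):
    # Search the response network for the highest-priority voice.
    return max(voices, key=priority)
```

With these numbers, "voice 1c" wins (7 × 0.5 = 3.5) even though "voice 1a" is used more often, showing how the matching rule can override raw frequency.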
In a specific example, when the response network is constructed, corpus 1a, corpus 1b, corpus 1c, and corpus 1d correspond to voices 1a, 1b, 1c, and 1d respectively. When a related voice is called, the integrity of the voice needs to be determined; when a voice is damaged, the address of the next-ranked voice in the same area is called and that voice is used instead. For example, if voice 1a is damaged, the address of voice 1b is invoked. Response information is extracted according to the characteristic information and the related positive and negative feedback values, and once the response information is confirmed, the response to the user's second corpus text is prepared. If the response corpus text has no corresponding first characteristic voice, a default voice converter converts the response corpus into a response voice and outputs it to the intelligent sound box.
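The damaged-voice fallback described above can be sketched as follows. The helper names (fetch_response_voice, is_intact, tts_fallback) are hypothetical; the logic mirrors the two fallbacks in the text — try the next-ranked voice in the same area, and fall back to a default text-to-voice conversion when no recorded voice is usable.

```python
def fetch_response_voice(ordered_voices, is_intact, tts_fallback):
    """Return the first intact voice in priority order, else a TTS result."""
    for voice in ordered_voices:
        if is_intact(voice):
            return voice
    # No recorded voice is usable: use the default voice converter.
    return tts_fallback()


voices = ["voice 1a", "voice 1b"]
result = fetch_response_voice(
    voices,
    is_intact=lambda v: v == "voice 1b",  # simulate: voice 1a is damaged
    tts_fallback=lambda: "tts output",
)
```

Here the damaged "voice 1a" is skipped and "voice 1b" is returned, just as the text invokes the address of voice 1b when voice 1a is corrupted.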
Example three
Fig. 3 is a schematic structural diagram of a corpus processing apparatus according to a third embodiment of the present invention, which is suitable for executing a corpus processing method according to the third embodiment of the present invention. As shown in fig. 3, the apparatus may specifically include:
a mapping relation library determining module 310, configured to determine a mapping relation between the first corpus text and the first feature speech according to the obtained first corpus text and the first feature speech to form a mapping relation library;
a matching rule determining module 320, configured to convert the received second feature speech into a second corpus text, extract a keyword of the second corpus text, and determine a matching rule according to the keyword;
and the response voice determining module 330 is configured to search, according to the matching rule, the mapping relation library for a response voice matching the second corpus text.
Further, the matching rule determining module 320 is specifically configured to:
determining a task request type corresponding to the second corpus text according to the keyword;
and respectively establishing matching rules corresponding to the task request types according to set standards.
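The two steps above, classifying the keywords into a task request type and looking up the matching rule established for that type, can be sketched as follows. The vocabularies, task types, and rule names are hypothetical; the patent leaves the set standard unspecified:

```python
# Hypothetical keyword vocabularies per task request type.
TASK_VOCAB = {
    "weather_query": {"weather", "temperature", "rain"},
    "music_request": {"play", "song", "music"},
}
# One matching rule per task request type, per the set standard.
MATCHING_RULES = {"weather_query": "exact",
                  "music_request": "fuzzy",
                  "chitchat": "fallback"}

def matching_rule_for(keywords):
    """Determine the task request type from the extracted keywords,
    then return the matching rule established for that type."""
    for task, vocab in TASK_VOCAB.items():
        if vocab & set(keywords):
            return task, MATCHING_RULES[task]
    return "chitchat", MATCHING_RULES["chitchat"]

task, rule = matching_rule_for(["what", "weather", "today"])
```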
Further, the response voice determination module 330 is specifically configured to:
collecting the frequency with which the user uses the response corpora within a set time period, and determining the priority of each response corpus according to the use frequency and the matching rule;
and establishing a response network for the mapping relation library according to the priority, determining the priority of the second corpus text based on the response network and the priority, and searching for response voice matched with the second corpus text.
Further, the apparatus also comprises:
the response voice broadcasting module, configured to broadcast the response voice after the response voice matching the second corpus text is found in the mapping relation library according to the matching rule, and to receive and analyze the user's feedback on the response voice within a set time range;
and the updating module, configured to update the priority of the response corpus corresponding to the response voice according to the attitude-evaluating words in the user's feedback information.
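The updating module's behavior can be sketched as a simple priority adjustment driven by attitude words. The word lists and the size of the adjustment are illustrative assumptions; the patent does not fix them:

```python
# Hypothetical attitude-word lists for evaluating user feedback.
POSITIVE = {"great", "nice", "love", "thanks"}
NEGATIVE = {"bad", "wrong", "stop"}

def update_priority(priority, feedback):
    """Raise or lower a response corpus's priority according to the
    attitude words found in the user's feedback utterance."""
    words = set(feedback.lower().split())
    delta = len(words & POSITIVE) - len(words & NEGATIVE)
    return max(0, priority + delta)  # priority never drops below zero

new_priority = update_priority(5, "great answer thanks")  # two positive words
```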
Further, the mapping relation library determining module 310 is specifically configured to:
obtaining a user-defined first corpus text;
inputting a first characteristic voice based on the user-defined first corpus text;
and determining the mapping relation between the user-defined first corpus text and the first characteristic voice to form a mapping relation library.
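The mapping relation library formed by these steps can be sketched as a plain text-to-voice lookup. The file paths and anchor names are hypothetical placeholders for the recorded first characteristic voices:

```python
def build_mapping_library(entries):
    """Form the mapping relation library: each user-defined first
    corpus text maps to its recorded first characteristic voice,
    represented here by a file path to the anchor's recording."""
    return {corpus_text: voice_path for corpus_text, voice_path in entries}

library = build_mapping_library([
    ("good morning", "voices/anchor_a/good_morning.wav"),
    ("weather today", "voices/anchor_a/weather_today.wav"),
])
```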
Further, the mapping relation library determining module 310 is specifically configured to:
receiving input first characteristic voice;
recognizing the first characteristic voice as a corresponding first corpus text;
and determining the mapping relation between the first corpus text and the first characteristic voice to form a mapping relation library.
The corpus processing device provided by the embodiment of the invention can execute the corpus processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 4 is a schematic structural diagram of an intelligent sound box according to a fourth embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary smart sound box 12 suitable for implementing embodiments of the present invention. The smart sound box 12 shown in fig. 4 is only an example and should not impose any limitation on the functions or scope of use of embodiments of the present invention.
As shown in fig. 4, smart speaker 12 is embodied in the form of a general purpose computing device. The components of smart sound box 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Smart speaker 12 typically includes a variety of computer system readable media. These media may be any available media that may be accessed by smart sound box 12 and include both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Smart sound box 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Smart sound box 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with smart sound box 12, and/or with any devices (e.g., network card, modem, etc.) that enable smart sound box 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, smart sound box 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of smart sound box 12 via bus 18. It should be appreciated that although not shown in fig. 4, other hardware and/or software modules may be used in conjunction with smart sound box 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the corpus processing method provided by the embodiment of the present invention:
that is, the processing unit implements, when executing the program: determining a mapping relation between the first corpus text and the first characteristic voice according to the acquired first corpus text and the first characteristic voice to form a mapping relation library; converting the received second characteristic voice into a second corpus text, extracting keywords of the second corpus text, and determining a matching rule according to the keywords; and searching the response voice matched with the second corpus text in the mapping relation library according to the matching rule.
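The three steps the processing unit implements can be condensed into a single sketch. This is an illustrative Python outline, not the patented implementation; `recognize` and `extract_keywords` stand in for real speech-recognition and NLP components, and the simple shared-keyword rule replaces the full matching-rule machinery:

```python
def answer(second_feature_voice, library, recognize, extract_keywords):
    """The three claimed steps in miniature: recognize the received
    speech as a second corpus text, extract its keywords, then search
    the mapping relation library for a matching response voice."""
    text = recognize(second_feature_voice)
    keywords = set(extract_keywords(text))
    # Simplest matching rule: a library entry sharing a keyword wins.
    for corpus_text, voice in library.items():
        if set(corpus_text.split()) & keywords:
            return voice
    return None  # no match: fall back to the default voice converter

voice = answer(
    b"\x00raw-audio",
    {"weather today": "weather_reply.wav"},
    recognize=lambda audio: "what is the weather",
    extract_keywords=lambda t: [w for w in t.split() if len(w) > 3],
)
```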
Example five
The fifth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the corpus processing method provided in all embodiments of the present invention:
that is, the program when executed by the processor implements: determining a mapping relation between the first corpus text and the first characteristic voice according to the acquired first corpus text and the first characteristic voice to form a mapping relation library; converting the received second characteristic voice into a second corpus text, extracting keywords of the second corpus text, and determining a matching rule according to the keywords; and searching the response voice matched with the second corpus text in the mapping relation library according to the matching rule.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A corpus processing method, comprising:
determining a mapping relation between the first corpus text and the first characteristic voice according to the acquired first corpus text and the first characteristic voice to form a mapping relation library;
the number of first corpus texts and the number of first characteristic voices are each at least two;
converting the received second characteristic voice into a second corpus text, extracting keywords of the second corpus text, and determining a matching rule according to the keywords;
searching the response voice matched with the second corpus text in the mapping relation library according to the matching rule;
the determining, according to the acquired first corpus text and the first characteristic voice, a mapping relation between the first corpus text and the first characteristic voice to form a mapping relation library includes: acquiring a user-defined first corpus text; inputting a first characteristic voice based on the user-defined first corpus text; and determining a mapping relation between the user-defined first corpus text and the first characteristic voice to form the mapping relation library; the user-defined first corpus text is preset by developers of the smart sound box: background developers enter the first corpus text into a sound box server, and the sound box server classifies the content of the first corpus text and establishes a corresponding corpus for each type of first corpus text; the first characteristic voice includes original voice recorded for the first corpus text, first corpus texts of the same type are recorded by the same anchor, and the user selects the corpus text recorded in the voice of a favorite anchor by auditioning the voices of a plurality of anchors, or selects the corpus text recorded in the voice of a certain anchor through a specific application program installed on the smart sound box or on a terminal device bound to the smart sound box.
2. The method of claim 1, wherein determining matching rules based on the keywords comprises:
determining a task request type corresponding to the second corpus text according to the keyword;
and respectively establishing matching rules corresponding to the task request types according to set standards.
3. The method according to claim 1, wherein said searching the mapping relation library for the response speech matching with the second corpus text according to the matching rule comprises:
collecting the frequency with which the user uses the response corpora within a set time period, and determining the priority of each response corpus according to the use frequency and the matching rule;
and establishing a response network for the mapping relation library according to the priority, determining the priority of the second corpus text based on the response network and the priority, and searching for response voice matched with the second corpus text.
4. The method according to any one of claims 1 to 3, further comprising, after searching the mapping relation library for the response voice matching the second corpus text according to the matching rule:
broadcasting the response voice, and receiving and analyzing feedback information of the user to the response voice within a set time range;
and updating the priority of the response corpus corresponding to the response voice according to the attitude-evaluating words in the user's feedback information.
5. The method according to claim 1, wherein the determining a mapping relationship between the first corpus text and the first feature speech according to the obtained first corpus text and the first feature speech to form a mapping relationship library further comprises:
receiving input first characteristic voice;
recognizing the first characteristic voice as a corresponding first corpus text;
and determining the mapping relation between the first corpus text and the first characteristic voice to form a mapping relation library.
6. A corpus processing apparatus, comprising:
the mapping relation library determining module is used for determining the mapping relation between the first corpus text and the first characteristic voice according to the acquired first corpus text and the first characteristic voice so as to form a mapping relation library;
the number of first corpus texts and the number of first characteristic voices are each at least two;
the matching rule determining module is used for converting the received second characteristic voice into a second corpus text, extracting keywords of the second corpus text and determining a matching rule according to the keywords;
the response voice determining module is used for searching the response voice matched with the second corpus text in the mapping relation library according to the matching rule;
the determining, according to the acquired first corpus text and the first characteristic voice, a mapping relation between the first corpus text and the first characteristic voice to form a mapping relation library includes: acquiring a user-defined first corpus text; inputting a first characteristic voice based on the user-defined first corpus text; and determining a mapping relation between the user-defined first corpus text and the first characteristic voice to form the mapping relation library;
the user-defined first corpus text is preset by developers of the smart sound box: background developers enter the first corpus text into a sound box server, and the sound box server classifies the content of the first corpus text and establishes a corresponding corpus for each type of first corpus text; the first characteristic voice includes original voice recorded for the first corpus text, first corpus texts of the same type are recorded by the same anchor, and the user selects the corpus text recorded in the voice of a favorite anchor by auditioning the voices of a plurality of anchors, or selects the corpus text recorded in the voice of a certain anchor through a specific application program installed on the smart sound box or on a terminal device bound to the smart sound box.
7. The apparatus of claim 6, wherein the matching rule determining module is specifically configured to:
determining a task request type corresponding to the second corpus text according to the keyword;
and respectively establishing matching rules corresponding to the task request types according to set standards.
8. A smart sound box comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-5 when executing the program.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201711429605.2A 2017-12-26 2017-12-26 Corpus processing method and device, intelligent sound box and storage medium Active CN108153875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711429605.2A CN108153875B (en) 2017-12-26 2017-12-26 Corpus processing method and device, intelligent sound box and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711429605.2A CN108153875B (en) 2017-12-26 2017-12-26 Corpus processing method and device, intelligent sound box and storage medium

Publications (2)

Publication Number Publication Date
CN108153875A CN108153875A (en) 2018-06-12
CN108153875B true CN108153875B (en) 2022-03-11

Family

ID=62462240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711429605.2A Active CN108153875B (en) 2017-12-26 2017-12-26 Corpus processing method and device, intelligent sound box and storage medium

Country Status (1)

Country Link
CN (1) CN108153875B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110930998A (en) * 2018-09-19 2020-03-27 上海博泰悦臻电子设备制造有限公司 Voice interaction method and device and vehicle
CN109637529A (en) * 2018-11-01 2019-04-16 平安科技(深圳)有限公司 Voice-based functional localization method, apparatus, computer equipment and storage medium
CN109949797B (en) * 2019-03-11 2021-11-12 北京百度网讯科技有限公司 Method, device, equipment and storage medium for generating training corpus
CN112256737B (en) * 2020-10-30 2024-05-28 深圳前海微众银行股份有限公司 Method, equipment and storage medium for matching HIVE rule with data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008128423A1 (en) * 2007-04-19 2008-10-30 Shenzhen Institute Of Advanced Technology An intelligent dialog system and a method for realization thereof

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198828B (en) * 2013-04-03 2015-09-23 中金数据系统有限公司 The construction method of speech corpus and system
CN104598445B (en) * 2013-11-01 2019-05-10 腾讯科技(深圳)有限公司 Automatically request-answering system and method
CN105227790A (en) * 2015-09-24 2016-01-06 北京车音网科技有限公司 A kind of voice answer method, electronic equipment and system
CN105630938A (en) * 2015-12-23 2016-06-01 深圳市智客网络科技有限公司 Intelligent question-answering system
CN107342075A (en) * 2016-07-22 2017-11-10 江苏泰格软件有限公司 A kind of Voice command performs the System and method for of APS system commands
CN106128453A (en) * 2016-08-30 2016-11-16 深圳市容大数字技术有限公司 The Intelligent Recognition voice auto-answer method of a kind of robot and robot
CN106469212B (en) * 2016-09-05 2019-10-15 北京百度网讯科技有限公司 Man-machine interaction method and device based on artificial intelligence

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008128423A1 (en) * 2007-04-19 2008-10-30 Shenzhen Institute Of Advanced Technology An intelligent dialog system and a method for realization thereof

Also Published As

Publication number Publication date
CN108153875A (en) 2018-06-12

Similar Documents

Publication Publication Date Title
CN108509619B (en) Voice interaction method and device
CN107818781B (en) Intelligent interaction method, equipment and storage medium
CN109165302B (en) Multimedia file recommendation method and device
CN107797984B (en) Intelligent interaction method, equipment and storage medium
CN108153875B (en) Corpus processing method and device, intelligent sound box and storage medium
CN109145104B (en) Method and device for dialogue interaction
CN110415679B (en) Voice error correction method, device, equipment and storage medium
JP2020518861A (en) Speech recognition method, apparatus, device, and storage medium
CN110188356B (en) Information processing method and device
CN109979450B (en) Information processing method and device and electronic equipment
CN109671435B (en) Method and apparatus for waking up smart device
CN109920409B (en) Sound retrieval method, device, system and storage medium
CN102915493A (en) Information processing apparatus and method
CN109710799B (en) Voice interaction method, medium, device and computing equipment
CN109615009B (en) Learning content recommendation method and electronic equipment
CN108710653B (en) On-demand method, device and system for reading book
CN110765313A (en) Classified playing method and system for network video barrage
JP2021076818A (en) Method, apparatus, device and computer readable storage media for voice interaction
CN110473543B (en) Voice recognition method and device
CN111444321B (en) Question answering method, device, electronic equipment and storage medium
CN112883266A (en) Search method, search device, storage medium and electronic equipment
CN111427444B (en) Control method and device of intelligent device
CN116343771A (en) Music on-demand voice instruction recognition method and device based on knowledge graph
CN111373473B (en) Method for voice recognition of electronic equipment and electronic equipment
CN114242047A (en) Voice processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190128

Address after: 100085 East District, Second Floor, 33 Xiaoying West Road, Haidian District, Beijing

Applicant after: BEIJING KINGSOFT INTERNET SECURITY SOFTWARE Co.,Ltd.

Address before: 511400 Tian'an Science and Technology Industrial Building, Panyu Energy-saving Science Park, 555 North Panyu Avenue, Donghuan Street, Panyu District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU LANBO INTELLIGENT TECHNOLOGY CO.,LTD.

GR01 Patent grant
GR01 Patent grant