CN109523990A - Speech detection method and device

Info

Publication number: CN109523990A
Application number: CN201910054694.XA
Authority: CN
Inventors: 李鸣; 肖云; 官世良; 陈海宾; 马春宇
Original assignee: FUTURE TV Co Ltd
Current assignee: FUTURE TV Co Ltd
Priority date: 2019-01-21
Filing date: 2019-01-21
Publication date: 2019-03-26
Anticipated expiration: 2039-01-21
Also published as: CN109523990B

Abstract

Embodiments of the present invention provide a speech detection method and device applied to a detection terminal. The method includes: responding to a detection instruction and playing the content to be detected that corresponds to the detection instruction; obtaining the audio information produced while the content to be detected is played, and converting the audio information into text information to be measured; and querying the resource to be checked that corresponds to the text information to be measured and judging whether the resource to be checked is the default resource corresponding to the content to be detected, and if so, determining that this speech detection passes and writing the detection result into a speech detection list. The invention can effectively improve speech detection efficiency, reduce manual participation in the detection process, and ensure speech detection accuracy.

Description

Speech detection method and device

Technical field

The present invention relates to the field of information detection technology, and in particular to a speech detection method and device.

Background art

During the development of existing voice assistants (such as Tencent Cloud voice, iFlytek voice, Baidu voice, AISpeech and Unisound), the accuracy and reliability of the speech recognition results need to be tested to ensure the functional integrity of the voice assistant. However, existing voice assistants are tested mainly by hand, which makes the detection process cumbersome and the testing cost high, and the detection accuracy cannot be guaranteed either.

Summary of the invention

In view of this, the present invention provides a speech detection method and device that can effectively solve the above problems, reducing speech detection cost and improving detection accuracy.

A preferred embodiment of the present invention provides a speech detection method applied to a detection terminal, the speech detection method comprising:

responding to a detection instruction, and playing the content to be detected corresponding to the detection instruction;

obtaining the audio information produced while the content to be detected is played, and converting the audio information into text information to be measured;

querying the resource to be checked corresponding to the text information to be measured, and judging whether the resource to be checked is the default resource corresponding to the content to be detected; if so, determining that this speech detection passes and writing the detection result into a speech detection list.

In an option of the preferred embodiment of the present invention, the detection terminal can communicate with a controlling terminal, and the step of responding to the detection instruction comprises:

receiving and responding to a detection instruction sent by a user through the controlling terminal.

In an option of the preferred embodiment of the present invention, the step of judging whether the resource to be checked is the default resource corresponding to the content to be detected comprises:

comparing the identification information of the resource to be checked with the identification information of the default resource, and if the matching degree between the two is greater than a first preset value, determining that the resource to be checked is the default resource corresponding to the content to be detected.

In an option of the preferred embodiment of the present invention, the resource type of the resource to be checked and of the default resource includes one of an application program, a song, a video or an article.

In an option of the preferred embodiment of the present invention, the speech detection method further comprises:

when the resource to be checked is not the default resource corresponding to the content to be detected, displaying the judgment result to prompt the user that this detection did not pass, and playing the content to be detected corresponding to the detection instruction again, until the detection result for the content to be detected is a pass or the number of repeated detections exceeds a second preset value, at which point detection stops.

In an option of the preferred embodiment of the present invention, the method further comprises:

responding to a detection list configuration instruction;

calling and displaying a voice information configuration list based on the list configuration instruction;

obtaining the configuration information that the user inputs based on the displayed voice information configuration list, and saving the configured speech detection list.

In an option of the preferred embodiment of the present invention, the voice information configuration list includes one or more of the following items of configuration information: the serial number of the content to be detected, the content to be detected, the resource to be checked, the identification information of the resource to be checked, whether the detection passed, and the execution time.

A preferred embodiment of the present invention also provides a speech detection device applied to a detection terminal, comprising:

an instruction response module, configured to respond to a detection instruction and play the content to be detected corresponding to the detection instruction;

an audio acquisition module, configured to obtain the audio information produced while the content to be detected is played, and convert the audio information into text information to be measured;

an information detection module, configured to query the resource to be checked corresponding to the text information to be measured and judge whether the resource to be checked is the default resource corresponding to the content to be detected, and if so, determine that this speech detection passes and write the detection result into a speech detection list.

In an option of the preferred embodiment of the present invention, the detection terminal can communicate with a controlling terminal, and the instruction response module is further configured to receive and respond to a detection instruction sent by a user through the controlling terminal.

In an option of the preferred embodiment of the present invention, the information detection module is further configured to compare the identification information of the resource to be checked with the identification information of the default resource, and if the matching degree between the two is greater than a first preset value, determine that the resource to be checked is the default resource corresponding to the content to be detected.

Compared with the prior art, the speech detection method and device provided by the embodiments of the present invention can effectively reduce manual participation and cost in the speech detection process, while also effectively improving speech detection accuracy and adapting to speech detection needs in different scenarios.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and therefore should not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

Fig. 1 is a block diagram of the detection terminal provided by an embodiment of the present invention.

Fig. 2 is a flow chart of the speech detection method provided by an embodiment of the present invention.

Fig. 3 is a schematic diagram of the speech detection list provided by an embodiment of the present invention.

Fig. 4 is a block diagram of the speech detection device provided by an embodiment of the present invention.

Reference numerals: 10 - detection terminal; 100 - speech detection device; 110 - instruction response module; 120 - audio acquisition module; 130 - information detection module; 200 - memory; 300 - storage controller; 400 - processor.

Detailed description of the embodiments

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. The components of the embodiments generally described and illustrated in the drawings can be arranged and designed in a variety of different configurations.

Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.

Fig. 1 is a block diagram of a detection terminal 10 to which the speech detection method and device provided by an embodiment of the present invention are applied. The detection terminal 10 includes a speech detection device 100, a memory 200, a storage controller 300 and a processor 400. The detection terminal 10 may be, but is not limited to, an electronic device with processing capability such as a computer or a mobile Internet device (Mobile Internet Device, MID), and may also be a server or the like.

Optionally, the memory 200, the storage controller 300 and the processor 400 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements are electrically connected through one or more communication buses or signal lines. The speech detection device 100 includes at least one software function module that can be stored in the memory 200 in the form of software or firmware, or built into the operating system of the detection terminal 10. Under the control of the storage controller 300, the processor 400 accesses the memory 200 to execute the executable modules stored there, such as the software function modules and computer programs included in the speech detection device 100.

It can be understood that the structure shown in Fig. 1 is only illustrative. The detection terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. Each component shown in Fig. 1 can be implemented in hardware, software or a combination of the two.

In addition, according to actual needs, the detection terminal 10 can also be connected to one or more controlling terminals. Optionally, in this embodiment, the controlling terminal may be, but is not limited to, an electronic device capable of sending and receiving instructions, such as a remote controller.

Further, referring to Fig. 2, an embodiment of the present invention also provides a speech detection method that can be applied to the detection terminal 10. It should be noted that the speech detection method of the present invention is not limited to the specific order shown in Fig. 2 and described below. It should be understood that the order of some steps of the method can be exchanged according to actual needs, and some steps can also be omitted or deleted.

Step S11: respond to a detection instruction, and play the content to be detected corresponding to the detection instruction;

Step S12: obtain the audio information produced while the content to be detected is played, and convert the audio information into text information to be measured;

Step S13: query the resource to be checked corresponding to the text information to be measured, and judge whether the resource to be checked is the default resource corresponding to the content to be detected; if so, execute step S14; otherwise, execute step S15;

Step S14: determine that this speech detection passes and write the detection result into the speech detection list;

Step S15: display the judgment result to prompt the user that this detection did not pass, and play the content to be detected corresponding to the detection instruction again, until the detection result for the content to be detected is a pass or the number of repeated detections exceeds the second preset value, at which point detection stops.
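The control flow of steps S11 to S15 can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name `run_detection`, the `DetectionOutcome` record and the four injected callables (`play`, `capture_text`, `query_resource_id`, `is_match`) are hypothetical names introduced here, and the loop assumes that the second preset value bounds the number of repeated attempts after the first one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DetectionOutcome:
    passed: bool          # True if any attempt passed
    attempts: int         # total number of playback attempts made
    recognized_text: str  # text recognized in the last attempt


def run_detection(
    content: str,
    default_resource_id: str,
    play: Callable[[str], None],              # hypothetical: plays the content (S11)
    capture_text: Callable[[], str],          # hypothetical: records audio, returns text (S12)
    query_resource_id: Callable[[str], str],  # hypothetical: looks up the resource (S13)
    is_match: Callable[[str, str], bool],     # hypothetical: compares identification info
    max_repeats: int = 3,                     # assumed meaning of the second preset value
) -> DetectionOutcome:
    """Run steps S11 to S15 for one item of content to be detected."""
    attempts = 0
    text = ""
    # One initial attempt plus at most `max_repeats` repeated attempts.
    while attempts <= max_repeats:
        attempts += 1
        play(content)                           # S11
        text = capture_text()                   # S12
        resource_id = query_resource_id(text)   # S13
        if is_match(resource_id, default_resource_id):
            return DetectionOutcome(True, attempts, text)    # S14
        print(f"Attempt {attempts}: detection not passed")   # S15 prompt
    return DetectionOutcome(False, attempts, text)
```

Passing the playback, recognition and query steps in as callables keeps the sketch independent of any particular voice assistant, which the text leaves open.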

Compared with the prior art, the speech detection method given in steps S11 to S15 above enables fast and efficient detection of the functions of a voice assistant, while greatly reducing manual participation and testing cost in the detection process, making the detection process more intelligent, and ensuring the reliability of the detection result.

In detail, the detection instruction in step S11 can be responded to in several ways: the detection instruction may be triggered by a manual click so that the detection terminal 10 responds; the response may be achieved by collecting and recognizing a voice instruction from the user; or the detection terminal may receive and respond to a detection instruction sent by the user through the controlling terminal. This embodiment imposes no limitation here.

As one implementation, when speech detection is performed, the voice assistant system view can be opened, a local Excel test case file can then be opened in Windows, and the content to be detected can be clicked manually to trigger the detection instruction and carry out speech detection. It can be understood that the content to be detected can be saved in the form of an Excel test case file, and the detection result can also be saved in the Excel test case file when detection ends.

In addition, the content to be detected may be, but is not limited to being, pre-stored in the detection terminal 10 in text form, for example saved as an Excel test case file, so that it can be played when speech detection is performed.

In step S12, the audio information may be collected by a sound pickup device provided in the detection terminal 10, or may be collected by another device and sent to the detection terminal 10; this embodiment imposes no limitation here. It should be noted that the conversion of the audio information into text information to be measured can be performed by the voice assistant under test.

In step S13, the resource type of the resource to be checked and of the default resource may include, but is not limited to, one of an application program, a song, a video or an article, and can be set according to actual needs. According to actual needs, the process of judging whether the resource to be checked is the default resource corresponding to the content to be detected may include: comparing the identification information of the resource to be checked with the identification information of the default resource, and if the matching degree between the two is greater than the first preset value, determining that the resource to be checked is the default resource corresponding to the content to be detected.

Referring to Fig. 3, the identification information may be, but is not limited to, the title or the network address of the resource to be checked. For example, the identification information may be a movie title, such as "The Foreigner" (英伦对决), or a network address, such as http://47.98.36.22/121936.mp3. It may also be identical to the content to be detected, such as "How is the weather in Tianjin today". This embodiment imposes no limitation here. It can be understood that Fig. 3 is only illustrative and is not described further here.

As one implementation, when the identification information of the resource to be checked is compared with that of the default resource, the characters in the two pieces of identification information can be compared one by one, and the matching degree between the two is calculated from the comparison result. The first preset value may be, but is not limited to, 50%, 70%, 99% and so on, and the matching degree Q can be calculated by the formula Q = (number of identical characters) / (total number of characters in the identification information). For example, referring to Fig. 3, suppose the identification information of the default resource is "The Foreigner" (英伦对决) and the content to be detected is "I want to watch a Jackie Chan movie". If the identification information of the playing resource found from the text information to be measured is also "The Foreigner" (英伦对决), it can be determined that the matching degree between the identification information of the playing resource and that of the default resource is greater than the first preset value.
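The character-by-character comparison described above can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not say which string supplies the "total number of characters", so the longer of the two identification strings is assumed here, which keeps Q between 0 and 1; the function names `matching_degree` and `is_default_resource` and the default threshold of 0.7 are illustrative choices only.

```python
def matching_degree(candidate_id: str, default_id: str) -> float:
    """Return Q = identical characters / total characters, compared position by position."""
    # Assumption: the total is the length of the longer identification string.
    total = max(len(candidate_id), len(default_id))
    if total == 0:
        return 0.0  # two empty identifiers give no evidence of a match
    same = sum(1 for a, b in zip(candidate_id, default_id) if a == b)
    return same / total


def is_default_resource(
    candidate_id: str,
    default_id: str,
    first_preset_value: float = 0.7,  # illustrative; the text mentions 50%, 70% or 99%
) -> bool:
    """The resource counts as the default resource only if Q exceeds the threshold."""
    return matching_degree(candidate_id, default_id) > first_preset_value
```

With a threshold below 100%, small differences in a title or address do not cause a failed detection, which is presumably why the text allows values such as 50% or 70%.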

The detection result in step S14 includes one or more of the following: the audio information, the text information to be measured, whether the detection passed, and the detection execution time. For example, when the detection passes, the "whether passed" item can be saved as PASS; otherwise, it is saved as NO.
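One way to represent such a detection result is sketched below. The record layout, the field names and the CSV serialization are assumptions made for illustration; the text only names the items a result may contain and the PASS/NO values, and mentions elsewhere that results may be kept in an Excel test case file.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class DetectionResult:
    audio_path: str       # hypothetical: where the captured audio is stored
    recognized_text: str  # the text information to be measured
    passed: str           # "PASS" or "NO", as in the description
    executed_at: str      # detection execution time, ISO 8601


def make_result(
    audio_path: str,
    recognized_text: str,
    passed: bool,
    now: Optional[datetime] = None,
) -> DetectionResult:
    """Build one result row, mapping the boolean outcome to PASS or NO."""
    now = now or datetime.now()
    return DetectionResult(
        audio_path,
        recognized_text,
        "PASS" if passed else "NO",
        now.isoformat(timespec="seconds"),
    )


def results_to_csv(results: Iterable[DetectionResult]) -> str:
    """Serialize result rows as CSV text with a header line."""
    buffer = io.StringIO()
    names = [f.name for f in fields(DetectionResult)]
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for result in results:
        writer.writerow(asdict(result))
    return buffer.getvalue()
```

The returned text can be written to a file that a spreadsheet program opens, which is one plain-text stand-in for the Excel test case file.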

Further, the second preset value in step S15 may be, but is not limited to, 1 time, 3 times and so on. In actual implementation, when steps S11 to S14 are repeated, the likelihood of passing detection can also be improved by adjusting the playback volume, the playback speed and the like.

It should further be noted that, in order to improve speech detection efficiency, the speech detection list used for speech detection can be configured before steps S11 to S15 are executed. The configuration process includes: responding to a detection list configuration instruction; calling and displaying the voice information configuration list based on the list configuration instruction; and obtaining and saving the configuration information that the user inputs based on the displayed voice information configuration list. The voice information configuration list may include, but is not limited to, one or more of the items of configuration information shown in Fig. 3: the serial number of the content to be detected, the content to be detected, the resource to be checked, the identification information of the resource to be checked, whether the detection passed, and the execution time.
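A configured list of this kind could be loaded as sketched below. This is a hedged illustration only: the column names (`serial_number`, `content`, `resource`, `resource_id`, `passed`, `executed_at`) and the CSV format are assumptions standing in for the items listed above, and Fig. 3 may lay them out differently.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class ListItem:
    serial_number: int     # serial number of the content to be detected
    content: str           # the content to be detected (text to be played)
    resource: str          # the resource to be checked
    resource_id: str       # identification information of the resource
    passed: str = ""       # filled in after detection
    executed_at: str = ""  # filled in after detection


def load_detection_list(csv_text: str) -> List[ListItem]:
    """Parse configuration rows; result columns may be absent before detection."""
    reader = csv.DictReader(io.StringIO(csv_text))
    items = []
    for row in reader:
        items.append(
            ListItem(
                serial_number=int(row["serial_number"]),
                content=row["content"],
                resource=row.get("resource") or "",
                resource_id=row.get("resource_id") or "",
                passed=row.get("passed") or "",
                executed_at=row.get("executed_at") or "",
            )
        )
    return items
```

Keeping the two result columns optional lets the same list serve both as configuration input before detection and as the record of outcomes afterwards.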

In addition, the specific detection process used when performing speech detection can be set flexibly according to actual needs. For example: any row of list item data may be selected and its voice played automatically; any text may be entered in the content-to-be-detected field shown in Fig. 3 and played by selecting the voice broadcast button; a volume value and a speed value may be set to meet the requirements of a use case scenario; when the default "voice playback only" check box is not selected, only the selected list item data is played; and when the default "voice playback only" check box is selected, the detection result is saved and presented to the user after detection ends. This embodiment imposes no specific limitation here.

Further, referring to Fig. 4, an embodiment of the present invention also provides a speech detection device 100, which includes an instruction response module 110, an audio acquisition module 120 and an information detection module 130.

The instruction response module 110 is configured to respond to a detection instruction and play the content to be detected corresponding to the detection instruction; it can also be used to receive and respond to a detection instruction sent by a user through the controlling terminal. In this embodiment, step S11 can be executed by the instruction response module 110; for the detailed process, refer to step S11, which is not repeated here.

The audio acquisition module 120 is configured to obtain the audio information produced while the content to be detected is played, and convert the audio information into text information to be measured. In this embodiment, step S12 can be executed by the audio acquisition module 120; for the detailed process, refer to step S12, which is not repeated here.

The information detection module 130 is configured to query the resource to be checked corresponding to the text information to be measured and judge whether the resource to be checked is the default resource corresponding to the content to be detected, and if so, determine that this speech detection passes and write the detection result into the speech detection list. The information detection module 130 can also be used to compare the identification information of the resource to be checked with the identification information of the default resource, and if the matching degree between the two is greater than the first preset value, determine that the resource to be checked is the default resource corresponding to the content to be detected. In this embodiment, step S13 and step S15 can be executed by the information detection module 130; for the detailed process, refer to step S13 and step S15, which are not repeated here.
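The division of work among modules 110, 120 and 130 can be sketched as three small classes composed into one device object. The class and method names are hypothetical, the playback, recording, transcription and query functions are injected placeholders, and the identification comparison is simplified here to exact equality rather than the matching degree threshold.

```python
from typing import Callable, List, Tuple


class InstructionResponseModule:  # corresponds to module 110
    def __init__(self, play: Callable[[str], None]) -> None:
        self._play = play

    def respond(self, content: str) -> None:
        """Respond to a detection instruction by playing the content (S11)."""
        self._play(content)


class AudioAcquisitionModule:  # corresponds to module 120
    def __init__(
        self,
        record: Callable[[], bytes],
        transcribe: Callable[[bytes], str],
    ) -> None:
        self._record = record
        self._transcribe = transcribe

    def acquire_text(self) -> str:
        """Capture the played audio and convert it to text (S12)."""
        return self._transcribe(self._record())


class InformationDetectionModule:  # corresponds to module 130
    def __init__(
        self,
        query: Callable[[str], str],
        detection_list: List[Tuple[str, str]],
    ) -> None:
        self._query = query
        self.detection_list = detection_list

    def detect(self, text: str, default_id: str) -> bool:
        """Query the resource and compare it with the default resource (S13, S14)."""
        passed = self._query(text) == default_id  # simplification: exact match
        if passed:
            self.detection_list.append((text, "PASS"))
        return passed


class SpeechDetectionDevice:  # corresponds to device 100
    def __init__(
        self,
        responder: InstructionResponseModule,
        acquirer: AudioAcquisitionModule,
        detector: InformationDetectionModule,
    ) -> None:
        self.responder = responder
        self.acquirer = acquirer
        self.detector = detector

    def run_once(self, content: str, default_id: str) -> bool:
        self.responder.respond(content)
        return self.detector.detect(self.acquirer.acquire_text(), default_id)
```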

In conclusion, the speech detection method and device provided by the embodiments of the present invention can effectively reduce manual participation and cost in the speech detection process, while also effectively improving speech detection accuracy and adapting to speech detection needs in different scenarios.

In the description of the present invention, the terms "disposed", "connected" and "connection" should be understood broadly. For example, a connection may be a fixed connection, a detachable connection or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection, an indirect connection through an intermediary, or an internal connection between two elements. Those of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific situation.

In the several embodiments provided by the present invention, it should be understood that the disclosed device and method can also be implemented in other ways. The device and method embodiments described above are only schematic. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions and operations of devices, methods and computer program products according to multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment or a part of code, which contains one or more instructions for implementing the specified logical function.

It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of such boxes, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

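1. A speech detection method applied to a detection terminal, characterized in that the speech detection method comprises:

responding to a detection instruction, and playing the content to be detected corresponding to the detection instruction;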

obtaining the audio information produced while the content to be detected is played, and converting the audio information into text information to be measured;

querying the resource to be checked corresponding to the text information to be measured, and judging whether the resource to be checked is the default resource corresponding to the content to be detected; if so, determining that this speech detection passes and writing the detection result into a speech detection list.

2. The speech detection method according to claim 1, characterized in that the detection terminal can communicate with a controlling terminal, and the step of responding to the detection instruction comprises:

receiving and responding to a detection instruction sent by a user through the controlling terminal.

3. The speech detection method according to claim 1, characterized in that the step of judging whether the resource to be checked is the default resource corresponding to the content to be detected comprises:

comparing the identification information of the resource to be checked with the identification information of the default resource, and if the matching degree between the two is greater than a first preset value, determining that the resource to be checked is the default resource corresponding to the content to be detected.

4. The speech detection method according to claim 3, characterized in that the resource type of the resource to be checked and of the default resource includes one of an application program, a song or a video.

5. The speech detection method according to claim 1, characterized in that the speech detection method further comprises:

when the resource to be checked is not the default resource corresponding to the content to be detected, displaying the judgment result to prompt the user that this detection did not pass, and playing the content to be detected corresponding to the detection instruction again, until the detection result for the content to be detected is a pass or the number of repeated detections exceeds a second preset value, at which point detection stops.

6. The speech detection method according to claim 1, characterized in that the method further comprises:

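responding to a detection list configuration instruction;

calling and displaying a voice information configuration list based on the list configuration instruction;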

obtaining the configuration information that the user inputs based on the displayed voice information configuration list, and saving the configured speech detection list.

7. The speech detection method according to claim 6, characterized in that the voice information configuration list includes one or more of the following items of configuration information: the serial number of the content to be detected, the content to be detected, the resource to be checked, the identification information of the resource to be checked, whether the detection passed, and the execution time.

8. A speech detection device applied to a detection terminal, characterized by comprising:

an instruction response module, configured to respond to a detection instruction and play the content to be detected corresponding to the detection instruction;

an audio acquisition module, configured to obtain the audio information produced while the content to be detected is played, and convert the audio information into text information to be measured;

an information detection module, configured to query the resource to be checked corresponding to the text information to be measured and judge whether the resource to be checked is the default resource corresponding to the content to be detected, and if so, determine that this speech detection passes and write the detection result into a speech detection list.

9. The speech detection device according to claim 8, characterized in that the detection terminal can communicate with a controlling terminal, and the instruction response module is further configured to receive and respond to a detection instruction sent by a user through the controlling terminal.

10. The speech detection device according to claim 8, characterized in that the information detection module is further configured to compare the identification information of the resource to be checked with the identification information of the default resource, and if the matching degree between the two is greater than a first preset value, determine that the resource to be checked is the default resource corresponding to the content to be detected.