CN110784591A

CN110784591A - Intelligent voice automatic detection method, device and system

Info

Publication number: CN110784591A
Application number: CN201910908123.8A
Authority: CN
Inventors: 李钻达
Original assignee: FUJIAN NEW LAND SOFTWARE ENGINEERING Co Ltd
Current assignee: FUJIAN NEW LAND SOFTWARE ENGINEERING Co Ltd
Priority date: 2019-09-25
Filing date: 2019-09-25
Publication date: 2020-02-11

Abstract

The invention discloses an intelligent voice automatic detection method, device and system. A target number of a detection task is obtained from task information issued by a user, and a system call module is called to dial the target number. Whether the call is connected is determined, and once it is connected the call is muted and recorded. Whether the recording can be ended is determined from a configuration file, and if so the recording is ended through an automated testing tool. The format of the generated recording file is converted, and a voice recognition module is called to obtain a voice recognition result. A semantic analysis module is called to analyze the voice recognition result and obtain an analysis result. Whether the task is finished is determined from the result returned by the semantic analysis module, and a target key is clicked when that result contains key information. The task result, the voice recognition result, the key and the start and end time of each step are recorded. The invention can automatically recognize and record the call content, the keys pressed and the recording files, so as to determine whether the task was executed successfully.

Description

Intelligent voice automatic detection method, device and system

Technical Field

The invention relates to the technical field of intelligent voice recognition, in particular to an intelligent voice automatic detection method, device and system.

Background

With the rapid development of artificial intelligence, speech recognition technology has become increasingly capable. The iFLYTEK platform is currently one of the strongest speech recognition platforms and is widely used. The problem that iFLYTEK's Automatic Speech Recognition (ASR) technology addresses is to make a device recognize human speech and extract the text information it contains. ASR plays an important role in current electronic devices: it gives them the ability to recognize human speech and makes human-computer communication and interaction more convenient.

However, iFLYTEK's automatic speech recognition technology only provides functions such as speech recognition. It can only operate on existing audio files or on the real-time audio stream from a device's microphone; it cannot perform speech recognition during a call on an Android mobile phone, and therefore cannot support subsequent detection. In addition, it does not record the results of task execution and cannot provide detailed detection results.

Disclosure of Invention

The invention aims to solve the technical problem of how to provide an intelligent voice automatic detection method and system which can perform voice recognition operation under the condition of mobile phone conversation and can provide detailed detection results.

In order to solve the technical problems, the technical scheme of the invention is as follows:

an intelligent voice automatic detection method comprises the following steps:

acquiring a target number of a detection task according to task information issued by a user, and calling a system call module to dial the target number;

judging whether the call is connected, if so, muting and recording the call;

determining from the configuration file whether the recording can be ended, and if so, ending the recording through an automated testing tool;

converting the format of the generated recording file, transmitting the converted file to a voice recognition module, and calling the voice recognition module to obtain a voice recognition result;

calling a semantic analysis module to intelligently analyze the voice recognition result and obtain an analysis result;

judging whether the task is finished or not according to a result returned by the semantic analysis module;

clicking a target key when the information returned by the semantic analysis module contains key information;

and recording the task result, the voice recognition result, the key pressed, and the start and end time of each step.
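The steps above can be read as a control loop. The following is a minimal Python sketch under stated assumptions: the `Phone` bundle, the `Analysis` shape, `run_task`, and the `max_steps` limit are all hypothetical names introduced for illustration, and the device operations are injected as plain callables rather than real telephony or automation APIs, none of which the source specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Analysis:
    """Hypothetical shape of the semantic analysis result."""
    finished: bool = False      # True when the task is judged complete
    key: Optional[str] = None   # key to press next, if any


@dataclass
class Phone:
    """Hypothetical bundle of device operations, injected as callables."""
    dial: Callable[[str], bool]              # True once the call is connected
    record_prompt: Callable[[], str]         # mute, record one prompt, return file path
    recognize: Callable[[str], str]          # recording path -> recognized text
    analyze: Callable[[str, str], Analysis]  # (task goal, text) -> Analysis
    press: Callable[[str], None]             # simulate a key press
    hang_up: Callable[[], None]              # simulate tapping the hang-up button


def run_task(phone: Phone, number: str, goal: str, max_steps: int = 10) -> dict:
    """Run one detection task and return a summary of each step."""
    log: List[dict] = []
    if not phone.dial(number):
        return {"success": False, "steps": log}
    success = False
    for _ in range(max_steps):
        text = phone.recognize(phone.record_prompt())
        result = phone.analyze(goal, text)
        log.append({"text": text, "key": result.key})
        if result.finished:
            success = True
            break
        if result.key is None:
            break  # not finished and nothing to press: stop the task
        phone.press(result.key)
    phone.hang_up()
    return {"success": success, "steps": log}
```

Injecting the operations keeps the loop testable without a handset; on a real device each callable would wrap whatever call, recording and UI automation facilities are available.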

Preferably, the step of calling the semantic analysis module to intelligently analyze the speech recognition result and obtain the analysis result includes:

calling a semantic analysis module through a network request, and transmitting task information and a voice recognition result of the current step to the semantic analysis module;

the semantic analysis module judges whether a success keyword for the target is present, and if so, returns end information;

the semantic analysis module checks the task information association library to acquire menu association information;

the semantic analysis module performs word segmentation on the speech recognition result of the current step, dividing it into phrases;

the semantic analysis module analyzes the task information and acquires the target of the current task information;

the semantic analysis module screens the phrases and compares them with the target to obtain the similarity of each phrase, and then returns the key with the highest similarity.
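The analysis steps above can be sketched as follows. This is an illustration under assumptions, not the patented implementation: English prompts and a regular expression stand in for word segmentation of the recognized text, `difflib.SequenceMatcher` stands in for the unspecified similarity measure, and the names `analyze`, `split_phrases`, `success_keywords` and `associations` are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical pattern for menu prompts such as "press 1 for balance inquiry".
MENU_ITEM = re.compile(r"press (\w+) (?:for|to) ([^,.;]+)", re.IGNORECASE)


def split_phrases(text: str) -> List[Tuple[str, str]]:
    """Split recognized text into (key, phrase) pairs."""
    return [(key, phrase.strip().lower()) for key, phrase in MENU_ITEM.findall(text)]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


def analyze(text: str, target: str, success_keywords: List[str],
            associations: Dict[str, List[str]]) -> dict:
    """Return end information, or the key whose phrase best matches the target."""
    lowered = text.lower()
    # Success keyword present: the task is finished.
    if any(word.lower() in lowered for word in success_keywords):
        return {"end": True, "key": None}
    # Association library: other menu wordings known to lead to the target.
    candidates = [target.lower()] + [a.lower() for a in associations.get(target, [])]
    best_key, best_score = None, 0.0
    for key, phrase in split_phrases(text):
        score = max(similarity(phrase, c) for c in candidates)
        if score > best_score:
            best_key, best_score = key, score
    return {"end": False, "key": best_key}
```

The association library matters when the target is not named in the current menu: a parent menu entry such as "account services" can still be chosen because it is listed as leading to the target.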

Preferably, the step of converting the format of the generated recording file, transmitting the converted file to a voice recognition module, and calling the voice recognition module to obtain the voice recognition result includes:

searching a recording file meeting the conditions in a recording folder of the system according to the recording time during recording;

after a qualifying recording file with the .mp3 suffix is found, calling the integrated open-source audio conversion program to convert the recording file into an audio stream;

and transmitting the converted audio stream to a voice recognition service, and acquiring the result returned by the voice recognition service.
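A minimal sketch of the file search and conversion steps follows. The source does not name the open-source conversion program or the target stream format; ffmpeg and 16 kHz mono 16-bit PCM are assumptions chosen for illustration, and `find_recording` and `ffmpeg_to_pcm_command` are hypothetical helper names. The command is only built here, not executed.

```python
import os
from typing import List, Optional


def find_recording(folder: str, started_at: float, ended_at: float,
                   suffix: str = ".mp3") -> Optional[str]:
    """Return the newest file with the suffix modified within the window, or None."""
    matches = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if not name.lower().endswith(suffix) or not os.path.isfile(path):
            continue
        mtime = os.path.getmtime(path)
        if started_at <= mtime <= ended_at:
            matches.append((mtime, path))
    return max(matches)[1] if matches else None


def ffmpeg_to_pcm_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build (but do not run) an ffmpeg command producing raw 16-bit mono PCM."""
    return ["ffmpeg", "-y", "-i", src,
            "-f", "s16le", "-acodec", "pcm_s16le",
            "-ac", "1", "-ar", str(sample_rate), dst]
```

The resulting argument list could be passed to `subprocess.run` on a machine where ffmpeg is installed; the output bytes would then be sent to the recognition service.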

Preferably, the recording of the task result, the recognized voice recognition result, the key, and the start and end time of the step task includes:

if the result returned by the semantic analysis module contains task ending information, notifying the background service of the system application;

the background service of the system application calls an automated testing tool to simulate a user tapping the hang-up button;

and the background service of the system application terminates the detection task and records the voice content, the key-press process and the execution time of each step.
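The per-step record described above might look like the following sketch. The field names and the JSON serialization are assumptions made for illustration; the source only states that the voice content, the key-press process and the execution time of each step are recorded.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class StepRecord:
    text: str             # recognized voice content of the step
    key: Optional[str]    # key pressed in the step, if any
    started_at: float     # step start time (seconds since the epoch)
    ended_at: float       # step end time (seconds since the epoch)


@dataclass
class TaskRecord:
    task_id: str
    success: bool = False
    steps: List[StepRecord] = field(default_factory=list)

    def add_step(self, text: str, key: Optional[str], started_at: float,
                 ended_at: Optional[float] = None) -> None:
        """Append a step, stamping the end time with the current time if omitted."""
        end = time.time() if ended_at is None else ended_at
        self.steps.append(StepRecord(text, key, started_at, end))

    def to_json(self) -> str:
        """Serialize the whole task record, keeping non-ASCII text readable."""
        return json.dumps(asdict(self), ensure_ascii=False)
```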

Preferably, the step of acquiring a target number of the detection task according to task information issued by the user and calling the system call module to dial the target number includes:

configuring task information and task scripts by a user, and issuing the task information;

the system periodically acquires the task information issued by the user, and after acquiring a task, notifies a background service through a message communication mechanism to execute it;

and the system application dials the target number in the task information by invoking the system's call module.
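The polling and notification steps above can be sketched with an in-process queue standing in for the message communication mechanism. This is an assumption for illustration: the source does not say which mechanism is used, and `poll_once`, `background_service`, `fetch_task` and the `target_number` field are hypothetical names.

```python
import queue
import threading
from typing import Callable, Optional


def poll_once(fetch_task: Callable[[], Optional[dict]],
              inbox: "queue.Queue[dict]") -> bool:
    """Fetch one issued task, if any, and hand it to the background service."""
    task = fetch_task()
    if task is None:
        return False
    inbox.put(task)
    return True


def background_service(inbox: "queue.Queue[dict]",
                       dial: Callable[[str], None],
                       stop: threading.Event) -> None:
    """Wait for task messages and dial the target number of each one."""
    while not stop.is_set():
        try:
            task = inbox.get(timeout=0.1)
        except queue.Empty:
            continue
        dial(task["target_number"])
        inbox.task_done()
```

A periodic caller (for example a timer) would invoke `poll_once` at a fixed interval, while `background_service` runs on its own thread.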

The invention also provides an intelligent voice automatic detection device, which comprises:

a background task module: acquiring a target number of a detection task according to task information issued by a user, and calling a system call module to dial the target number;

a recording module: judging whether the call is connected, if so, muting and recording the call;

automated test tool: judging whether the recording can be finished or not through the configuration file, and if the recording can be finished, finishing the recording;

the audio conversion module: converting the format of the generated recording file, and transmitting the converted file to a voice recognition module;

a voice recognition module: carrying out voice recognition on the audio to obtain a voice recognition result;

a semantic analysis module: carrying out intelligent analysis on the voice recognition result to obtain an analysis result, and clicking a target key when the returned information contains key information;

a judging module: judging whether the task is finished or not according to a result returned by the semantic analysis module;

a recording module (for task results): recording the task result, the voice recognition result, the key pressed, and the start and end time of each step.

Preferably, the semantic analysis module includes:

the information receiving unit is used for receiving the task information and the voice recognition result of the current step;

a keyword unit: judging whether a success keyword for the target is present, and if so, returning end information;

a viewing unit: checking the task information association library to acquire menu association information;

word segmentation unit: performing word segmentation on the voice recognition result of the current step, and dividing the voice recognition result into word groups;

an analysis unit: analyzing the task information to obtain a target of the current task information;

an alignment unit: screening the phrases and comparing them with the target to obtain the similarity of each phrase, and then returning the key with the highest similarity.

Preferably, the audio conversion module includes:

a file searching unit: searching a recording file meeting the conditions in a recording folder of the system according to the recording time during recording;

a conversion unit: after a qualifying recording file with the .mp3 suffix is found, calling the integrated open-source audio conversion program to convert the recording file into an audio stream;

a task acquisition unit: and transmitting the audio stream file obtained by conversion to a voice recognition service, and acquiring a task result returned by the voice recognition service.

Preferably, in the recording module for task results: if the result returned by the semantic analysis module contains task ending information, the background service of the system application is notified; the background service calls an automated testing tool to simulate a user tapping the hang-up button; the background service terminates the detection task and records the voice content, the key-press process and the execution time of each step;

the background task module comprises: a configuration unit: configuring task information and task scripts by a user, and issuing the task information; a system application unit: and regularly acquiring task information issued by a user, and informing a background service to execute a task through a message communication mechanism after the task is acquired.

An intelligent voice automated detection system comprising:

the background service module regularly accesses the task server and checks whether a task needs to be executed;

the task analysis module is used for analyzing the task information to obtain required task information and scripts;

the message communication module is used for carrying out communication transmission for the background service;

the automatic testing tool module is used for acquiring the content on the mobile phone page and simulating the clicking and sliding operations of a user;

the system application obtains the call recording file and then converts the existing format into an audio stream through the audio conversion module;

the database processing module is used for realizing the operations of adding, modifying, deleting and inquiring the database and recording the task information and the task result;

the network communication module requests the voice recognition service to recognize the audio stream of the call recording to obtain the recognition result of the audio stream; requesting semantic recognition service, transmitting the result of the voice recognition service to a semantic recognition module, and obtaining a recognition result;

the voice recognition module is used for recognizing the audio stream transmitted by the Android application and returning a character recognition result;

and the semantic analysis module takes the task information and the voice recognition text transmitted by the system application, compares them against the association library to obtain the key with the highest similarity, and returns that key to the application, which then performs the key press.

By adopting this technical scheme, the mobile phone can, during a call, automatically record audio, call the voice recognition service, intelligently decide which key to press according to the task requirements, automatically end the detection task, terminate the call, and record the detection result. Detection is automated: the application controls the mobile phone to dial, record, recognize the call content and press keys automatically, and multiple mobile phones can be controlled to carry out intelligent voice automatic detection. In addition, result information such as the approximate call text, the keys pressed and the recording file of each step in the task can be recorded, and whether the task was executed successfully is determined from the menu association library and the success keywords.

Drawings

FIG. 1 is a flow chart of an embodiment of an intelligent voice automated detection method of the present invention;

FIG. 2 is a detailed flowchart of step S50 in FIG. 1;

FIG. 3 is a block diagram of an embodiment of an intelligent voice automated detection system according to the present invention.

In the figure, the system comprises a background service module 10, a task analysis module 20, a message communication module 30, an automatic test tool module 40, an audio conversion module 50, a database processing module 60, a network communication module 70, a voice recognition module 80 and a semantic analysis module 90.

Detailed Description

The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Referring to fig. 1, an embodiment of the present invention provides an intelligent voice automatic detection method, including:

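S10: acquiring a target number of a detection task according to task information issued by a user, and calling a system call module to dial the target number;

specifically, step S10 includes:

configuring necessary task information and task scripts by a user, and issuing the task information;

the system periodically acquires the task information issued by the user, and after acquiring a task, notifies a background service through a message communication mechanism to execute it;

the system application dials the target number in the task information by invoking the system's call module.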

S20: judging whether the call is connected, if so, muting and recording the call;

the mobile phone page is monitored through an automated testing tool; when a specific change appears on the page, the call is judged to be connected and subsequent operations continue; the automated testing tool then mutes the microphone and starts recording automatically, and muting the microphone avoids external interference so as to obtain a better recording;

S30: determining from the configuration file whether the recording can be ended, and if so, ending the recording through an automated testing tool;

S40: converting the format of the generated recording file, transmitting the converted file to a voice recognition module, and calling the voice recognition module to obtain a voice recognition result;

specifically, a recording file meeting the conditions is searched for in the system's recording folder according to the time of the recording; after a qualifying recording file with the .mp3 suffix is found, the integrated open-source audio conversion program is called to convert it into an audio stream; the converted audio stream is transmitted to the voice recognition service, and the result returned by the voice recognition service is acquired;

S50: calling a semantic analysis module to intelligently analyze the voice recognition result and obtain an analysis result;

Referring to FIG. 2, specifically, step S50 includes:

S51: calling a semantic analysis module through a network request, and transmitting task information and a voice recognition result of the current step to the semantic analysis module;

S52: the semantic analysis module judges whether a success keyword for the target is present, and if so, returns end information;

S53: the semantic analysis module checks the task information association library to acquire menu association information;

S54: the semantic analysis module performs word segmentation on the speech recognition result of the current step, dividing it into phrases;

S55: the semantic analysis module analyzes the task information and acquires the target of the current task information;

S56: the semantic analysis module screens the phrases and compares them with the target to obtain the similarity of each phrase, and then returns the key with the highest similarity.

S60: judging whether the task is finished or not according to a result returned by the semantic analysis module;

S70: clicking a target key when the information returned by the semantic analysis module contains key information;

if the result returned by the semantic analysis service does not contain a task termination instruction, the background service of the Android application is notified to click the target key, and it calls an automated testing tool to simulate a user clicking that key.

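S80: recording the task result, the voice recognition result, the key pressed, and the start and end time of each step.

Specifically, step S80 includes:

if the result returned by the semantic analysis module contains task ending information, notifying the background service of the system application;

the background service of the system application calls an automated testing tool to simulate a user tapping the hang-up button;

the background service of the system application terminates the detection task and records the voice content, the key-press process and the execution time of each step.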

The invention also provides an intelligent voice automatic detection device, which comprises the background task module, the recording module for the call, the automated testing tool, the audio conversion module, the voice recognition module, the semantic analysis module, the judging module and the recording module for task results set out in the disclosure above.

Specifically, the background task module comprises: a configuration unit, by which a user configures necessary task information and task scripts and issues the task information; and a system application unit, which periodically acquires the task information issued by the user and, after acquiring a task, notifies a background service through a message communication mechanism to execute it.

Specifically, the audio conversion module includes the file searching unit, the conversion unit and the task acquisition unit set out above.

Specifically, the semantic analysis module includes the information receiving unit, the keyword unit, the viewing unit, the word segmentation unit, the analysis unit and the alignment unit set out above.

Specifically, in the recording module for task results: if the result returned by the semantic analysis module contains task ending information, the background service of the system application is notified; the background service calls an automated testing tool to simulate a user tapping the hang-up button; the background service terminates the detection task and records the voice content, the key-press process and the execution time of each step.

referring to fig. 3, another embodiment of the present invention further provides an intelligent voice automatic detection system, including:

a background service module 10, which can access the task server regularly to check whether there is any task to be executed;

a task analysis module 20, which analyzes the task information to obtain the required task information and script;

a message communication module 30 through which communication between background services is transmitted;

the automatic test tool module 40 can acquire the content on the mobile phone page and can simulate the basic operations of clicking, sliding and the like of a user;

an audio conversion module 50: after the system generates a call recording file and the Android application obtains it, the Android application converts the file from its existing format into an audio stream through this module;

the database processing module 60 is used for realizing the operations of adding, modifying, deleting and inquiring the database and recording the task information and the task result;

a network communication module 70, which is mainly used for network communication and can request a voice recognition service to recognize the audio stream of the call recording to obtain the recognition result of the audio stream; semantic recognition service can be requested, the result of the voice recognition service is transmitted to a semantic recognition module, and a recognition result is obtained;

a voice recognition module 80, which is mainly used for recognizing the audio stream transmitted by the Android application and returning a character recognition result;

and a semantic analysis module 90, which takes the task information and the voice recognition text transmitted by the Android application, compares them against the association library to obtain the key with the highest similarity, and returns that key to the Android application, which then performs the key press.
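As an illustration of the database processing module 60 listed above, the add, modify, delete and query operations could be sketched as follows. Python's `sqlite3` stands in for whatever database the Android application uses, which the source does not specify; the table layout and the `TaskStore` name are assumptions.

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS task_result (
    task_id       TEXT PRIMARY KEY,
    target_number TEXT NOT NULL,
    success       INTEGER NOT NULL DEFAULT 0,
    transcript    TEXT NOT NULL DEFAULT ''
)
"""


class TaskStore:
    """Hypothetical store for task information and task results."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def add(self, task_id: str, target_number: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO task_result (task_id, target_number) VALUES (?, ?)",
                (task_id, target_number),
            )

    def update(self, task_id: str, success: bool, transcript: str) -> None:
        with self.conn:
            self.conn.execute(
                "UPDATE task_result SET success = ?, transcript = ? WHERE task_id = ?",
                (int(success), transcript, task_id),
            )

    def get(self, task_id: str) -> Optional[Tuple[str, str, int, str]]:
        return self.conn.execute(
            "SELECT task_id, target_number, success, transcript "
            "FROM task_result WHERE task_id = ?",
            (task_id,),
        ).fetchone()

    def delete(self, task_id: str) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM task_result WHERE task_id = ?", (task_id,))
```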

This scheme provides the software architecture of the system and solves the problem that recording and call content recognition cannot be carried out during a call; it realizes call recording, voice recognition and intelligent key detection in the call state. An existing general-purpose voice recognition platform cannot recognize call content during a call. Manual voice detection has a very high labor cost, cannot run detection tasks on several mobile phones at once, and occasionally produces errors. With this scheme, detection is automated: the application controls the mobile phone to dial, record, recognize the call content and press keys automatically, multiple mobile phones can be controlled to carry out intelligent voice automatic detection, and the success rate exceeds 95% provided the content of the configuration file is correct.

The scheme also solves the problem of recording task information: the task information, the task result, the call content and similar information are all recorded. An existing general-purpose voice recognition platform cannot record task information. Manual voice detection has a very high labor cost, and ordinary workers cannot transcribe call content at normal speaking speed for long periods. With this scheme, result information such as the approximate call text, the keys pressed and the recording file of each step in the task can be recorded, and whether the task was executed successfully is determined from the menu association library and the success keywords.

The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, and the scope of protection is still within the scope of the invention.

Claims

1. An intelligent voice automatic detection method, characterized by comprising the following steps:

judging whether the call is connected, if so, muting and recording the call;

2. The intelligent voice automatic detection method of claim 1, characterized in that the step of calling a semantic analysis module to intelligently analyze the voice recognition result and obtain an analysis result comprises:

3. The intelligent voice automatic detection method of claim 1, characterized in that the step of converting the format of the generated recording file, transmitting the converted file to a voice recognition module, and calling the voice recognition module to obtain a voice recognition result comprises:

4. The intelligent voice automatic detection method of claim 1, characterized in that the step of recording the task result, the voice recognition result, the key and the start and end time of each step comprises:

5. The intelligent voice automatic detection method of claim 1, characterized in that the step of acquiring a target number of the detection task according to task information issued by a user and calling the system call module to dial the target number comprises:

6. An intelligent voice automatic detection device, characterized by comprising:

7. The intelligent voice automatic detection device of claim 6, characterized in that the semantic analysis module comprises:

8. The intelligent voice automatic detection device of claim 6, characterized in that the audio conversion module comprises:

9. The intelligent voice automatic detection device of claim 6, characterized in that:

in the recording module for task results: if the result returned by the semantic analysis module contains task ending information, the background service of the system application is notified; the background service calls an automated testing tool to simulate a user tapping the hang-up button; the background service terminates the detection task and records the voice content, the key-press process and the execution time of each step;

the background task module comprises:

a configuration unit: configuring task information and task scripts by a user, and issuing the task information;

a system application unit: and regularly acquiring task information issued by a user, and informing a background service to execute a task through a message communication mechanism after the task is acquired.

10. An intelligent voice automated detection system, comprising: