CN111128233A

CN111128233A - Recording detection method and device, electronic equipment and storage medium

Info

Publication number: CN111128233A
Application number: CN201910970151.2A
Authority: CN
Inventors: 李德大; 林梓棱
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2019-10-12
Filing date: 2019-10-12
Publication date: 2020-05-08

Abstract

A recording detection method, comprising: acquiring a target recording file of service personnel; judging whether the voiceprint information of the target recording file contains the voiceprint information of the service personnel; if so, extracting from the target recording file target keywords that match preset keywords, the preset keywords being related to a first scene type corresponding to the target recording file; inputting sound features extracted from the target recording file into a sound scene recognition model to obtain a second scene type; if the first scene type is consistent with the second scene type, acquiring a first playing time of the target keywords in the target recording file; judging whether the target keywords are abnormal according to the first playing time and the time length of the target recording file; if the target keywords are not abnormal, judging whether the target recording file is abnormal; and if the target recording file is abnormal, determining that it is a false recording file. The invention also provides a recording detection device, an electronic device and a storage medium. The invention can accurately detect false recording files.

Description

Recording detection method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of sound recording detection technologies, and in particular, to a sound recording detection method and apparatus, an electronic device, and a storage medium.

Background

At present, many companies require service personnel to use a recording device to record their work while on the job, and then use the stored recording files to grade the service quality of the service personnel.

However, it has been found in practice that some service personnel cheat, for example by preparing recordings in advance and passing them off as recordings of the actual job scene, such as recordings made before the job starts or after it has finished. Such cheating undermines the reliability of service-quality scoring.

Therefore, how to detect cheating in recordings is an urgent technical problem to be solved.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a recording detection method, apparatus, electronic device and storage medium, which can detect a false recording file more accurately.

A first aspect of the present invention provides a recording detection method, including:

acquiring a target recording file of service personnel;

judging whether the voiceprint information of the target recording file contains the voiceprint information of the service personnel;

if the voiceprint information of the target recording file comprises the voiceprint information of the service personnel, extracting a target keyword matched with a preset keyword from the target recording file, wherein the preset keyword is related to a first scene type corresponding to the target recording file;

inputting the sound features extracted from the target sound recording file into a sound scene recognition model to obtain a second scene type;

if the first scene type is consistent with the second scene type, acquiring a first playing time of the target keyword in the target sound recording file;

judging whether the target keyword is abnormal or not according to the first playing time and the time length of the target sound recording file;

if the target keyword is not abnormal, judging whether the target sound recording file is abnormal or not according to the first playing time, the target keyword and the preset keyword;

and if the target sound recording file is abnormal, determining that the target sound recording file is a false sound recording file.
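Read as a whole, the steps above form a chain of gates in which each check runs only if the previous ones pass. A minimal Python sketch of that ordering follows; the function name, the string outcomes and the representation of the last two checks as zero-argument callables are illustrative assumptions, and the voiceprint and scene results are taken as already computed.

```python
from typing import Callable


def run_detection_flow(
    contains_staff_voiceprint: bool,
    first_scene_type: str,
    second_scene_type: str,
    keyword_is_abnormal: Callable[[], bool],
    file_is_abnormal: Callable[[], bool],
) -> str:
    """Follow the order of the claimed steps and report where the flow stops."""
    # Voiceprint gate: the service personnel's voice must be present.
    if not contains_staff_voiceprint:
        return "no_staff_voiceprint"
    # Scene gate: the declared (first) scene must match the recognised (second) one.
    if first_scene_type != second_scene_type:
        return "scene_mismatch"
    # Keyword timing check, evaluated only after both gates pass.
    if keyword_is_abnormal():
        return "keyword_abnormal"
    # Comparison against historical recordings; abnormal means a false recording.
    if file_is_abnormal():
        return "false_recording"
    return "no_anomaly"
```

Passing callables keeps the keyword and history checks, which are the costlier ones, from running once an earlier gate has already stopped the flow.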

In a possible implementation manner, judging whether the target keyword is abnormal according to the first playing time and the time length of the target sound recording file includes:

judging whether the time length of the target sound recording file is smaller than a preset time length threshold;

if the time length of the target sound recording file is smaller than the preset time length threshold, determining the time interval between each two adjacent target keywords according to the first playing time, so as to obtain the time intervals of a plurality of groups of adjacent target keywords;

for each group of adjacent target keywords, judging whether their time interval is smaller than a preset time interval threshold;

if the time interval of the adjacent target keywords is smaller than a preset time interval threshold, determining the time interval of the adjacent target keywords as an abnormal time interval;

judging whether the number of abnormal time intervals is greater than a preset number threshold;

if the number of abnormal time intervals is greater than the preset number threshold, calculating a first ratio of the number of times the target keyword is played within a first preset time range to the number of times it is played within the whole time length, and a second ratio of the number of times the target keyword is played within a second preset time range to the number of times it is played within the whole time length;

if the first ratio or the second ratio is greater than a preset ratio threshold, determining that the target keyword is abnormal; or

if neither the first ratio nor the second ratio is greater than the preset ratio threshold, determining that the target keyword is not abnormal.
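The keyword-timing implementation above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: all thresholds are caller-supplied, each entry in `play_times` counts as one play of a target keyword, the two preset time ranges are assumed to be the first and last `edge_fraction` of the recording (20% by default, taken from the example in the detailed description), and the cases the text leaves open (a recording not shorter than the length threshold, or too few abnormal intervals) are assumed to yield "not abnormal".

```python
from typing import Sequence


def keyword_is_abnormal(
    play_times: Sequence[float],   # first playing times of the target keywords, seconds
    duration: float,               # time length of the target recording, seconds
    duration_threshold: float,
    interval_threshold: float,
    count_threshold: int,
    ratio_threshold: float,
    edge_fraction: float = 0.2,    # assumed size of each preset time range
) -> bool:
    # Assumption: long recordings, or recordings without keywords, are not abnormal.
    if duration >= duration_threshold or not play_times:
        return False
    times = sorted(play_times)
    # Time interval of each group of adjacent target keywords.
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    abnormal_intervals = sum(1 for gap in intervals if gap < interval_threshold)
    # Assumption: too few abnormal intervals also means "not abnormal".
    if abnormal_intervals <= count_threshold:
        return False
    # First/second preset time ranges: opening and closing slices of the recording.
    edge = duration * edge_fraction
    first_ratio = sum(1 for t in times if t <= edge) / len(times)
    second_ratio = sum(1 for t in times if t >= duration - edge) / len(times)
    return first_ratio > ratio_threshold or second_ratio > ratio_threshold
```

For example, six keywords at 1, 2, 3, 4, 5 and 30 s in a 60 s recording, with a 2 s interval threshold, give four abnormal intervals, and five of the six plays fall in the opening 12 s.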

In a possible implementation manner, judging whether the target sound recording file is abnormal according to the first playing time, the target keyword and the preset keyword includes:

acquiring a plurality of historical recording files of the service personnel, wherein the scene type corresponding to the historical recording files is consistent with the first scene type;

for each historical sound recording file, extracting historical keywords matching the preset keywords from that historical sound recording file;

acquiring second playing time of the historical keywords in the historical sound recording file;

determining the file similarity of the historical sound recording file and the target sound recording file according to the first playing time, the second playing time, the historical keywords and the target keywords;

judging whether a target historical sound recording file whose file similarity is greater than a preset file similarity threshold exists among the plurality of historical sound recording files;

and if a target historical sound recording file with the file similarity larger than a preset file similarity threshold exists in the plurality of historical sound recording files, determining that the target sound recording file is abnormal.
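The decision at the end of this implementation reduces to an "is any historical file too similar" test. In the hypothetical Python sketch below, each recording is represented by its extracted keywords as (keyword, playing time) pairs, and the file-similarity function is passed in because its computation is described in the following implementations; all names are illustrative.

```python
from typing import Callable, Iterable, List, Tuple

# A recording's keywords as (keyword, playing time in seconds) pairs.
Keywords = List[Tuple[str, float]]


def target_file_is_abnormal(
    target_keywords: Keywords,
    history_keyword_lists: Iterable[Keywords],   # same-scene historical recordings
    file_similarity: Callable[[Keywords, Keywords], float],
    file_similarity_threshold: float,
) -> bool:
    """True if any historical recording exceeds the file similarity threshold."""
    return any(
        file_similarity(target_keywords, history_keywords) > file_similarity_threshold
        for history_keywords in history_keyword_lists
    )
```

Because `any` stops at the first match, the remaining historical files are not compared once one suspiciously similar file has been found.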

In a possible implementation manner, determining the file similarity between the historical sound recording file and the target sound recording file according to the first playing time, the second playing time, the historical keyword and the target keyword includes:

determining two adjacent historical keywords as historical keyword groups, and determining two adjacent target keywords as target keyword groups;

judging whether a historical keyword group consistent with the target keyword group exists among the plurality of historical keyword groups;

if the historical keyword group consistent with the target keyword group exists in the plurality of historical keyword groups, determining the keyword similarity of the historical keyword group and the target keyword group according to the first playing time and the second playing time;

and determining the file similarity of the target sound recording file and the historical sound recording file according to the keyword similarity.
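Forming keyword groups and testing their consistency might look like the following sketch. It assumes, hypothetically, that keywords are held as (keyword, playing time) pairs and that "adjacent" means adjacent in playing-time order. Consistency compares the first keywords and the second keywords separately, so order matters, matching the (hello, healthy) versus (healthy, hello) example in the detailed description.

```python
from typing import List, Tuple

Keyword = Tuple[str, float]              # (keyword, playing time in seconds)
KeywordGroup = Tuple[Keyword, Keyword]


def keyword_groups(keywords: List[Keyword]) -> List[KeywordGroup]:
    """Pair each keyword with the next one in playing-time order."""
    ordered = sorted(keywords, key=lambda kw: kw[1])
    return list(zip(ordered, ordered[1:]))


def groups_consistent(history_group: KeywordGroup, target_group: KeywordGroup) -> bool:
    """First keywords must match and second keywords must match; order matters."""
    return (history_group[0][0], history_group[1][0]) == (
        target_group[0][0],
        target_group[1][0],
    )
```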

In a possible implementation manner, the determining, according to the first playing time and the second playing time, the keyword similarity between the historical keyword group and the target keyword group includes:

determining a first playing time interval of two target keywords of the target keyword group according to the first playing time;

determining a second playing time interval of two historical keywords of the historical keyword group according to the second playing time;

and calculating the keyword similarity of the historical keyword group and the target keyword group by using a similarity calculation method according to the first playing time interval and the second playing time interval.
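The text does not state which similarity algorithm combines the two playing time intervals, so the Python sketch below substitutes a hypothetical measure: the ratio of the shorter interval to the longer one. It is only a placeholder with the intuitive property that identical keyword rhythms score 1.0, not the formula of the original disclosure; all names are illustrative.

```python
from typing import Tuple

Keyword = Tuple[str, float]              # (keyword, playing time in seconds)
KeywordGroup = Tuple[Keyword, Keyword]


def play_interval(group: KeywordGroup) -> float:
    """Playing time interval between the two keywords of a group, in seconds."""
    (_, first_time), (_, second_time) = group
    return second_time - first_time


def keyword_similarity(first_interval: float, second_interval: float) -> float:
    """Hypothetical stand-in: shorter interval divided by longer interval.

    Returns 1.0 for identical intervals and falls toward 0.0 as they diverge.
    """
    if first_interval < 0 or second_interval < 0:
        raise ValueError("intervals must be non-negative")
    longer = max(first_interval, second_interval)
    if longer == 0:
        return 1.0
    return min(first_interval, second_interval) / longer
```

A ratio was chosen over an absolute difference so that the score stays in [0, 1] regardless of how long the intervals are, which makes a fixed keyword similarity threshold meaningful.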

In a possible implementation manner, the determining the file similarity between the target sound recording file and the historical sound recording file according to the keyword similarity includes:

if the keyword similarity is greater than a preset keyword similarity threshold, determining the target keyword group as a similar keyword group;

calculating a third ratio of the number of similar keyword groups to the number of target keyword groups;

and determining the third ratio as the file similarity of the target sound recording file and the historical sound recording file.
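Putting the keyword-group steps together, the third ratio can be computed as in the sketch below. Assumptions: keywords are (keyword, playing time) pairs, adjacent groups are formed in playing-time order, the keyword similarity is the same hypothetical shorter/longer interval ratio used as a placeholder for the unstated formula, and when several historical groups are consistent with one target group the best score is kept. The default threshold of 0.6 follows the typical value given in the detailed description.

```python
from typing import List, Tuple

Keyword = Tuple[str, float]              # (keyword, playing time in seconds)
KeywordGroup = Tuple[Keyword, Keyword]


def _groups(keywords: List[Keyword]) -> List[KeywordGroup]:
    # Adjacent keyword pairs in playing-time order.
    ordered = sorted(keywords, key=lambda kw: kw[1])
    return list(zip(ordered, ordered[1:]))


def _interval(group: KeywordGroup) -> float:
    return group[1][1] - group[0][1]


def _keyword_similarity(d1: float, d2: float) -> float:
    # Hypothetical placeholder for the unstated similarity formula.
    longer = max(d1, d2)
    return 1.0 if longer == 0 else min(d1, d2) / longer


def file_similarity(
    target: List[Keyword],
    history: List[Keyword],
    keyword_threshold: float = 0.6,
) -> float:
    """Third ratio: similar target keyword groups / all target keyword groups."""
    target_groups = _groups(target)
    if not target_groups:
        return 0.0
    history_groups = _groups(history)
    similar = 0
    for tg in target_groups:
        # Score against every consistent historical group (same words, same order).
        scores = [
            _keyword_similarity(_interval(tg), _interval(hg))
            for hg in history_groups
            if (hg[0][0], hg[1][0]) == (tg[0][0], tg[1][0])
        ]
        if scores and max(scores) > keyword_threshold:
            similar += 1
    return similar / len(target_groups)
```

For instance, a target with intervals of 4 s (hello to healthy) and 6 s (healthy to comfortable), compared with a history with 4 s and 3 s for the same pairs, has one similar group out of two, giving a file similarity of 0.5.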

In one possible implementation, the method further includes:

acquiring a video file corresponding to the target sound recording file;

judging, by using face recognition technology, whether a face image of the client exists in the video file;

and if the face image of the client does not exist in the video file, determining that the target sound recording file is a false sound recording file.

A second aspect of the present invention provides an apparatus for detecting a recording, the apparatus comprising:

the acquisition module is used for acquiring a target recording file of service personnel;

the first judgment module is used for judging whether the voiceprint information of the target sound recording file contains the voiceprint information of the service personnel;

the extracting module is used for extracting a target keyword matched with a preset keyword from the target sound recording file if the voiceprint information of the target sound recording file comprises the voiceprint information of the service personnel, wherein the preset keyword is related to a first scene type corresponding to the target sound recording file;

the input module is used for inputting the sound features extracted from the target sound recording file into a sound scene recognition model to obtain a second scene type;

the acquisition module is further configured to acquire a first playing time of the target keyword in the target sound recording file if the first scene type is consistent with the second scene type;

the second judgment module is used for judging whether the target keyword is abnormal or not according to the first playing time and the time length of the target sound recording file;

a third judgment module, configured to, if the target keyword is not abnormal, judge whether the target sound recording file is abnormal according to the first playing time, the target keyword and the preset keyword;

and the determining module is used for determining the target sound recording file as a false sound recording file if the target sound recording file is abnormal.

A third aspect of the invention provides an electronic device comprising a processor and a memory, the processor being configured to implement the recording detection method when executing a computer program stored in the memory.

A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the sound recording detection method.

With the above technical solution, the target recording file of the service personnel is acquired, and it is judged whether the voiceprint information of the target recording file contains the voiceprint information of the service personnel; if so, target keywords matching preset keywords related to a first scene type corresponding to the target recording file are extracted from the target recording file; sound features extracted from the target sound recording file are input into a sound scene recognition model to obtain a second scene type; if the first scene type is consistent with the second scene type, the first playing time of the target keywords in the target sound recording file is acquired; whether the target keywords are abnormal is judged according to the first playing time and the time length of the target sound recording file; if the target keywords are not abnormal, whether the target sound recording file is abnormal is judged according to the first playing time, the target keywords and the preset keywords; and if the target sound recording file is abnormal, it is determined to be a false sound recording file. The invention therefore examines the recording from multiple angles, combining the voiceprint information of the service personnel, the scene type corresponding to the recording file, the time length of the recording file, the target keywords, the preset keywords, the first playing time and other factors, so that false recording files can be detected more accurately.

Drawings

FIG. 1 is a flowchart illustrating a recording detection method according to a preferred embodiment of the present invention.

FIG. 2 is a functional block diagram of a recording detection apparatus according to a preferred embodiment of the present invention.

FIG. 3 is a schematic structural diagram of an electronic device implementing a recording detection method according to a preferred embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

The recording detection method of the embodiments of the invention is applied to an electronic device, and may also be applied to a hardware environment formed by the electronic device and a server connected to it over a network, in which case the method is executed jointly by the server and the electronic device. Networks include, but are not limited to: a wide area network, a metropolitan area network, or a local area network.

A server may refer to a computer system that provides services to other devices (e.g., electronic devices) in a network. A personal computer that can externally provide a File Transfer Protocol (FTP) service may also be called a server. In a narrow sense, a server refers to a high-performance computer that provides services to the outside through a network. Because a server has higher requirements than an ordinary personal computer in terms of stability, security and performance, its hardware, such as the CPU, chipset, memory, disk system and network, differs from that of an ordinary personal computer.

The electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud consisting of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a collection of loosely coupled computers forms a super virtual computer. The user device includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smartphone, or a Personal Digital Assistant (PDA).

Referring to fig. 1, fig. 1 is a flowchart illustrating a recording detection method according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted.

S11: The electronic device acquires the target sound recording file of the service personnel.

The target recording file is a recording file of the service personnel that has undergone service-quality grading and achieved a qualifying grade or higher, that is, its recording content contains a plurality of keywords related to the job scene.

S12: The electronic device judges whether the voiceprint information of the target sound recording file contains the voiceprint information of the service personnel; if so, step S13 is executed; if not, the process ends.

In the embodiment of the invention, the voiceprint information of the service personnel can be obtained from a database according to the identity of the service personnel and matched against the voiceprint information of the target recording file. If the matching succeeds, the voiceprint information of the target recording file is determined to contain the voiceprint information of the service personnel; if the matching fails, it is determined not to contain it.

Optionally, if the target recording file contains no voiceprint information of anyone other than the service personnel, it is determined that the target recording file was recorded by the service personnel alone; it may then be determined that the service personnel cheated, that is, the target recording file is abnormal.
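The text does not specify how voiceprints are matched. One common approach, assumed here purely for illustration, represents each voiceprint as a fixed-length speaker embedding and compares embeddings by cosine similarity. The sketch below assumes per-speaker embeddings for the recording are already available, uses a hypothetical threshold of 0.75, and covers both the S12 check and the optional "recorded by the service personnel alone" case.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_staff(staff: Sequence[float], speaker: Sequence[float], threshold: float = 0.75) -> bool:
    # Hypothetical threshold; a real system would calibrate it on labelled data.
    return cosine_similarity(staff, speaker) >= threshold


def contains_staff_voiceprint(staff: Sequence[float], speakers: Iterable[Sequence[float]], threshold: float = 0.75) -> bool:
    """S12: at least one speaker in the recording matches the staff voiceprint."""
    return any(matches_staff(staff, s, threshold) for s in speakers)


def recorded_by_staff_alone(staff: Sequence[float], speakers: Iterable[Sequence[float]], threshold: float = 0.75) -> bool:
    """Optional check: every speaker in the recording matches the staff voiceprint."""
    speakers = list(speakers)
    return bool(speakers) and all(matches_staff(staff, s, threshold) for s in speakers)
```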

S13: The electronic device extracts target keywords matching preset keywords from the target sound recording file, where the preset keywords are related to a first scene type corresponding to the target sound recording file.

The first scene type may be a scene type such as a hospital visit or a home visit, and each scene type has its own related preset keywords. For example, the preset keywords for a hospital visit may include, but are not limited to: hello, healthy, comfortable, and take care of your health.

A target keyword is a word in the recording content of the target sound recording file that is consistent with a preset keyword.

In the embodiment of the invention, if the voiceprint information of the target recording file contains the voiceprint information of the service personnel, the recording content of the recording file can be converted into text by speech recognition, and the text is searched to extract the target keywords matching the preset keywords.
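Extracting target keywords from recognized speech could be sketched as follows, assuming (hypothetically) that the speech recognizer returns word-level tokens with start times in seconds. Each match keeps its start time, which can serve as the keyword's first playing time in step S15. Multi-word keywords are not handled in this simplified version.

```python
from typing import Iterable, List, Set, Tuple


def extract_target_keywords(
    asr_words: Iterable[Tuple[str, float]],   # (recognized word, start time in seconds)
    preset_keywords: Set[str],
) -> List[Tuple[str, float]]:
    """Keep recognized words that match a preset keyword, with their start times."""
    wanted = {keyword.casefold() for keyword in preset_keywords}
    # Case-insensitive comparison; the recognized spelling is kept in the output.
    return [(word, start) for word, start in asr_words if word.casefold() in wanted]
```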

Optionally, if the voiceprint information of the target recording file does not contain the voiceprint information of the service personnel, the recording file may have been faked by having another person record it on the service personnel's behalf; the target recording file is then determined to be a false recording file, that is, the service personnel is determined to have cheated.

S14: The electronic device inputs the sound features extracted from the target sound recording file into a sound scene recognition model to obtain a second scene type.

In the embodiment of the present invention, sound features may be extracted from the target sound recording file and input into the sound scene recognition model to obtain a scene recognition result, that is, the second scene type.
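The text does not specify the sound scene recognition model or the sound features. As a stand-in only, the sketch below uses a nearest-centroid classifier over precomputed feature vectors (for example, averaged spectral features); the centroid values, scene labels and function names are hypothetical, and a trained classifier would normally take this role.

```python
import math
from typing import Mapping, Sequence


def recognise_scene(features: Sequence[float], centroids: Mapping[str, Sequence[float]]) -> str:
    """Return the scene type whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda scene: math.dist(features, centroids[scene]))


def scenes_consistent(
    first_scene_type: str,
    features: Sequence[float],
    centroids: Mapping[str, Sequence[float]],
) -> bool:
    """Compare the declared (first) scene type with the recognised (second) one."""
    return recognise_scene(features, centroids) == first_scene_type
```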

S15: If the first scene type is consistent with the second scene type, the electronic device acquires the first playing time of the target keywords in the target sound recording file.

The first playing time refers to a time point when the target keyword is played in the target sound recording file.

In the embodiment of the invention, if the first scene type is consistent with the second scene type, the first playing time of the target keyword in the sound recording file is obtained.

Optionally, if the first scene type is not consistent with the second scene type, the recording file may have been recorded elsewhere rather than during the job, and it is determined to be a false recording file.

S16: The electronic device judges whether the target keyword is abnormal according to the first playing time and the time length of the target sound recording file; if not, step S17 is executed; if so, the process ends.

Specifically, whether the target keyword is abnormal is judged according to the first playing time and the time length of the target sound recording file by the steps of the corresponding implementation described in the Disclosure of Invention.

Here, the first preset time range is the opening period of the target sound recording file, and the second preset time range is its closing period; the two ranges have the same size. For example, if the time length is 1 minute and each range is 20% of the time length, the first preset time range is 0 min 0 s to 0 min 12 s, and the second preset time range is 0 min 48 s to 1 min 0 s.
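The worked example above, together with the first and second ratios, can be checked with a short sketch. The function names are illustrative, and the fraction (20% in the example) is a caller-supplied parameter rather than a value fixed by the text.

```python
from typing import Sequence, Tuple

Range = Tuple[float, float]   # (start, end) in seconds


def preset_time_ranges(duration: float, fraction: float) -> Tuple[Range, Range]:
    """First (opening) and second (closing) preset time ranges of equal size."""
    edge = duration * fraction
    return (0.0, edge), (duration - edge, duration)


def edge_ratios(play_times: Sequence[float], duration: float, fraction: float) -> Tuple[float, float]:
    """First and second ratios: share of keyword plays in each preset range."""
    (_, first_end), (second_start, _) = preset_time_ranges(duration, fraction)
    total = len(play_times)
    if total == 0:
        return 0.0, 0.0
    first = sum(1 for t in play_times if t <= first_end) / total
    second = sum(1 for t in play_times if t >= second_start) / total
    return first, second
```

For the 1-minute example, `preset_time_ranges(60, 0.2)` gives the ranges 0 to 12 s and 48 to 60 s.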

The first ratio represents how concentrated the target keywords are within the first preset time range of the target sound recording file, and the second ratio represents how concentrated they are within the second preset time range.

In this optional implementation, different preset time length thresholds may be set for different scene types according to different service requirements. If the time length of the target sound recording file is smaller than the preset time length threshold, the service personnel may have quickly recited the keywords and completed the job task perfunctorily, so the target sound recording file needs further detection. The time interval between each two adjacent target keywords is determined according to the first playing time, yielding the time intervals of a plurality of groups of adjacent target keywords, and any time interval smaller than the preset time interval threshold is determined as an abnormal time interval. Because different keywords generally appear in different sentences, an overly short interval between two keywords indicates that the service personnel may have recited the keywords back to back within a short time, which is abnormal.

To prevent misjudgment, the number of abnormal time intervals is then checked. If it is greater than the preset number threshold, the service personnel has probably recited the keywords continuously and rapidly, most likely recording them before starting the job or after completing it. If the first ratio (the number of times the target keyword is played within the first preset time range to the number of times it is played within the whole time length) is greater than the preset ratio threshold, it is determined that the service personnel recorded the keywords before starting the job; if the second ratio (the corresponding count within the second preset time range to the count within the whole time length) is greater than the preset ratio threshold, it is determined that the service personnel recorded the keywords after completing the job. In either case the target keyword is determined to be abnormal. If neither the first ratio nor the second ratio is greater than the preset ratio threshold, the target keyword is determined to be not abnormal.

S17: The electronic device judges whether the target sound recording file is abnormal according to the first playing time, the target keyword and the preset keyword; if so, step S18 is executed; if not, the process ends.

Specifically, whether the target sound recording file is abnormal is judged according to the first playing time, the target keyword and the preset keyword by the steps of the corresponding implementation described in the Disclosure of Invention.

The second playing time refers to the time point at which a historical keyword is played in the historical sound recording file.

In this optional implementation, a plurality of historical sound recording files of the service personnel may be acquired, where the scene type corresponding to each historical sound recording file is consistent with that of the target sound recording file, so the same keywords may appear in the recording content of both. Historical keywords matching the preset keywords are extracted from each historical sound recording file, their playing time (the second playing time) is acquired, and the file similarity between each historical sound recording file and the target sound recording file is determined according to the first playing time, the second playing time, the historical keywords and the target keywords. If there is a target historical sound recording file whose file similarity is greater than the preset file similarity threshold, the service personnel may have reused a recording file prepared in advance, and the target sound recording file is determined to be abnormal.

Specifically, the file similarity between the historical sound recording file and the target sound recording file is determined according to the first playing time, the second playing time, the historical keyword and the target keyword by the steps of the corresponding implementation described in the Disclosure of Invention.

In this optional embodiment, two adjacent historical keywords may be determined as a historical keyword group, and two adjacent target keywords as a target keyword group. It is then judged whether any of the plurality of historical keyword groups is consistent with the target keyword group: a historical keyword group is consistent with a target keyword group if its first keyword is consistent with the first keyword of the target keyword group and its second keyword is consistent with the second keyword of the target keyword group. For example, the historical keyword group (hello, healthy) is consistent with the target keyword group (hello, healthy) but not with the target keyword group (healthy, hello). If a consistent historical keyword group exists, the keyword similarity between it and the target keyword group is determined according to the first playing time and the second playing time, and the file similarity between the target sound recording file and the historical sound recording file is then determined according to the keyword similarity.

Specifically, the keyword similarity between the historical keyword group and the target keyword group is determined according to the first playing time and the second playing time by the steps of the corresponding implementation described in the Disclosure of Invention.

Assume that target keywords a1 and a2 correspond to historical keywords b1 and b2, with a1 consistent with b1 and a2 consistent with b2, so that the target keyword group (a1, a2) is consistent with the historical keyword group (b1, b2). Let d(a1, a2) be the time interval between a1 and a2, d(b1, b2) the time interval between b1 and b2, and f(a1, a2, b1, b2) the keyword similarity between the target keyword group and the historical keyword group.

The similarity algorithm formula is as follows:

In this optional embodiment, a first playing time interval between the two target keywords of the target keyword group may be determined according to the first playing time, and a second playing time interval between the two historical keywords of the historical keyword group according to the second playing time; the keyword similarity between the historical keyword group and the target keyword group may then be calculated from the first and second playing time intervals by using a similarity algorithm.

Specifically, the file similarity between the target sound recording file and the historical sound recording file is determined according to the keyword similarity by the steps of the corresponding implementation described in the Disclosure of Invention.

In this optional embodiment, the keyword similarity threshold is generally 0.6 and may be adjusted according to the specific situation. If the keyword similarity is greater than the preset keyword similarity threshold, the target keyword group is determined to be a similar keyword group, that is, the target keyword group is similar to the historical keyword group. A third ratio, the proportion of similar keyword groups among all target keyword groups, may then be calculated and used as the file similarity between the target sound recording file and the historical sound recording file.

S18: The electronic device determines that the target sound recording file is a false sound recording file.

In the embodiment of the invention, if the recording file is abnormal, it is determined that the service personnel cheated, that is, the target recording file is a false recording file recorded by the service personnel.

As an optional implementation, the method further comprises:

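acquiring a video file corresponding to the target sound recording file;

judging, by using face recognition technology, whether a face image of the client exists in the video file;

and if the face image of the client does not exist in the video file, determining that the target sound recording file is a false sound recording file.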

In this optional embodiment, a video file corresponding to the target sound recording file may be acquired, and the face images appearing in it are detected by face recognition technology to determine whether a face image of the client exists in the video file. If no face image of the client exists in the video file, the service personnel did not actually provide service to the client, and the target sound recording file is determined to be a false sound recording file.

In the method flow depicted in FIG. 1, the target sound recording file of the service personnel is checked in turn for the presence of the service personnel's voiceprint, for consistency between the first scene type and the second scene type recognized from its sound features, for abnormal target keywords based on the first playing time and the time length of the file, and for abnormality based on the first playing time, the target keywords and the preset keywords; if the file is abnormal, it is determined to be a false sound recording file. By combining the voiceprint information of the service personnel, the scene type corresponding to the recording file, the time length of the recording file, the target keywords, the preset keywords, the first playing time and other factors, the recording is examined from multiple angles, and false recording files can be detected more accurately.

The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and it will be apparent to those skilled in the art that modifications may be made without departing from the inventive concept of the present invention, and these modifications are within the scope of the present invention.

Referring to fig. 2, which is a functional block diagram of a recording detection apparatus according to a preferred embodiment of the present invention.

In some embodiments, the recording detection apparatus runs in an electronic device. The recording detection apparatus may comprise a plurality of functional modules composed of program code segments. The program code of each program segment in the recording detection apparatus may be stored in a memory and executed by at least one processor to perform some or all of the steps of the recording detection method described in fig. 1; for details, refer to the related description of the method in fig. 1, which is not repeated here.

In this embodiment, the recording detection apparatus may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: an obtaining module 201, a first judging module 202, an extracting module 203, an input module 204, a second judging module 205, a third judging module 206 and a determining module 207. A module referred to herein is a series of computer program segments that is stored in a memory and can be executed by at least one processor to perform a fixed function.

an obtaining module 201, configured to obtain a target sound recording file of a service person;

a first judging module 202, configured to judge whether the voiceprint information of the target sound recording file includes the voiceprint information of the service person;

an extracting module 203, configured to extract a target keyword matched with a preset keyword from the target sound recording file if the voiceprint information of the target sound recording file includes the voiceprint information of the service person, where the preset keyword is related to a first scene type corresponding to the target sound recording file;

an input module 204, configured to input the sound features extracted from the target sound recording file into a sound scene recognition model to obtain a second scene type;

the obtaining module 201 is further configured to obtain a first playing time of the target keyword in the target sound recording file if the first scene type is consistent with the second scene type;

a second judging module 205, configured to judge whether the target keyword is abnormal according to the first playing time and the time length of the target sound recording file;

a third judging module 206, configured to judge, if the target keyword is not abnormal, whether the target sound recording file is abnormal according to the first playing time, the target keyword and the preset keyword;

a determining module 207, configured to determine that the target sound recording file is a false sound recording file if the target sound recording file is abnormal.

As an optional implementation manner, the second judging module 205 judges whether the target keyword is abnormal according to the first playing time and the time length of the target sound recording file in the following manner:

As an optional implementation manner, the third judging module 206 includes:

the obtaining submodule is used for obtaining a plurality of historical sound recording files of the service personnel, wherein the scene types corresponding to the historical sound recording files are consistent with the first scene type;

the extraction submodule is used for extracting, for each historical sound recording file, the historical keywords matching the preset keywords from that historical sound recording file;

the obtaining submodule is further used for obtaining second playing time of the historical keywords in the historical sound recording file;

the determining submodule is used for determining the file similarity of the historical sound recording file and the target sound recording file according to the first playing time, the second playing time, the historical keywords and the target keywords;

the judgment submodule is used for judging whether, among the plurality of historical sound recording files, there is a target historical sound recording file whose file similarity is greater than a preset file similarity threshold;

the determining submodule is further configured to determine that the target sound recording file is abnormal if a target historical sound recording file whose file similarity is greater than the preset file similarity threshold exists among the plurality of historical sound recording files.
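The submodules above amount to a threshold search over the service person's historical recordings of the same scene type. The sketch assumes each file's matched keywords are available as a mapping from keyword to playing time (so each keyword is counted once per file). Since the similarity computation is described separately, `similarity` is left as a pluggable callable; `time_tolerance_similarity` is a hypothetical stand-in included only so the example runs, and all names here are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# keyword -> playing time in seconds (assumption: one occurrence per keyword)
KeywordTimes = Dict[str, float]
Similarity = Callable[[KeywordTimes, KeywordTimes], float]


@dataclass
class HistoryRecording:
    file_id: str
    scene_type: str
    keyword_times: KeywordTimes  # historical keywords and their second playing times


def find_similar_history(
    target: KeywordTimes,
    first_scene_type: str,
    history: List[HistoryRecording],
    similarity: Similarity,
    threshold: float,
) -> List[str]:
    """Return ids of historical files whose file similarity exceeds threshold.

    A non-empty result means the target sound recording file is judged abnormal.
    """
    matches = []
    for record in history:
        if record.scene_type != first_scene_type:
            continue  # only history with the same scene type is compared
        if similarity(record.keyword_times, target) > threshold:
            matches.append(record.file_id)
    return matches


def time_tolerance_similarity(
    a: KeywordTimes, b: KeywordTimes, tolerance_s: float = 5.0
) -> float:
    """Hypothetical stand-in, NOT the similarity formula of this embodiment.

    A keyword present in both files scores 1.0 at identical playing times,
    falling linearly to 0.0 at tolerance_s apart; scores are averaged over
    all keywords seen in either file.
    """
    if tolerance_s <= 0:
        raise ValueError("tolerance_s must be positive")
    keywords = set(a) | set(b)
    if not keywords:
        return 0.0
    total = sum(
        max(0.0, 1.0 - abs(a[k] - b[k]) / tolerance_s)
        for k in keywords
        if k in a and k in b
    )
    return total / len(keywords)
```

The comparison is strict (`>`), matching "greater than a preset file similarity threshold" in the text.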

As an optional implementation manner, the determining submodule determines the file similarity between the historical sound recording file and the target sound recording file according to the first playing time, the second playing time, the historical keywords and the target keywords specifically as follows:

As an optional implementation manner, the determining submodule determines the keyword similarity between the historical keyword group and the target keyword group according to the first playing time and the second playing time specifically as follows:

As an optional implementation manner, the determining submodule determines the file similarity between the target sound recording file and the historical sound recording file according to the keyword similarity specifically as follows:

As an optional implementation manner, the obtaining module 201 is further configured to obtain a video file corresponding to the target sound recording file;

the recording detection apparatus may further include:

the fourth judging module is used for judging, by using a face recognition technology, whether a face image of the client exists in the video file;

the determining module 207 is further configured to determine that the target sound recording file is a false sound recording file if no face image of the client exists in the video file.

In the recording detection apparatus depicted in fig. 2, a target sound recording file of a service person is obtained, and it is judged whether the voiceprint information of the target sound recording file contains the voiceprint information of the service person. If it does, a target keyword matching a preset keyword is extracted from the target sound recording file, the preset keyword being related to a first scene type corresponding to the target sound recording file, and the sound features extracted from the target sound recording file are input into a sound scene recognition model to obtain a second scene type. If the first scene type is consistent with the second scene type, a first playing time of the target keyword in the target sound recording file is obtained, and whether the target keyword is abnormal is judged according to the first playing time and the time length of the target sound recording file. If the target keyword is not abnormal, whether the target sound recording file is abnormal is judged according to the first playing time, the target keyword and the preset keyword; if the target sound recording file is abnormal, it is determined to be a false sound recording file. In this way, the recording is checked from multiple perspectives by combining the voiceprint information of the service person, the scene type corresponding to the sound recording file, the time length of the sound recording file, the target keyword, the preset keyword, the first playing time and other elements, so that false sound recording files can be detected more accurately.

As shown in fig. 3, which is a schematic structural diagram of an electronic device implementing a recording detection method according to a preferred embodiment of the present invention, the electronic device 3 comprises a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.

Those skilled in the art will appreciate that the schematic diagram shown in fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation on it; the electronic device 3 may include more or fewer components than shown, combine some components, or have different components. For example, the electronic device 3 may further include an input/output device, a network access device, and the like.

The electronic device 3 may also include, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device, for example, a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV) device, a smart wearable device, and the like. The network where the electronic device 3 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.

The at least one processor 32 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a discrete hardware component, or the like. The processor 32 may be a microprocessor or any conventional processor. The processor 32 is the control center of the electronic device 3 and connects the various parts of the entire electronic device 3 through various interfaces and lines.

The memory 31 may be used to store the computer program 33 and/or the modules/units, and the processor 32 implements the various functions of the electronic device 3 by running or executing the computer program and/or the modules/units stored in the memory 31 and by calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data (such as audio data) created through use of the electronic device 3, and the like. In addition, the memory 31 may include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, and the like.

With reference to fig. 1, the memory 31 of the electronic device 3 stores a plurality of instructions to implement a recording detection method, and the processor 32 executes the plurality of instructions to implement:

acquiring a target recording file of a service worker;

Specifically, for the implementation of the above instructions by the processor 32, refer to the description of the relevant steps in the embodiment corresponding to fig. 1; details are not repeated here.

In the electronic device 3 depicted in fig. 3, a target sound recording file of a service person is obtained, and it is judged whether the voiceprint information of the target sound recording file contains the voiceprint information of the service person. If it does, a target keyword matching a preset keyword is extracted from the target sound recording file, the preset keyword being related to a first scene type corresponding to the target sound recording file, and the sound features extracted from the target sound recording file are input into a sound scene recognition model to obtain a second scene type. If the first scene type is consistent with the second scene type, a first playing time of the target keyword in the target sound recording file is obtained, and whether the target keyword is abnormal is judged according to the first playing time and the time length of the target sound recording file. If the target keyword is not abnormal, whether the target sound recording file is abnormal is judged according to the first playing time, the target keyword and the preset keyword; if the target sound recording file is abnormal, it is determined to be a false sound recording file. In this way, the recording is checked from multiple perspectives by combining the voiceprint information of the service person, the scene type corresponding to the sound recording file, the time length of the sound recording file, the target keyword, the preset keyword, the first playing time and other elements, so that false sound recording files can be detected more accurately.

If the integrated modules/units of the electronic device 3 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, all or part of the flow of the method in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only a logical functional division, and other divisions are possible in actual implementation.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms "first", "second", etc. are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A recording detection method, the method comprising:

acquiring a target recording file of a service worker;

2. The method of claim 1, wherein the judging whether the target keyword is abnormal according to the first playing time and the time length of the target sound recording file comprises:

3. The method of claim 1, wherein the judging whether the target sound recording file is abnormal according to the first playing time, the target keyword and the preset keyword comprises:

4. The method of claim 3, wherein the determining the file similarity between the historical sound recording file and the target sound recording file according to the first playing time, the second playing time, the historical keywords and the target keywords comprises:

5. The method according to claim 4, wherein the determining the keyword similarity between the historical keyword group and the target keyword group according to the first playing time and the second playing time comprises:

6. The method of claim 5, wherein the determining the file similarity between the target sound recording file and the historical sound recording file according to the keyword similarity comprises:

7. The method according to any one of claims 1 to 6, further comprising:

acquiring a video file corresponding to the target sound recording file;

8. A recording detection apparatus, characterized in that the recording detection apparatus comprises:

9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the recording detection method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing at least one instruction that, when executed by a processor, implements the recording detection method of any one of claims 1 to 7.