CN111326172A - Conflict detection method and device, electronic equipment and readable storage medium

Info

Publication number
CN111326172A
CN111326172A
Authority
CN
China
Prior art keywords
voice
frame
segment
conflict
detection
Legal status
Pending
Application number
CN201811543896.2A
Other languages
Chinese (zh)
Inventor
张辉
彭一平
高永虎
Current Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201811543896.2A
Publication of CN111326172A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of the present application provide a conflict detection method and device, an electronic device, and a readable storage medium. Voice information collected during a travel service is obtained and divided into a plurality of voice segments. Frame feature extraction is performed on each segmented voice segment to obtain the frame features of each voice segment. The frame features are then imported into a pre-established conflict detection model to obtain a detection result for each voice segment, which determines whether that segment is a conflict segment. The method performs conflict detection based on automatic recognition of voice information during the travel service, avoids the high cost and low efficiency of the manual detection used in the prior art, and improves the objectivity and efficiency of the detection result.

Description

Conflict detection method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of speech recognition technologies, and in particular to a conflict detection method and apparatus, an electronic device, and a readable storage medium.
Background
With the rapid development of internet technology, travel apps play an important role in daily life. However, along with the convenience they bring, travel apps also carry potential safety hazards. For example, during a trip a conflict may arise between the driver and a passenger, and failure to detect it in time and intervene effectively may ultimately lead to serious, irreversible consequences. At present, whether a conflict occurs during a trip is detected mainly by manual inspection: customer service staff of the travel app listen to the recordings of each trip one by one and judge for themselves. This approach is labor-intensive and inefficient. An efficient method for detecting whether a conflict occurs during a trip is therefore needed, so that accidents can be prevented through timely intervention.
Disclosure of Invention
In view of the above, an object of the present application is to provide a conflict detection method and apparatus, an electronic device, and a readable storage medium that perform conflict detection based on automatic recognition of voice information during a travel service, avoiding the high cost and low efficiency of manual detection in the prior art.
According to one aspect of the embodiments of the present application, an electronic device is provided that may include one or more storage media and one or more processors in communication with the storage media. The storage media store machine-readable instructions executable by the processors. When the electronic device operates, a processor communicates with the storage media through a bus and executes the machine-readable instructions to perform the conflict detection method.
According to another aspect of the embodiments of the present application, a conflict detection method applied to an electronic device is provided, the method including:
acquiring voice information collected during a travel service, and dividing the voice information into a plurality of voice segments;
performing frame feature extraction on each segmented voice segment to obtain the frame features of each voice segment;
and importing the extracted frame features of each voice segment into a pre-established conflict detection model for detection to obtain the detection result of each voice segment.
In some embodiments of the present application, the method may further comprise:
and judging whether a conflict occurs in the travel service process according to the detection result of each voice fragment, and marking the travel service as a conflict travel service and sending the conflict travel service to a customer service system when the conflict is judged to occur.
In some embodiments of the present application, the electronic device is a mobile terminal in communication with a server, and the step of acquiring voice information collected during the travel service may include:
acquiring, at intervals of a preset duration, the voice information collected within that duration by a voice collection device of the mobile terminal; or
the electronic device is a server in communication with a mobile terminal, and the step of acquiring the voice information collected during the travel service may include:
acquiring, at intervals of a preset duration, voice information sent by the mobile terminal, where the voice information is either the voice information collected by the mobile terminal within the preset duration, or voice information the mobile terminal sends after analyzing and processing what it collected within the preset duration.
In some embodiments of the present application, the step of dividing the speech information into a plurality of speech segments may include:
and carrying out segmentation processing on the voice information according to a preset length by a preset step length so as to segment the voice information into a plurality of voice segments, wherein an overlapping part exists between two adjacent voice segments.
In some embodiments of the present application, the step of performing frame feature extraction on each segmented voice segment to obtain the frame features of each voice segment may include:
framing each voice segment with a preset frame shift according to a preset window length to obtain the plurality of voice frames contained in each voice segment, where an overlapping portion exists between every two adjacent voice frames;
extracting the basic features of each voice frame according to a preset algorithm, the basic features being multi-dimensional features;
performing first-order difference processing on the extracted basic features to obtain the first-order difference features of each voice frame;
performing second-order difference processing on the first-order difference features to obtain the second-order difference features of each voice frame;
and forming the frame features of each voice frame from the basic features, first-order difference features, and second-order difference features corresponding to that voice frame.
In some embodiments of the present application, the step of importing the extracted frame features of each voice segment into a pre-established conflict detection model for detection to obtain the detection result of each voice segment may include:
importing the extracted frame features of the voice frames of each voice segment into the pre-established conflict detection model;
for each voice segment contained in the voice information, processing the plurality of frame features contained in the voice segment according to a preset processing method to obtain a mean frame feature of the voice segment;
and classifying the mean frame feature corresponding to each voice segment, and obtaining the detection result of the voice segment to which the mean frame feature belongs according to the classification result.
In some embodiments of the present application, the step of classifying the mean frame feature corresponding to each voice segment and obtaining the detection result of the voice segment to which it belongs may include:
performing classification detection on the mean frame feature corresponding to each voice segment to obtain a classification score for the mean frame feature;
comparing the obtained classification score with a preset threshold, and if the classification score is greater than the preset threshold, determining the voice segment to which the mean frame feature belongs to be a conflict segment;
and if the classification score is less than or equal to the preset threshold, determining the voice segment to which the mean frame feature belongs to be a non-conflict segment.
In some embodiments of the present application, the step of importing the extracted frame features of each voice segment into the pre-established conflict detection model for detection to obtain the detection result of each voice segment may include:
importing the extracted frame features of each voice frame in each voice segment into the pre-established conflict detection model;
performing classification detection on each frame feature to obtain a classification score for the frame feature;
comparing the obtained classification score with a preset threshold, and if the classification score is greater than the preset threshold, determining the voice frame corresponding to the frame feature to be a conflict frame;
if the classification score is less than or equal to the preset threshold, determining the voice frame corresponding to the frame feature to be a non-conflict frame;
and obtaining the detection result of each voice segment according to the detection results of the voice frames it contains.
In some embodiments of the present application, the step of obtaining the detection result of each voice segment according to the detection results of its voice frames may include:
for each voice segment, determining the voice segment to be a conflict segment when it contains a voice frame whose detection result is a conflict frame; or
for each voice segment, determining the voice segment to be a conflict segment when it contains a first preset number of consecutive voice frames whose detection results are conflict frames, as illustrated in the sketch below.
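A minimal sketch of this segment-level rule (the function name and the min_consecutive parameter are illustrative; the patent leaves the "first preset number" unspecified):

```python
from typing import List

def segment_is_conflict(frame_is_conflict: List[bool], min_consecutive: int = 1) -> bool:
    """Decide whether a voice segment is a conflict segment from its frame verdicts.

    min_consecutive = 1 reproduces the first rule (any conflict frame suffices);
    a larger value reproduces the second rule, which requires a run of
    consecutive conflict frames. The count itself is a free parameter here.
    """
    run = 0
    for is_conflict in frame_is_conflict:
        run = run + 1 if is_conflict else 0
        if run >= min_consecutive:
            return True
    return False
```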
In some embodiments of the present application, the pre-established conflict detection model may be obtained as follows:
marking first voice segments, obtained in advance and containing conflict information, with a conflict flag to serve as positive samples, and marking second voice segments, obtained in advance and containing no conflict information, with a non-conflict flag to serve as negative samples;
and importing the plurality of positive samples and the plurality of negative samples into a neural network model for training to obtain the conflict detection model.
In some embodiments of the present application, the step of importing the plurality of positive samples and negative samples into a neural network model for training to obtain the conflict detection model may include:
sampling the marked positive and negative samples so that the numbers of positive and negative samples match;
and importing the processed positive and negative samples into the neural network model for training to obtain the conflict detection model.
In some embodiments of the present application, the step of determining whether a conflict occurs during the travel service according to the detection result of each voice segment may include:
determining, according to the detection result of each voice segment, whether the voice information containing the plurality of voice segments is conflict voice;
and determining whether a conflict occurs during the travel service according to the determination results of the individual voice messages.
In some embodiments of the present application, the step of determining whether the voice information containing a plurality of voice segments is conflict voice according to the detection result of each voice segment may include:
determining the voice information containing the plurality of voice segments to be conflict voice when a second preset number of consecutive voice segments among them have a detection result of conflict segment.
In some embodiments of the present application, the step of determining whether a conflict occurs during the travel service according to the determination results of the plurality of voice messages may include:
determining that a conflict occurs during the travel service when a third preset number of consecutive voice messages among the plurality of voice messages are conflict voice (this two-level rule is sketched below).
In some embodiments of the present application, the step of determining whether a conflict occurs during the travel service according to the detection result of each voice segment may include:
determining that a conflict occurs during the travel service whenever any voice segment among the plurality of voice segments has a detection result of conflict segment.
According to another aspect of the embodiments of the present application, a conflict detection apparatus applied to an electronic device is provided, the apparatus including:
an acquisition module configured to acquire voice information collected during a travel service;
a segmentation module configured to divide the voice information into a plurality of voice segments;
an extraction module configured to perform frame feature extraction on each segmented voice segment to obtain the frame features of each voice segment;
and a detection module configured to import the extracted frame features of each voice segment into a pre-established conflict detection model for detection to obtain the detection result of each voice segment.
In some embodiments of the present application, the apparatus may further comprise:
and the judging module is used for judging whether a conflict occurs in the travel service process according to the detection result of each voice fragment, marking the travel service as a conflict travel service when the conflict is judged to occur, and sending the conflict travel service to the customer service system.
In some embodiments of the present application, the electronic device is a mobile terminal in communication with a server, and the acquisition module may be specifically configured to:
acquire, at intervals of a preset duration, the voice information collected within that duration by a voice collection device of the mobile terminal; or
the electronic device is a server in communication with a mobile terminal, and the acquisition module is specifically configured to:
acquire, at intervals of a preset duration, voice information sent by the mobile terminal, where the voice information is either the voice information collected by the mobile terminal within the preset duration, or voice information the mobile terminal sends after analyzing and processing what it collected within the preset duration.
In some embodiments of the present application, the segmentation module may be specifically configured to:
and carrying out segmentation processing on the voice information according to a preset length by a preset step length so as to segment the voice information into a plurality of voice segments, wherein an overlapping part exists between two adjacent voice segments.
In some embodiments of the present application, the extraction module may be specifically configured to:
frame each voice segment with a preset frame shift according to a preset window length to obtain the plurality of voice frames contained in each voice segment, where an overlapping portion exists between every two adjacent voice frames;
extract the basic features of each voice frame according to a preset algorithm, the basic features being multi-dimensional features;
perform first-order difference processing on the extracted basic features to obtain the first-order difference features of each voice frame;
perform second-order difference processing on the first-order difference features to obtain the second-order difference features of each voice frame;
and form the frame features of each voice frame from the basic features, first-order difference features, and second-order difference features corresponding to that voice frame.
In some embodiments of the present application, the detection module may be specifically configured to:
import the extracted frame features of the voice frames of each voice segment into the pre-established conflict detection model;
for each voice segment contained in the voice information, process the plurality of frame features contained in the voice segment according to a preset processing method to obtain a mean frame feature of the voice segment;
and classify the mean frame feature corresponding to each voice segment, and obtain the detection result of the voice segment to which the mean frame feature belongs according to the classification result.
In some embodiments of the present application, the detection module may obtain a detection result of a speech segment to which the mean frame feature belongs by:
performing classification detection on the mean frame feature corresponding to each voice segment to obtain a classification score for the mean frame feature;
comparing the obtained classification score with a preset threshold, and if the classification score is greater than the preset threshold, determining the voice segment to which the mean frame feature belongs to be a conflict segment;
and if the classification score is less than or equal to the preset threshold, determining the voice segment to which the mean frame feature belongs to be a non-conflict segment.
In some embodiments of the present application, the detection module may be further specifically configured to:
import the extracted frame features of each voice frame in each voice segment into the pre-established conflict detection model;
perform classification detection on each frame feature to obtain a classification score for the frame feature;
compare the obtained classification score with a preset threshold, and if the classification score is greater than the preset threshold, determine the voice frame corresponding to the frame feature to be a conflict frame;
if the classification score is less than or equal to the preset threshold, determine the voice frame corresponding to the frame feature to be a non-conflict frame;
and obtain the detection result of each voice segment according to the detection results of the voice frames it contains.
In some embodiments of the present application, the detection module may obtain the detection result of the speech segment by:
for each voice segment, determining the voice segment to be a conflict segment when it contains a voice frame whose detection result is a conflict frame; or
for each voice segment, determining the voice segment to be a conflict segment when it contains a first preset number of consecutive voice frames whose detection results are conflict frames.
In some embodiments of the present application, the apparatus may further comprise:
a marking module configured to mark first voice segments, obtained in advance and containing conflict information, with a conflict flag to serve as positive samples, and to mark second voice segments, obtained in advance and containing no conflict information, with a non-conflict flag to serve as negative samples;
and a training module configured to import the plurality of positive samples and negative samples into a neural network model for training to obtain the conflict detection model.
In some embodiments of the present application, the training module may be specifically configured to:
sample the marked positive and negative samples so that the numbers of positive and negative samples match;
and import the processed positive and negative samples into the neural network model for training to obtain the conflict detection model.
In some embodiments of the present application, the determining module may be specifically configured to:
determine, according to the detection result of each voice segment, whether the voice information containing the plurality of voice segments is conflict voice;
and determine whether a conflict occurs during the travel service according to the determination results of the individual voice messages.
In some embodiments of the present application, the determining module may determine whether the voice information is a conflicting voice by:
and when the judgment result that the continuous second preset number of voice segments exist in the plurality of voice segments is the conflict segment, judging that the voice information containing the plurality of voice segments is conflict voice.
In some embodiments of the present application, the determining module may determine whether a conflict occurs in the travel service process by:
and when a third preset number of continuous voice information in the plurality of voice information is conflict voice, determining that a conflict occurs in the travel service process.
In some embodiments of the present application, the determining module may be further configured to:
and when the voice segments with the detection results of the conflict segments exist in the plurality of voice segments, judging that the conflict occurs in the travel service process.
According to another aspect of the embodiments of the present application, a readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program performs the steps of the conflict detection method described above.
Based on any one of the above aspects, the embodiments of the present application obtain voice information collected during a travel service and divide it into a plurality of voice segments, perform frame feature extraction on the segmented voice segments to obtain the frame features of each voice segment, and then import the frame features into a pre-established conflict detection model to obtain the detection result of each voice segment, thereby determining whether each voice segment is a conflict segment. Conflict detection is thus performed based on automatic recognition of voice information during the travel service, which avoids the high cost and low efficiency of the manual detection used in the prior art and improves the objectivity and efficiency of the detection result.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a schematic block diagram of the interaction of a conflict detection system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of exemplary hardware and software components of an electronic device that may implement the server, the service requester terminal, and the service provider terminal of FIG. 1;
FIG. 3 is a schematic flow chart of a conflict detection method provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of the sub-steps of step S120 in FIG. 3;
FIG. 5 is the first schematic flow chart of the sub-steps of step S130 in FIG. 3;
FIG. 6 is a schematic flow chart of the sub-steps of step S133A in FIG. 5;
FIG. 7 is the second schematic flow chart of the sub-steps of step S130 in FIG. 3;
FIG. 8 is the first functional block diagram of a conflict detection apparatus provided by an embodiment of the present application;
FIG. 9 is the second functional block diagram of a conflict detection apparatus provided by an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below with reference to the accompanying drawings. It should be understood that the drawings in the present application serve illustrative and descriptive purposes only and are not used to limit its scope of protection, and that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments; flowchart operations may be performed out of order, and steps without a logical dependency may be performed in reverse order or simultaneously. Moreover, under the guidance of this application, one skilled in the art may add one or more other operations to a flowchart or remove one or more operations from it.
In addition, the described embodiments are only a part of the embodiments of the present application, not all of them. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of configurations. Thus, the following detailed description of the embodiments is not intended to limit the scope of the claimed application but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from the embodiments of the present application without creative effort fall within its scope of protection.
To enable those skilled in the art to use the present disclosure, the following embodiments are presented in conjunction with a specific application scenario, an online ride-hailing scenario. It will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the application. Although the present application is described primarily in the context of online ride-hailing, it should be understood that this is only one exemplary embodiment. The application can be applied to any other traffic type, for example to different transportation system environments, including land, sea, or air, or any combination thereof. The vehicles of the transportation system may include a taxi, a private car, a hitch ride, a bus, a train, a bullet train, a high-speed rail, a subway, a ship, an airplane, a spacecraft, a hot air balloon, or an unmanned vehicle, or any combination thereof. The application may also cover any service system similar to online ride-hailing, for example a system for sending and/or receiving express deliveries, or a service system for transactions between buyers and sellers. Applications of the system or method of the present application may include web pages, browser plug-ins, client terminals, customization systems, internal analysis systems, or artificial intelligence robots, or any combination thereof.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
The term "user" in this application may refer to an individual, entity or tool that requests a service, subscribes to a service, provides a service, or facilitates the provision of a service. For example, the user may be a passenger, a driver, an operator, etc., or any combination thereof.
To address at least one of the technical problems described in the background above, the embodiments of the present application provide a conflict detection method and apparatus, an electronic device, and a readable storage medium that obtain voice information collected during a travel service and divide it into a plurality of voice segments, perform frame feature extraction on the segmented voice segments to obtain the frame features of each voice segment, and then import the frame features into a pre-established conflict detection model to obtain the detection result of each voice segment, thereby determining whether each voice segment is a conflict segment. Conflict detection is thus performed based on the voice information of the travel service process, which avoids the high cost and low efficiency of manual detection in the prior art and improves the objectivity and efficiency of the detection result. The technical solution of the present application is explained below through possible implementations.
First embodiment
FIG. 1 is a schematic diagram of the architecture of a conflict detection system 100 according to an alternative embodiment of the present application. For example, the conflict detection system 100 may be the online transportation service platform underlying transportation services such as taxi hailing, designated driving, express rides, carpooling, bus service, driver rental, or shuttle service, or any combination thereof. The conflict detection system 100 may include a server 110, a network 120, a service requester terminal 130, a service provider terminal 140, and a database 150, and the server 110 may include a processor that executes instruction operations. The conflict detection system 100 shown in FIG. 1 is only one possible example; in other possible embodiments, it may include only some of the components shown in FIG. 1 or may also include other components.
In some embodiments, the server 110 may be a single server or a server group. The server group may be centralized or distributed (e.g., the server 110 may be a distributed system). In some embodiments, the server 110 may be local or remote to the terminals. For example, the server 110 may access data stored in the service requester terminal 130, the service provider terminal 140, and the database 150 via the network 120. As another example, the server 110 may be directly connected to at least one of the service requester terminal 130, the service provider terminal 140, and the database 150 to access the data stored therein. In some embodiments, the server 110 may be implemented on a cloud platform; by way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof. In some embodiments, the server 110 may also be implemented on an electronic device 200 having one or more of the components shown in FIG. 2 of the present application.
In some embodiments, the server 110, the service requester terminal 130, or the service provider terminal 140 may include a processor. The processor may process information and/or data in the travel service process to perform one or more of the functions described herein. For example, during the travel service, the processor may detect whether the obtained voice information contains conflict information by processing it. A processor may include one or more processing cores (e.g., a single-core or multi-core processor). Merely by way of example, a processor may include a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), an Application-Specific Instruction-set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.
Network 120 may be used for the exchange of information and/or data. In some embodiments, one or more components in the conflict detection system 100 (e.g., the server 110, the service requester terminal 130, the service provider terminal 140, and the database 150) may send information and/or data to other components. For example, the server 110 may acquire voice information from the service provider terminal 140 or the service requester terminal 130 via the network 120. In some embodiments, the network 120 may be any type of wired or wireless network, or a combination thereof. Merely by way of example, the network 120 may include a wired network, a wireless network, a fiber-optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a ZigBee network, a Near Field Communication (NFC) network, or the like, or any combination thereof. In some embodiments, the network 120 may include one or more network access points. For example, the network 120 may include wired or wireless network access points, such as base stations and/or network switching nodes, through which one or more components of the conflict detection system 100 may connect to the network 120 to exchange data and/or information.
In some embodiments, "service requester" and "service requester terminal 130" may be used interchangeably, and "service provider" and "service provider terminal 140" may be used interchangeably.
In some embodiments, the service provider terminal 140 may comprise a device with voice capture capability, such as a mobile device, a tablet computer, a laptop computer, or a built-in device in a motor vehicle, or any combination thereof. In some embodiments, the mobile device may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart footwear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a Personal Digital Assistant (PDA), a gaming device, a navigation device, a point-of-sale (POS) device, or the like, or any combination thereof. In some embodiments, the built-in devices in the motor vehicle may include an on-board computer, an on-board television, and the like.
Database 150 may store data and/or instructions. In some embodiments, the database 150 may store data obtained from the service requester terminal 130 and/or the service provider terminal 140. In some embodiments, the database 150 may store data and/or instructions for the exemplary methods described in this application. In some embodiments, the database 150 may include mass storage, removable storage, volatile read-write memory, Read-Only Memory (ROM), or the like, or any combination thereof. By way of example, mass storage may include magnetic disks, optical disks, solid-state drives, and the like; removable storage may include flash drives, floppy disks, optical disks, memory cards, zip disks, magnetic tapes, and the like; volatile read-write memory may include Random Access Memory (RAM); the RAM may include Dynamic RAM (DRAM), Double Data Rate Synchronous Dynamic RAM (DDR SDRAM), Static RAM (SRAM), Thyristor RAM (T-RAM), Zero-capacitor RAM (Z-RAM), and the like. By way of example, the ROM may include Mask ROM (MROM), Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), Compact Disk ROM (CD-ROM), Digital Versatile Disk ROM, and the like. In some embodiments, the database 150 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
In some embodiments, the database 150 may be connected to the network 120 to communicate with one or more components in the conflict detection system 100 (e.g., the server 110, the service requester terminal 130, the service provider terminal 140, etc.). One or more components in the conflict detection system 100 may access data or instructions stored in the database 150 via the network 120. The database 150 may also be directly connected to one or more components in the conflict detection system 100, or the database 150 may be part of the server 110.
In some embodiments, one or more components (e.g., server 110, service requestor terminal 130, service provider terminal 140, etc.) in conflict detection system 100 may have access to database 150. In some embodiments, one or more components in the conflict detection system 100 may read and/or modify information related to a service requestor, a service provider, or the public, or any combination thereof, when certain conditions are met. For example, server 110 may read and/or modify information for one or more users after receiving a service request.
Second embodiment
FIG. 2 is a schematic diagram of exemplary hardware and software components of an electronic device 200 that may implement the server 110, the service requester terminal 130, or the service provider terminal 140, according to some embodiments of the present application. For example, the processor 220 may be used on the electronic device 200 to perform the functions described herein.
The electronic device 200 may be a general purpose computer or a special purpose computer, both of which may be used to implement the conflict detection method of the present application. Although only a single computer is shown, for convenience, the functions described herein may be implemented in a distributed fashion across multiple similar platforms to balance processing loads.
For example, the electronic device 200 may include a network port 210 connected to a network, one or more processors 220 for executing program instructions, a communication bus 230, and different forms of storage media 240, such as a disk, ROM, or RAM, or any combination thereof. Illustratively, the computer platform may also include program instructions stored in ROM, RAM, or other types of non-transitory storage media, or any combination thereof, and the method of the present application may be implemented in accordance with these program instructions. The electronic device 200 also includes an Input/Output (I/O) interface 250 between the computer and other input/output devices (e.g., a keyboard or a display screen).
For ease of illustration, only one processor is depicted in the electronic device 200. It should be noted, however, that the electronic device 200 in the present application may also comprise multiple processors, so steps described herein as performed by one processor may also be performed by multiple processors jointly or individually. For example, if the processor of the electronic device 200 executes steps A and B, it should be understood that steps A and B may also be executed by two different processors jointly, or separately (e.g., a first processor performs step A and a second processor performs step B, or the first and second processors perform steps A and B together).
Third embodiment
FIG. 3 is a flowchart of a conflict detection method according to some embodiments of the present application; the method may be applied to the server 110 or to a mobile terminal communicatively connected to the server 110. The mobile terminal may be the service requester terminal 130 or the service provider terminal 140. It should be understood that in other embodiments the order of some steps of the conflict detection method described in this embodiment may be interchanged according to actual needs, and some steps may even be omitted or deleted. The detailed steps of the conflict detection method are described below.
Step S110, acquiring voice information collected during the travel service, and dividing the voice information into a plurality of voice segments.
In this embodiment, the service requester terminal 130 may be a terminal device held by a passenger, and the service provider terminal 140 may be a terminal device held by a driver, while the server 110 may be a cloud server providing the travel service platform for the service requester terminal 130 and the service provider terminal 140. During the travel service, the voice collection device in the service provider terminal 140 may record the trip to obtain the voice information of the driver and the passenger, and the service provider terminal 140 may analyze the collected voice information to detect whether a conflict exists. Alternatively, the voice collection device in the service requester terminal 130 may record during the travel service, and the obtained voice information may be analyzed there.
When the application runs on the server 110, the server 110 may obtain the voice information sent by the service requester terminal 130 or the service provider terminal 140; for example, at intervals of a preset duration it obtains from the terminal the voice information collected within that duration. The voice information may be transmitted without any processing after being collected by the terminal, or it may be voice information that the terminal, after collection and voice recognition, has preliminarily determined to contain a conflict. After receiving the voice information, the server 110 analyzes and processes it to detect whether a conflict exists, which improves the accuracy of the determination result.
In this embodiment, in order to discover conflicts in time, the voice information of the travel service is not analyzed only after the whole trip has ended. Instead, during the travel service the voice information of a preset duration, for example 5 min, is obtained at intervals of that duration and analyzed promptly.
The obtained voice information is first divided into a plurality of voice segments, which can be done according to a preset length and a preset step length. For example, 5 min of voice information may be divided into a number of 10 s voice segments. In this embodiment, in order to avoid missing signals at segment boundaries, the preset step length used during segmentation may be smaller than the preset length; for example, with a preset length of 10 s of voice, the step length may be 5 s, so that an overlapping portion exists between two adjacent voice segments.
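A minimal sketch of this overlapping segmentation, assuming 16 kHz mono audio (the patent does not specify a sample rate):

```python
import numpy as np

def split_into_segments(samples: np.ndarray, sr: int,
                        seg_len_s: float = 10.0, step_s: float = 5.0):
    """Cut audio into fixed-length segments whose step is smaller than their
    length, so that adjacent segments overlap (here 10 s segments every 5 s)."""
    seg_len, step = int(seg_len_s * sr), int(step_s * sr)
    return [samples[i:i + seg_len]
            for i in range(0, max(len(samples) - seg_len, 0) + 1, step)]

# e.g. 5 min of 16 kHz audio -> 59 overlapping 10 s segments
audio = np.zeros(5 * 60 * 16000, dtype=np.float32)
segments = split_into_segments(audio, sr=16000)
```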
Step S120, performing frame feature extraction on each segmented voice segment to obtain the frame features of each voice segment.
Since the speech signal changes rapidly, which is not conducive to analysis, each voice segment can be further divided into a plurality of voice frames once the voice segments have been obtained. Each voice frame can be regarded as a relatively stationary signal, and subsequent operations can be performed on each voice frame.
Referring to FIG. 4, in the present embodiment, step S120 may include the following sub-steps:
Step S121, performing framing processing on each voice segment with a preset frame shift according to a preset window length to obtain the plurality of voice frames included in each voice segment, where an overlapping portion exists between two adjacent voice frames.
Step S122, extracting the basic features of each voice frame according to a preset algorithm, where the basic features are multi-dimensional features.
Step S123, performing first-order difference processing on the extracted basic features to obtain the first-order difference features of each voice frame.
Step S124, performing second-order difference processing on the first-order difference features to obtain the second-order difference features of each voice frame.
Step S125, forming the frame features of each voice frame from its corresponding basic features, first-order difference features, and second-order difference features.
Each voice segment may be framed with a preset window length, which may be 10-30 ms, for example a 25 ms window, and a preset frame shift, for example 10 ms. In order to avoid missing signal at the window boundaries, adjacent frames must overlap when the window is shifted, i.e., an overlapping portion exists between two adjacent frames.
Once the voice frames contained in each voice segment have been obtained, the basic features of each voice frame can be extracted according to a preset algorithm. The preset algorithm may be a filter bank algorithm or an MFCC (Mel-Frequency Cepstral Coefficients) algorithm, and the extracted basic features are fbank features or MFCC features, respectively. The fbank and MFCC features are extracted by conventional methods in the prior art and are not described again here.
After the basic features in the form of fbank or MFCC features are obtained, first-order difference processing is performed on them to obtain the first-order difference features, and second-order difference processing is performed on the first-order differences to obtain the second-order difference features. In this embodiment, the basic features may be multi-dimensional, for example 40-dimensional, and the frame feature of a voice frame is formed from the basic features together with the first-order and second-order difference features, giving a 40 × 3 frame feature.
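One possible implementation of this feature pipeline, using librosa for the base features (the patent does not name a library; the window, frame shift, and dimensionality follow the example values above, and MFCC is used here although fbank features would work the same way):

```python
import librosa
import numpy as np

def frame_features(segment: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Per-frame features: 40 base coefficients plus their first- and
    second-order differences, i.e. a 40 x 3 = 120-dim vector per frame."""
    base = librosa.feature.mfcc(
        y=segment, sr=sr, n_mfcc=40,
        n_fft=int(0.025 * sr),       # 25 ms window length
        hop_length=int(0.010 * sr),  # 10 ms frame shift -> adjacent frames overlap
    )
    d1 = librosa.feature.delta(base, order=1)  # first-order difference features
    d2 = librosa.feature.delta(base, order=2)  # second-order difference features
    return np.concatenate([base, d1, d2], axis=0).T  # shape: (n_frames, 120)
```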
Step S130, importing the extracted frame features of each voice segment into a pre-established conflict detection model for detection to obtain the detection result of each voice segment.
Before step S130 is performed, the conflict detection model needs to be built by pre-training:
first voice segments with conflict information, obtained in advance, may be marked with a conflict flag to serve as positive samples, and second voice segments without conflict information, obtained in advance, may be marked with a non-conflict flag to serve as negative samples. The plurality of positive samples and the plurality of negative samples are then imported into a neural network model for training to obtain the conflict detection model.
In the present embodiment, it is considered that the number of positive samples containing conflict information is generally much smaller than the number of negative samples containing none. When the numbers of positive and negative samples differ greatly, the training of the neural network model becomes biased and the accuracy of subsequent detection drops.
In view of this, the marked positive and negative samples may be subjected to sampling processing so that the numbers of positive and negative samples match. The processed positive and negative samples are then imported into the neural network model for training to obtain the conflict detection model.
In the sampling process, the positive samples may be upsampled on their own to increase their number, or the negative samples may be downsampled on their own to decrease theirs; alternatively, the positive samples may be upsampled and the negative samples downsampled at the same time, so that the numbers of positive and negative samples match.
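A minimal sketch of this balancing step; resampling with replacement is one common choice, and the patent does not prescribe a specific sampling scheme:

```python
import random

def balance(pos: list, neg: list, seed: int = 0):
    """Equalize class sizes by upsampling the minority class with replacement.
    Downsampling the majority class, or mixing both, works analogously."""
    rng = random.Random(seed)
    if len(pos) < len(neg):
        pos = pos + rng.choices(pos, k=len(neg) - len(pos))
    elif len(neg) < len(pos):
        neg = neg + rng.choices(neg, k=len(pos) - len(neg))
    return pos, neg
```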
Once the conflict detection model has been established in advance, the frame features of the voice frames in each voice segment, as they are obtained, can be imported into the model for detection.
When the conflict detection method runs on the service provider terminal 140 or the service requester terminal 130, the recall rate of conflict information may be prioritized; that is, voice segments in which conflict information exists should be detected as completely as possible, after which the voice segments preliminarily determined to contain conflict information can be transmitted to the server 110 for further determination. Having the terminal device perform this preliminary screening reduces the processing pressure on the server 110. Furthermore, the service provider terminal 140 and the service requester terminal 130 may also determine the conflict information independently and feed it back to the customer service system.
When applied to the service provider terminal 140 or the service requester terminal 130, referring to FIG. 5, step S130 may include the following sub-steps:
Step S131A, importing the extracted frame features of the voice frames of each voice segment into a pre-established conflict detection model.
Step S132A, for each voice segment included in the voice information, processing the plurality of frame features included in the voice segment according to a preset processing method to obtain a mean frame feature of the voice segment.
Step S133A, classifying the mean frame feature corresponding to each voice segment, and obtaining the detection result of the voice segment to which the mean frame feature belongs according to the classification result.
When the conflict detection method is applied to the service provider terminal 140 or the service requester terminal 130, the established conflict detection model includes a CNN + FC layer, a pooling or averaging layer, an FC layer, and an output result layer. The extracted frame features of the voice frames of each voice segment are imported into the conflict detection model, and the CNN + FC layer can perform higher-dimensional feature extraction, down-sampling, and similar processing on the frame features; the specific processing performed by the CNN + FC layer follows common prior-art methods and is not repeated here.
In the present embodiment, the limited processing capability of the terminal device is taken into account, and when applied to the terminal device the recall rate of conflict information may be the focus. The pooling or averaging layer can therefore be used to process the frame features contained in each voice segment according to a preset processing method to obtain the mean frame feature of the voice segment. The preset processing method may select the middle frame feature among the frame features of a voice segment as the mean frame feature, or average the frame features and use the result as the mean frame feature, or weight the frame features before averaging and use that result as the mean frame feature.
The mean frame features of all voice segments are obtained by calculation and used as the features of the corresponding voice segments; subsequent processing then operates on the mean frame features rather than on every individual frame feature, which reduces the processing load and improves processing efficiency.
After the mean frame feature of each voice segment is obtained, feature compression, size conversion, and the like can be performed by the FC layer. The specific processing of the FC layer follows prior-art methods and is not described again here.
Finally, the output result layer classifies the mean frame feature corresponding to each voice segment, and the detection result of the voice segment is obtained from the classification result. When applied to the terminal device, the input of the conflict detection model is therefore the frame features of the voice frames in each voice segment, and the output is the detection result of each voice segment. Not every frame in the voice segment needs to be analyzed, which increases processing speed and reduces the processing load.
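A terminal-side model of this shape might look as follows in PyTorch; all layer sizes, and the use of mean pooling as the averaging layer, are illustrative assumptions rather than values from the patent:

```python
import torch
import torch.nn as nn

class TerminalConflictModel(nn.Module):
    """Terminal-side sketch: CNN + FC frame layers, a pooling (averaging) layer
    that collapses a segment's frames into one mean frame feature, an FC layer,
    and an output layer that emits one classification score per segment."""

    def __init__(self, feat_dim: int = 120):
        super().__init__()
        self.cnn = nn.Conv1d(feat_dim, 64, kernel_size=5, padding=2)  # CNN stage
        self.frame_fc = nn.Linear(64, 64)                             # per-frame FC stage
        self.fc = nn.Linear(64, 32)                                   # segment FC stage
        self.out = nn.Linear(32, 1)                                   # output result stage

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_frames, feat_dim), the frame features of one segment each
        h = torch.relu(self.cnn(x.transpose(1, 2))).transpose(1, 2)
        h = torch.relu(self.frame_fc(h))
        h = h.mean(dim=1)                 # pooling: the mean frame feature
        h = torch.relu(self.fc(h))
        return torch.sigmoid(self.out(h)).squeeze(-1)  # score in (0, 1) per segment
```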
In this embodiment, referring to fig. 6, the detection result of each speech segment can be obtained by the following method:
Step S1331A, performing classification detection on the mean frame feature corresponding to each voice segment to obtain a classification score of the mean frame feature.
Step S1332A, comparing the obtained classification score with a preset threshold, determining whether the classification score is greater than the preset threshold, if so, entering step S1333A, and if not, entering step S1334A.
Step S1333A, determining that the speech segment to which the mean frame feature belongs is a collision segment.
Step S1334A, determining that the speech segment to which the mean frame feature belongs is a non-collision segment.
The output result layer in the collision detection model acts as a classifier and outputs a correlation coefficient between the mean frame feature and the positive samples, namely the classification score. If the classification score is greater than a preset threshold, for example 0.5, the correlation between the mean frame feature and the positive samples containing collision information is high; it is determined that the mean frame feature carries collision information, and the voice segment to which it belongs is a collision segment. If the obtained classification score is less than or equal to the preset threshold, the correlation with the positive samples containing conflict information is low; it is determined that the mean frame feature does not contain conflict information, and the voice segment to which it belongs is a non-collision segment.
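A minimal sketch of this thresholding rule (0.5 is the example threshold given above; the constant and function names are assumptions):

```python
PRESET_THRESHOLD = 0.5  # example value from the description above

def classify_segment(classification_score):
    """Map a segment's classification score to a detection result."""
    if classification_score > PRESET_THRESHOLD:
        return "collision segment"      # high correlation with positive samples
    return "non-collision segment"      # low correlation with positive samples
```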
The above is the procedure by which the collision detection model obtains the detection result of a voice segment when the method is applied to the service requester terminal 130 or the service provider terminal 140. The following explains, with reference to fig. 7, the steps by which the collision detection method applied to the server 110 obtains the detection result of a voice segment using the collision detection model:
Step S131B, the frame features of each speech frame in each extracted speech segment are imported into the pre-established collision detection model.
Step S132B, performing classification detection on each frame feature to obtain a classification score of the frame feature.
Step S133B, comparing the obtained classification score with a preset threshold, and determining whether the classification score is greater than the preset threshold. If the classification score is greater than the preset threshold, the process proceeds to step S134B. If the classification score is less than or equal to the preset threshold, the process proceeds to step S135B.
Step S134B, determining the speech frame corresponding to the frame feature as a collision frame, and entering step S136B.
Step S135B, determining that the speech frame corresponding to the frame feature is a non-collision frame.
Step S136B, obtaining the detection result of the speech segment according to the detection result of the speech frame in each speech segment.
When the server 110 detects the voice segments by using the collision detection model, the basic process is consistent with the detection process on the terminal device. The difference is that the collision detection model deployed in the server 110 lacks the pooling or attention layer of the terminal-side model, and the server 110 does not calculate a mean frame feature for each speech segment but processes every speech frame in the segment. The input of the collision detection model is the frame features of each speech frame in the speech segment, and the output is the detection result for each frame feature. As before, after the CNN + FC layer performs higher-dimensional feature extraction and down-sampling on the frame features, the FC layer compresses the features and performs size conversion. Finally, the output result layer produces a classification result for each frame feature, which is compared with the preset threshold to determine whether the corresponding speech frame is a collision frame.
Since the output of the collision detection model in the server 110 is the detection result of each frame feature, the detection result of the voice segment needs to be further derived from these per-frame results.
For each voice segment, the segment can be determined to be a collision segment when it contains at least one speech frame whose detection result is a collision frame. Alternatively, for each voice segment, the segment can be determined to be a collision segment when it contains a first preset number of consecutive speech frames whose detection results are collision frames. For example, for a 10 s speech segment, if 20 consecutive speech frames within it are determined to be collision frames, the segment is determined to be a collision segment.
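The two per-frame aggregation rules just described could be sketched as follows (the function name and the default of 20 consecutive frames, taken from the example above, are assumptions):

```python
def is_collision_segment(frame_is_collision, min_consecutive=20):
    """Decide a segment from its per-frame results (True = collision frame).

    min_consecutive=1 reduces to the first rule (any collision frame);
    min_consecutive=20 mirrors the 20-consecutive-frame example above.
    """
    run = 0
    for flag in frame_is_collision:
        run = run + 1 if flag else 0
        if run >= min_consecutive:
            return True
    return False
```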
In summary, the collision detection model applied to the service provider terminal 140 or the service requester terminal 130 focuses on recalling as many voice segments with collision information as possible while quickly excluding obviously non-conflicting speech, so outputting the detection result of a voice segment from its mean frame feature increases processing speed and reduces the processing load. The collision detection model applied to the server 110 focuses more on the accuracy of the detection result and therefore processes every frame and outputs a per-frame detection result to improve accuracy.
In some embodiments of the present application, the service provider terminal 140 and the service requester terminal 130 may perform only preliminary screening, uploading the screened voice segments determined to be conflicting voice to the server 110 so that the server 110 performs further detection and judgment.
Alternatively, the service provider terminal 140 and the service requester terminal 130 may perform the detection judgment directly and independently. In this case the method is consistent with the subsequent detection judgment performed by the server 110 after it obtains the detection result of each voice segment, and may specifically be as follows:
Whether a conflict occurs in the travel service process can be judged according to the detection result of each voice segment. Specifically, whether the voice information comprising a plurality of voice segments is conflict voice may be determined from the detection result of each voice segment: when a second preset number of consecutive voice segments among the plurality of voice segments are determined to be collision segments, the voice information containing them is judged to be conflict voice. For example, for a voice message with a length of 5 min, if 6 consecutive voice segments in it are collision segments, the voice message is determined to be conflict voice.
Then, whether a conflict occurs in the travel service process is judged from the judgment result of each piece of voice information: when a third preset number of consecutive pieces of voice information among the plurality of pieces are conflict voice, it is determined that a conflict occurs in the travel service process. For example, if 7 consecutive 5 min voice messages in the whole order record corresponding to the travel service are determined to be conflict voice, the order record is determined to be a conflict order, that is, a conflict occurred during the travel service.
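The same consecutive-run test applies at both levels of this hierarchy; the following sketch assumes the second and third preset numbers take the example values 6 and 7 (all names are illustrative):

```python
def has_consecutive_hits(flags, n):
    """True if `flags` contains at least n consecutive True values."""
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= n:
            return True
    return False

def is_conflict_voice(segment_is_collision, second_preset_number=6):
    # one piece of voice information (e.g. a 5 min recording) -> conflict voice?
    return has_consecutive_hits(segment_is_collision, second_preset_number)

def trip_has_conflict(voice_is_conflict, third_preset_number=7):
    # whole order record -> did a conflict occur during the travel service?
    return has_consecutive_hits(voice_is_conflict, third_preset_number)
```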
In addition, in some embodiments of the present application, instead of the sequential determination manner described above, when any voice segment among the plurality of voice segments has a detection result of collision segment, it may be directly determined that a conflict occurs in the travel service process.
In some embodiments of the present application, upon determining that a conflict occurs in the travel service process, the travel service may be flagged as a conflicting travel service and sent to the customer service system.
Fourth embodiment
Fig. 8 shows a functional block diagram of a collision detection apparatus 800 according to some embodiments of the present application, where the functions implemented by the collision detection apparatus 800 correspond to the steps executed by the above-described method. The apparatus may be understood as the electronic device 200 or the processor 220 of the electronic device 200, or may be understood as a component that is independent from the electronic device 200 or the processor 220 and implements the functions of the present application under the control of the electronic device 200, as shown in fig. 8, the collision detection apparatus 800 may include an obtaining module 801, a dividing module 802, an extracting module 803, and a detecting module 804.
An obtaining module 801, configured to obtain voice information collected in a process of a travel service. A dividing module 802, configured to divide the voice information into a plurality of voice segments. It is understood that the obtaining module 801 and the dividing module 802 can be used to execute the step S110, and for the detailed implementation of the obtaining module 801 and the dividing module 802, reference can be made to the contents related to the step S110.
The extracting module 803 is configured to perform frame feature extraction processing on each segmented speech segment to obtain a frame feature of each speech segment. It is understood that the extracting module 803 may be configured to perform the step S120, and for detailed implementation of the extracting module 803, reference may be made to the content related to the step S120.
The detecting module 804 is configured to import the extracted frame feature of each speech segment into a pre-established collision detection model for detection, so as to obtain a detection result of each speech segment. It is understood that the detecting module 804 can be used to execute the step S130, and for the detailed implementation of the detecting module 804, reference can be made to the contents related to the step S130.
In one possible implementation, please refer to fig. 9 in combination, the collision detection apparatus 800 may further include a determination module 805.
The determining module 805 is configured to determine whether a conflict occurs in the travel service process according to a detection result of each voice segment, and when it is determined that a conflict occurs, mark the travel service as a conflict travel service and send the conflict travel service to the customer service system.
In a possible implementation, the obtaining module 801 may specifically be configured to:
and acquiring voice information acquired in the preset duration through voice acquisition equipment of the mobile terminal at each interval of the preset duration.
In a possible implementation manner, the obtaining module may be further specifically configured to:
and acquiring voice information sent by the mobile terminal every preset time interval, wherein the voice information is the voice information collected by the mobile terminal in the preset time interval or the voice information sent by the mobile terminal after analyzing and processing the voice information collected by the mobile terminal in the preset time interval.
In a possible implementation, the segmentation module 802 may be specifically configured to:
and carrying out segmentation processing on the voice information according to a preset length by a preset step length so as to segment the voice information into a plurality of voice segments, wherein an overlapping part exists between two adjacent voice segments.
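A minimal sketch of this overlapping segmentation (the 10 s segment length, 5 s step, and 16 kHz sampling rate are illustrative assumptions, not values fixed by the embodiment):

```python
def split_with_overlap(samples, seg_len, step):
    """Slide a window of seg_len samples over the audio with the given step.

    step < seg_len makes adjacent segments overlap by seg_len - step
    samples; a trailing remainder shorter than seg_len is dropped here.
    """
    return [samples[i:i + seg_len]
            for i in range(0, max(len(samples) - seg_len, 0) + 1, step)]

# e.g. 10 s segments with 5 s overlap on 16 kHz audio:
# segments = split_with_overlap(audio, seg_len=10 * 16000, step=5 * 16000)
```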
In a possible implementation, the extracting module 803 may be specifically configured to:
framing each voice segment by a preset frame shift according to a preset window length to obtain a plurality of voice frames contained in each voice segment, wherein an overlapping part exists between every two adjacent voice frames;
extracting the basic features of each voice frame according to a preset algorithm, wherein the basic features are multi-dimensional features;
performing first-order difference processing on the extracted basic features to obtain first-order difference features of each voice frame;
carrying out second order differential processing on the first order differential characteristics to obtain second order differential characteristics of each voice frame;
and the basic feature, the first-order difference feature and the second-order difference feature corresponding to each voice frame form the frame feature of the voice frame.
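As a concrete illustration of the extraction steps listed above, the following sketch assumes the "preset algorithm" is MFCC extraction with a 25 ms window length and 10 ms frame shift; the embodiment does not fix the algorithm, the dimensions, or these framing parameters:

```python
import librosa
import numpy as np

def frame_features(segment, sr=16000):
    """Base feature plus first- and second-order differences per speech frame."""
    mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=13,
                                n_fft=int(0.025 * sr),       # preset window length
                                hop_length=int(0.010 * sr))  # preset frame shift
    d1 = librosa.feature.delta(mfcc, order=1)  # first-order difference feature
    d2 = librosa.feature.delta(mfcc, order=2)  # second-order difference feature
    # one 39-dimensional frame feature per speech frame: (num_frames, 39)
    return np.concatenate([mfcc, d1, d2], axis=0).T
```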
In a possible implementation, the detecting module 804 may be specifically configured to:
importing the frame characteristics of the voice frame of each extracted voice segment into a pre-established collision detection model;
processing a plurality of frame features contained in the voice segments according to a preset processing method aiming at each voice segment contained in the voice information to obtain a mean value frame feature of the voice segment;
and classifying the mean frame characteristics corresponding to each voice segment, and obtaining the detection result of the voice segment to which the mean frame characteristics belong according to the classification result of the mean frame characteristics.
In one possible implementation, the detection module 804 obtains the detection result of the speech segment to which the mean frame feature belongs by:
carrying out classification detection on the mean frame characteristics corresponding to the voice segments to obtain classification scores of the mean frame characteristics;
comparing the obtained classification score with a preset threshold, and if the classification score is larger than the preset threshold, determining the voice segment to which the mean frame feature belongs as a conflict segment;
and if the classification score is smaller than or equal to the preset threshold, determining the voice segment to which the mean frame feature belongs as a non-conflict segment.
In a possible implementation, the detecting module 804 is specifically further configured to:
importing the frame characteristics of each voice frame in each extracted voice segment into a pre-established collision detection model;
carrying out classification detection on each frame feature to obtain a classification score of the frame feature;
comparing the obtained classification score with a preset threshold, and if the classification score is greater than the preset threshold, determining the voice frame corresponding to the frame characteristic as a conflict frame;
if the classification score is smaller than or equal to the preset threshold value, determining that the voice frame corresponding to the frame feature is a non-conflict frame;
and obtaining the detection result of the voice segment according to the detection result of the voice frame in each voice segment.
In one possible implementation, the detection module 804 obtains the detection result of the speech segment by:
for each voice segment, when a voice frame with a detection result of a collision frame exists in the voice segment, judging the voice segment to be a collision segment; or
And for each voice segment, when the detection result that continuous first preset number of voice frames exist in the voice segment is a collision frame, judging that the voice segment is a collision segment.
In a possible implementation, the collision detection apparatus 800 may further include:
a marking module 806, configured to mark a collision identifier on a first voice segment with collision information obtained in advance to serve as a positive sample, and mark a non-collision identifier on a second voice segment without collision information obtained in advance to serve as a negative sample;
a training module 807, configured to introduce the positive samples and the negative samples into a neural network model for training to obtain the collision detection model.
In a possible implementation, the training module 807 is specifically configured to:
sampling the marked positive samples and negative samples to make the number of positive samples consistent with that of negative samples;
and introducing the processed positive samples and negative samples into the neural network model for training to obtain the conflict detection model.
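A minimal sketch of the sampling step, assuming balancing is done by down-sampling the larger class (the embodiment only requires that the positive and negative counts end up consistent):

```python
import random

def balance_samples(positives, negatives, seed=0):
    """Down-sample the larger class so the two sample counts match."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)
```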
On the basis of obtaining the detection result of each voice segment, in a possible implementation, the determining module 805 determines whether the voice information is a conflicting voice by:
and when the detection result that continuous second preset number of voice segments exist in the plurality of voice segments is a conflict segment, judging that the voice information containing the plurality of voice segments is conflict voice.
In one possible embodiment, the determining module 805 determines whether a conflict occurs in the travel service process by:
and when a third preset number of continuous voice information in the plurality of voice information is conflict voice, determining that a conflict occurs in the travel service process.
In a possible implementation manner, the determining module 805 may be further specifically configured to:
and when the voice segments with the detection results of the conflict segments exist in the plurality of voice segments, judging that the conflict occurs in the travel service process.
The modules may be connected or in communication with each other via a wired or wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, etc., or any combination thereof. The wireless connection may comprise a connection over a LAN, WAN, bluetooth, ZigBee, NFC, or the like, or any combination thereof. Two or more modules may be combined into a single module, and any one module may be divided into two or more units.
The embodiment of the present application further provides a readable storage medium, where the readable storage medium stores computer-executable instructions, and the computer-executable instructions can execute the conflict detection method in any method embodiment described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (32)

1. A collision detection method applied to an electronic device, the method comprising:
acquiring voice information acquired in the process of travel service, and dividing the voice information into a plurality of voice fragments;
performing frame feature extraction processing on each voice segment obtained by segmentation to obtain the frame feature of each voice segment;
and importing the extracted frame characteristics of each voice segment into a pre-established collision detection model for detection so as to obtain the detection result of each voice segment.
2. The collision detection method according to claim 1, characterized in that the method further comprises:
and judging whether a conflict occurs in the travel service process according to the detection result of each voice fragment, and marking the travel service as a conflict travel service and sending the conflict travel service to a customer service system when the conflict is judged to occur.
3. The conflict detection method according to claim 1, wherein the electronic device is a mobile terminal communicating with a server, and the step of acquiring the voice information collected during the travel service comprises:
acquiring voice information in the preset duration through voice acquisition equipment of the mobile terminal at each interval of the preset duration; or
The electronic device is a server communicating with the mobile terminal, and the step of acquiring the voice information acquired in the process of the travel service comprises the following steps:
and acquiring voice information sent by the mobile terminal every preset time interval, wherein the voice information is the voice information collected by the mobile terminal in the preset time interval or the voice information sent by the mobile terminal after analyzing and processing the voice information collected by the mobile terminal in the preset time interval.
4. The collision detection method according to claim 1, wherein the step of dividing the speech information into a plurality of speech segments comprises:
and carrying out segmentation processing on the voice information according to a preset length by a preset step length so as to segment the voice information into a plurality of voice segments, wherein an overlapping part exists between two adjacent voice segments.
5. The collision detection method according to claim 1, wherein the step of performing frame feature extraction processing on each segmented speech segment to obtain the frame feature of each speech segment includes:
framing each voice segment by a preset frame shift according to a preset window length to obtain a plurality of voice frames contained in each voice segment, wherein an overlapping part exists between every two adjacent voice frames;
extracting the basic features of each voice frame according to a preset algorithm, wherein the basic features are multi-dimensional features;
performing first-order difference processing on the extracted basic features to obtain first-order difference features of each voice frame;
carrying out second order differential processing on the first order differential characteristics to obtain second order differential characteristics of each voice frame;
and the basic feature, the first-order difference feature and the second-order difference feature corresponding to each voice frame form the frame feature of the voice frame.
6. The method according to claim 1, wherein the step of introducing the extracted frame features of each speech segment into a pre-established collision detection model for detection to obtain the detection result of each speech segment includes:
importing the frame characteristics of the voice frame of each extracted voice segment into a pre-established collision detection model;
processing a plurality of frame features contained in the voice segments according to a preset processing method aiming at each voice segment contained in the voice information to obtain a mean value frame feature of the voice segment;
and classifying the mean frame characteristics corresponding to each voice segment, and obtaining the detection result of the voice segment to which the mean frame characteristics belong according to the classification result of the mean frame characteristics.
7. The method according to claim 6, wherein the step of classifying the mean frame feature corresponding to each of the speech segments and obtaining the detection result of the speech segment to which the mean frame feature belongs according to the classification result of the mean frame feature comprises:
carrying out classification detection on the mean frame characteristics corresponding to the voice segments to obtain classification scores of the mean frame characteristics;
comparing the obtained classification score with a preset threshold, and if the classification score is larger than the preset threshold, determining the voice segment to which the mean frame feature belongs as a conflict segment;
and if the classification score is smaller than or equal to the preset threshold, determining the voice segment to which the mean frame feature belongs as a non-conflict segment.
8. The method according to claim 1, wherein the step of introducing the extracted frame features of each speech segment into a pre-established collision detection model for detection to obtain the detection result of each speech segment includes:
importing the frame characteristics of each voice frame in each extracted voice segment into a pre-established collision detection model;
carrying out classification detection on each frame feature to obtain a classification score of the frame feature;
comparing the obtained classification score with a preset threshold, and if the classification score is greater than the preset threshold, determining the voice frame corresponding to the frame characteristic as a conflict frame;
if the classification score is smaller than or equal to the preset threshold value, determining that the voice frame corresponding to the frame feature is a non-conflict frame;
and obtaining the detection result of the voice segment according to the detection result of the voice frame in each voice segment.
9. The method according to claim 8, wherein the step of obtaining the detection result of the speech segment according to the detection result of the speech frame in each speech segment comprises:
for each voice segment, when a voice frame with a detection result of a collision frame exists in the voice segment, judging the voice segment to be a collision segment; or
And for each voice segment, when the detection result that continuous first preset number of voice frames exist in the voice segment is a collision frame, judging that the voice segment is a collision segment.
10. The collision detection method according to claim 1, characterized in that the pre-established collision detection model is obtained by:
marking a conflict mark on a first voice segment with conflict information obtained in advance to serve as a positive sample, and marking a non-conflict mark on a second voice segment without conflict information obtained in advance to serve as a negative sample;
and leading a plurality of positive samples and a plurality of negative samples into a neural network model for training to obtain the conflict detection model.
11. The method according to claim 10, wherein the step of introducing the positive samples and the negative samples into a neural network model for training to obtain the collision detection model comprises:
sampling the marked positive samples and negative samples to make the number of positive samples consistent with that of negative samples;
and introducing the processed positive samples and negative samples into the neural network model for training to obtain the conflict detection model.
12. The collision detection method according to claim 2, wherein the step of determining whether a collision occurs in the travel service process according to the detection result of each voice segment includes:
judging whether the voice information containing a plurality of voice segments is conflict voice according to the detection result of each voice segment;
and judging whether the conflict occurs in the travel service process according to the judgment result of each voice message.
13. The method according to claim 12, wherein the step of determining whether the voice information including a plurality of voice segments is conflict voice according to the detection result of each voice segment includes:
and when the detection result that continuous second preset number of voice segments exist in the plurality of voice segments is a conflict segment, judging that the voice information containing the plurality of voice segments is conflict voice.
14. The collision detection method according to claim 12, wherein the step of determining whether a collision occurs in the travel service process according to the determination result of each piece of voice information includes:
and when a third preset number of continuous voice information in the plurality of voice information is conflict voice, determining that a conflict occurs in the travel service process.
15. The collision detection method according to claim 2, wherein the step of determining whether a collision occurs in the travel service process according to the detection result of each voice segment includes:
and when the voice segments with the detection results of the conflict segments exist in the plurality of voice segments, judging that the conflict occurs in the travel service process.
16. A collision detection apparatus, applied to an electronic device, the apparatus comprising:
the acquisition module is used for acquiring voice information acquired in the process of travel service;
the segmentation module is used for segmenting the voice information into a plurality of voice segments;
the extraction module is used for extracting the frame characteristics of each voice segment obtained by segmentation to obtain the frame characteristics of each voice segment;
and the detection module is used for importing the extracted frame characteristics of each voice segment into a pre-established collision detection model for detection so as to obtain the detection result of each voice segment.
17. The collision detection apparatus according to claim 16, characterized in that the apparatus further comprises:
and the judging module is used for judging whether a conflict occurs in the travel service process according to the detection result of each voice fragment, marking the travel service as a conflict travel service when the conflict is judged to occur, and sending the conflict travel service to the customer service system.
18. The collision detection apparatus according to claim 16, wherein the electronic device is a mobile terminal in communication with a server, and the obtaining module is specifically configured to:
acquiring, at each interval of a preset duration, voice information acquired within the preset duration through voice acquisition equipment of the mobile terminal; or
The electronic device is a server communicating with the mobile terminal, and the obtaining module is specifically configured to:
and acquiring voice information sent by the mobile terminal every preset time interval, wherein the voice information is the voice information collected by the mobile terminal in the preset time interval or the voice information sent by the mobile terminal after analyzing and processing the voice information collected by the mobile terminal in the preset time interval.
19. The collision detection apparatus according to claim 16, wherein the segmentation module is specifically configured to:
and carrying out segmentation processing on the voice information according to a preset length by a preset step length so as to segment the voice information into a plurality of voice segments, wherein an overlapping part exists between two adjacent voice segments.
20. The collision detection apparatus according to claim 16, wherein the extraction module is specifically configured to:
framing each voice segment by a preset frame shift according to a preset window length to obtain a plurality of voice frames contained in each voice segment, wherein an overlapping part exists between every two adjacent voice frames;
extracting the basic features of each voice frame according to a preset algorithm, wherein the basic features are multi-dimensional features;
performing first-order difference processing on the extracted basic features to obtain first-order difference features of each voice frame;
carrying out second order differential processing on the first order differential characteristics to obtain second order differential characteristics of each voice frame;
and the basic feature, the first-order difference feature and the second-order difference feature corresponding to each voice frame form the frame feature of the voice frame.
21. The collision detection apparatus according to claim 16, wherein the detection module is specifically configured to:
importing the frame characteristics of the voice frame of each extracted voice segment into a pre-established collision detection model;
processing a plurality of frame features contained in the voice segments according to a preset processing method aiming at each voice segment contained in the voice information to obtain a mean value frame feature of the voice segment;
and classifying the mean frame characteristics corresponding to each voice segment, and obtaining the detection result of the voice segment to which the mean frame characteristics belong according to the classification result of the mean frame characteristics.
22. The apparatus according to claim 21, wherein the detection module obtains the detection result of the speech segment to which the mean frame feature belongs by:
carrying out classification detection on the mean frame characteristics corresponding to the voice segments to obtain classification scores of the mean frame characteristics;
comparing the obtained classification score with a preset threshold, and if the classification score is larger than the preset threshold, determining the voice segment to which the mean frame feature belongs as a conflict segment;
and if the classification score is smaller than or equal to the preset threshold, determining the voice segment to which the mean frame feature belongs as a non-conflict segment.
23. The collision detection apparatus according to claim 16, wherein the detection module is further specifically configured to:
importing the frame characteristics of each voice frame in each extracted voice segment into a pre-established collision detection model;
carrying out classification detection on each frame feature to obtain a classification score of the frame feature;
comparing the obtained classification score with a preset threshold, and if the classification score is greater than the preset threshold, determining the voice frame corresponding to the frame characteristic as a conflict frame;
if the classification score is smaller than or equal to the preset threshold value, determining that the voice frame corresponding to the frame feature is a non-conflict frame;
and obtaining the detection result of the voice segment according to the detection result of the voice frame in each voice segment.
24. The collision detection apparatus according to claim 23, wherein the detection module obtains the detection result of the voice segment by:
for each voice segment, when a voice frame with a detection result of a collision frame exists in the voice segment, judging the voice segment to be a collision segment; or
And for each voice segment, when the detection result that continuous first preset number of voice frames exist in the voice segment is a collision frame, judging that the voice segment is a collision segment.
25. The collision detection apparatus according to claim 16, characterized in that the apparatus further comprises:
the marking module is used for marking a conflict mark on a first voice segment which is obtained in advance and has conflict information as a positive sample, and marking a non-conflict mark on a second voice segment which is obtained in advance and has no conflict information as a negative sample;
and the training module is used for leading the positive samples and the negative samples into a neural network model for training to obtain the conflict detection model.
26. The collision detection apparatus according to claim 25, wherein the training module is specifically configured to:
sampling the marked positive samples and negative samples to make the number of positive samples consistent with that of negative samples;
and introducing the processed positive samples and negative samples into the neural network model for training to obtain the conflict detection model.
27. The collision detection apparatus according to claim 17, wherein the determining module is specifically configured to:
judging whether the voice information containing a plurality of voice segments is conflict voice according to the detection result of each voice segment;
and judging whether the conflict occurs in the travel service process according to the judgment result of each voice message.
28. The apparatus according to claim 27, wherein the determining module determines whether the voice message is a conflicting voice by:
and when the detection result that continuous second preset number of voice segments exist in the plurality of voice segments is a conflict segment, judging that the voice information containing the plurality of voice segments is conflict voice.
29. The conflict detection apparatus according to claim 27, wherein the determining module determines whether a conflict occurs during the travel service by:
and when a third preset number of continuous voice information in the plurality of voice information is conflict voice, determining that a conflict occurs in the travel service process.
30. The collision detection apparatus according to claim 17, wherein the determining module is further specifically configured to:
and when the voice segments with the detection results of the conflict segments exist in the plurality of voice segments, judging that the conflict occurs in the travel service process.
31. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the collision detection method according to any one of claims 1-15.
32. A readable storage medium, having stored thereon a computer program which, when being executed by a processor, is adapted to carry out the steps of the collision detection method according to any one of claims 1-15.
CN201811543896.2A 2018-12-17 2018-12-17 Conflict detection method and device, electronic equipment and readable storage medium Pending CN111326172A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811543896.2A CN111326172A (en) 2018-12-17 2018-12-17 Conflict detection method and device, electronic equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN111326172A true CN111326172A (en) 2020-06-23

Family

ID=71170686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811543896.2A Pending CN111326172A (en) 2018-12-17 2018-12-17 Conflict detection method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111326172A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112151066A (en) * 2020-09-07 2020-12-29 厦门大学 Voice feature recognition-based language conflict monitoring method, medium and equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102014278A (en) * 2010-12-21 2011-04-13 四川大学 Intelligent video monitoring method based on voice recognition technology
CN103971700A (en) * 2013-08-01 2014-08-06 哈尔滨理工大学 Voice monitoring method and device
CN103971702A (en) * 2013-08-01 2014-08-06 哈尔滨理工大学 Sound monitoring method, device and system
CN104538041A (en) * 2014-12-11 2015-04-22 深圳市智美达科技有限公司 Method and system for detecting abnormal sounds
CN106531195A (en) * 2016-11-08 2017-03-22 北京理工大学 Dialogue conflict detection method and device
CN107181849A (en) * 2017-04-19 2017-09-19 北京小米移动软件有限公司 The way of recording and device
US20180091926A1 (en) * 2016-09-23 2018-03-29 Samsung Electronics Co., Ltd. Electronic device and control method thereof
CN108615532A (en) * 2018-05-03 2018-10-02 张晓雷 A kind of sorting technique and device applied to sound field scape
CN108847217A (en) * 2018-05-31 2018-11-20 平安科技(深圳)有限公司 A kind of phonetic segmentation method, apparatus, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200623)