WO2022073507A1 - Method, apparatus, electronic device, and storage medium for distinguishing types of unconnected calls - Google Patents

Method, apparatus, electronic device, and storage medium for distinguishing types of unconnected calls

Info

Publication number
WO2022073507A1
Authority
WO
WIPO (PCT)
Prior art keywords
stage
call
type
audio file
communication connection
Prior art date
Application number
PCT/CN2021/122834
Other languages
English (en)
French (fr)
Inventor
余自雷
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2022073507A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2218 Call detail recording

Definitions

  • The present application relates to the technical fields of artificial intelligence and data processing, and in particular to a method, apparatus, electronic device, and computer-readable storage medium for distinguishing between types of unconnected calls.
  • A method for distinguishing the type of an unconnected call provided by this application includes:
  • initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal;
  • when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault;
  • when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type;
  • when the communication connection request is in the connection stage, acquiring the call voice and analyzing the call voice to obtain the corresponding unconnected-call type.
  • The present application also provides a device for distinguishing the type of an unconnected call, the device comprising:
  • a request stage judgment module, configured to initiate a communication connection request and determine the stage of the communication connection request according to the response of the called user terminal;
  • a call stage judgment module, configured to record the unconnected-call type as a line-side fault when the communication connection request is in the call stage;
  • an early media stage judgment module, configured to acquire the audio file of the early media stage when the communication connection request is in the early media stage, and to analyze the audio file to obtain the corresponding unconnected-call type; and
  • a connection stage judgment module, configured to acquire the call voice when the communication connection request is in the connection stage, and to analyze the call voice to obtain the corresponding unconnected-call type.
  • The present application also provides an electronic device, the electronic device comprising:
  • at least one processor and a memory, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to cause the at least one processor to perform the following steps:
  • initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal;
  • when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault;
  • when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type;
  • when the communication connection request is in the connection stage, acquiring the call voice and analyzing the call voice to obtain the corresponding unconnected-call type.
  • The present application also provides a computer-readable storage medium storing a computer program, where the computer program implements the following steps when executed by a processor:
  • initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal;
  • when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault;
  • when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type;
  • when the communication connection request is in the connection stage, acquiring the call voice and analyzing the call voice to obtain the corresponding unconnected-call type.
  • FIG. 1 is a schematic flowchart of a method for distinguishing the type of an unconnected call according to an embodiment of the present application;
  • FIG. 2 is a schematic block diagram of a device for distinguishing the type of an unconnected call according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the internal structure of an electronic device that implements the method for distinguishing the type of an unconnected call according to an embodiment of the present application.
  • The embodiments of the present application may acquire and process related data based on artificial intelligence technology.
  • Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
  • The basic technologies of artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
  • The method for distinguishing the type of an unconnected call provided by the embodiments of the present application may be executed by, but is not limited to, at least one electronic device that can be configured to execute the method, such as a server or a terminal.
  • In other words, the method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform.
  • The server includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster.
  • FIG. 1 is a schematic flowchart of a method for distinguishing the type of an unconnected call according to an embodiment of the present application.
  • The method for distinguishing the type of an unconnected call includes:
  • S1: initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal.
  • The communication connection request may be a telephone call request; the calling party generates it by sending an INVITE message to the called user.
  • The communication connection request can result in two states: a not-connected state and a connected state.
  • The not-connected state means that the called user has not answered the call; a status code is then returned to the calling party.
  • The not-connected state further comprises a call stage and an early media stage.
  • The call stage means that the communication connection request has not yet reached the called user.
  • The early media stage means that the communication connection request has reached the called user but the called user has not answered, so only a provisional response is given.
  • Provisional responses include, but are not limited to, ringing, call forwarding, and color ring back tones.
  • The connected state is the connection stage: the called user answers the call, and the call channel between the calling party and the called user is established.
  • Determining the stage of the communication connection request according to the response of the called user terminal includes:
  • determining whether the communication connection request is in the call stage or the early media stage according to the identifier in the received status code.
  • The status code is a 3-digit code that classifies the reason the call was not connected; its identifiers usually take the form 18X or 4XX.
  • When the identifier in the received status code is 4XX, the communication connection request is determined to be in the call stage; when the identifier is 18X, it is determined to be in the early media stage.
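The mapping from status code to stage described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names `Stage` and `classify_stage` are hypothetical, and treating `200` as the marker of an answered call is an assumption taken from common SIP practice rather than from the text.

```python
from enum import Enum


class Stage(Enum):
    """Stages named in the description (enum itself is hypothetical)."""
    CALL = "call"
    EARLY_MEDIA = "early_media"
    CONNECTION = "connection"


def classify_stage(status_code: int) -> Stage:
    """Map a 3-digit status code to the stage of the connection request."""
    if 180 <= status_code <= 189:
        # 18X: the request reached the called user, who has not answered.
        return Stage.EARLY_MEDIA
    if 400 <= status_code <= 499:
        # 4XX: the request failed before reaching the called user.
        return Stage.CALL
    if status_code == 200:
        # Assumption (not stated in the text): 200 marks an answered call.
        return Stage.CONNECTION
    raise ValueError(f"status code {status_code} is not covered by this sketch")
```

For instance, `classify_stage(183)` yields `Stage.EARLY_MEDIA`, while any code outside the three handled ranges raises `ValueError` so that unexpected responses are not silently misclassified.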
  • S2: when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault.
  • When the identifier in the received status code is 4XX, the first digit "4" indicates the response class
  • and the last two digits indicate the specific response type.
  • "4XX" covers the range 400-499. A 4XX code indicates that the request failed: the request message contains a syntax error, or the server was unable to complete the client's request. Therefore, when the identified status code is 4XX, the unconnected-call type is determined to be a line-side fault.
  • S3: when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type.
  • When the identifier in the received status code is 18X, the communication connection request is determined to be in the early media stage.
  • The "X" in the status code 18X can usually be any digit from 0 to 9.
  • S3 includes:
  • Step A: parsing the status code to obtain the identifier in the status code;
  • Step B: if the identifier is the first identifier, recording the unconnected-call type as unanswered;
  • The first identifier is 18X where X is neither 0 nor 3, for example 181 or 182.
  • Step C: if the identifier is the second identifier, further performing the following steps:
  • Step c1: performing dynamic programming on the audio file to obtain the cumulative path distance;
  • Step c2: if the cumulative path distance is greater than or equal to the preset distance threshold, recording the unconnected-call type as the called party being on another call;
  • Step c3: if the cumulative path distance is less than the preset distance threshold, recording the unconnected-call type as unanswered.
  • The second identifier is 180. The sound heard at this point is a ringback tone, and the early media carried as the ringback tone is recorded as the audio file.
  • Performing dynamic programming on the audio file includes:
  • searching for a target path in the audio network and calculating the cumulative path distance of the target path.
  • The frame number is the time-sequence label of an audio frame in the audio file, and the preset reference audio file is a preset standard audio file.
  • N is the largest frame number of the audio file, and M is the largest frame number of the reference audio file.
  • The target path passes through several grid points of the audio network, where each grid point pairs a frame number of the audio file with a frame number of the reference audio file.
  • The path is not chosen arbitrarily: the speaking rate in an audio file may vary, but the order of its parts cannot change, so the selected path must start at the lower-left corner of the preset two-dimensional Cartesian coordinate system and end at the upper-right corner.
  • The slope of the path can be constrained to the range 0.5 to 2.
  • The next grid point (n, m) that the path passes through may be one of three cases.
  • In the cumulative distance, D[(n,m)] is the cumulative path distance, T is the audio file, R is the reference audio file, T(n) is the speech feature vector of the nth frame in the audio file, R(m) is the speech feature vector of the mth frame in the reference audio file, and d[T(n), R(m)] is the distance between T(n) and R(m).
  • This embodiment compares the calculated cumulative path distance D[(n,m)] with a preset second threshold: if D[(n,m)] is greater than or equal to the preset second threshold, the unconnected-call type is considered to be the called party being on another call; if D[(n,m)] is less than the preset second threshold, the unconnected-call type is considered to be unanswered.
  • Performing dynamic programming on the audio file resolves the mismatch between speech segments of different lengths, so the cumulative path distance can be calculated accurately without additional computation.
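The cumulative path distance described above resembles dynamic time warping (DTW). The sketch below is an illustration under stated assumptions, not the formula from the application: the three predecessor cases are not given in the text, so the classic step pattern (n-1, m), (n, m-1), (n-1, m-1) is assumed, the slope constraint of 0.5 to 2 is not enforced, Euclidean distance is assumed for d[T(n), R(m)], and all function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def frame_distance(a: Vector, b: Vector) -> float:
    """Assumed d[T(n), R(m)]: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cumulative_path_distance(T: Sequence[Vector], R: Sequence[Vector]) -> float:
    """Cumulative distance of the best path from the lower-left grid
    point (1, 1) to the upper-right grid point (N, M)."""
    n_frames, m_frames = len(T), len(R)
    if n_frames == 0 or m_frames == 0:
        raise ValueError("both audio files need at least one frame")
    # D[n][m] holds the best cumulative distance ending at grid point (n, m);
    # row 0 and column 0 are padding so that indices match frame numbers.
    D = [[math.inf] * (m_frames + 1) for _ in range(n_frames + 1)]
    D[0][0] = 0.0
    for n in range(1, n_frames + 1):
        for m in range(1, m_frames + 1):
            cost = frame_distance(T[n - 1], R[m - 1])
            # Assumed step pattern: arrive from one of three neighbours.
            D[n][m] = cost + min(D[n - 1][m], D[n][m - 1], D[n - 1][m - 1])
    return D[n_frames][m_frames]


def classify_ringback(T: Sequence[Vector], R: Sequence[Vector],
                      distance_threshold: float) -> str:
    """Apply steps c2 and c3 to the cumulative path distance."""
    if cumulative_path_distance(T, R) >= distance_threshold:
        return "called party is on another call"
    return "unanswered"
```

Because the path may repeat a frame of either sequence, a recording that plays the reference ringback tone more slowly still yields a distance of zero, which is the length-mismatch property the text attributes to dynamic programming.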
  • Step D: if the identifier is the third identifier, further performing the following steps:
  • Step d1: determining whether the audio file contains a human voice;
  • Step d2: if there is no human voice, recording the unconnected-call type as unanswered;
  • Step d3: if there is a human voice, performing ASR (Automatic Speech Recognition) on the human voice to obtain the corresponding text;
  • Step d4: performing text similarity matching between the corresponding text and the standard phrases in the standard phrase set to obtain the standard similarities;
  • Step d5: taking the standard phrase with the highest standard similarity as the unconnected-call type; if that standard similarity is less than the preset text threshold, recording the unconnected-call type as unanswered.
  • The third identifier is 183. At this point music or a color ring back tone (CRBT) is played, and the played music or CRBT is recorded as the audio file.
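The branching among the first, second, and third identifiers in Steps A to D can be summarized in a small dispatcher. This is only a sketch: the function name and the returned labels are hypothetical, and the analysis that each branch triggers is left out.

```python
def early_media_branch(status_code: int) -> str:
    """Return which early-media branch applies to an 18X status code."""
    if not 180 <= status_code <= 189:
        raise ValueError("not an 18X status code")
    if status_code == 180:
        # Second identifier: compare the ringback tone with the reference
        # audio file (Step C).
        return "compare_ringback"
    if status_code == 183:
        # Third identifier: look for a human voice in the music or CRBT
        # and match the recognized text (Step D).
        return "detect_voice"
    # First identifier (18X with X neither 0 nor 3): Step B.
    return "unanswered"
```

Keeping the dispatch separate from the audio analysis means each branch can be tested with plain integers before any audio processing is wired in.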
  • This embodiment determines whether the audio file contains a human voice by performing frequency identification on the audio file.
  • Specifically, it checks whether the audio file contains human voice frequencies (between 65 Hz and 1500 Hz). If no human voice frequency is identified, the unconnected-call type is recorded as unanswered; if a human voice frequency is identified, a human voice is considered present and the voice information in the audio file is converted.
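One crude way to realize the frequency check is to find the strongest frequency component and test whether it falls in the 65 Hz to 1500 Hz band quoted above. The sketch below assumes mono PCM samples and uses a naive discrete Fourier transform from the standard library, so it suits only short frames; the function names are hypothetical, and real speech detection would need more than a single dominant-frequency test, since music can also have energy in this band.

```python
import cmath
import math
from typing import Sequence

VOICE_BAND_HZ = (65.0, 1500.0)  # band quoted in the text


def dominant_frequency(samples: Sequence[float], sample_rate: int) -> float:
    """Frequency in Hz of the strongest non-DC DFT bin (0.0 for silence)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    best_bin, best_magnitude = 0, 0.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        magnitude = abs(coefficient)
        if magnitude > best_magnitude:
            best_bin, best_magnitude = k, magnitude
    return best_bin * sample_rate / n


def has_voice_frequency(samples: Sequence[float], sample_rate: int) -> bool:
    """True if the dominant frequency lies inside the voice band."""
    low, high = VOICE_BAND_HZ
    return low <= dominant_frequency(samples, sample_rate) <= high
```

A 50 ms frame at 8 kHz (400 samples) keeps the quadratic cost of the naive transform small; a production system would more likely use an FFT routine and examine band energy over many frames.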
  • Converting the voice information in the audio file includes:
  • converting the voice information into text information using the trained text information conversion model.
  • Building and training the text information conversion model includes:
  • Step I: randomly generating a training voice dialogue set and the standard text information corresponding to the training voice dialogue set;
  • Step II: using the text information conversion model to convert the training voice dialogue set into converted text information;
  • Step III: comparing the converted text information with the standard text information to obtain the difference between them;
  • Step IV: when the difference between the converted text information and the standard text information is greater than a preset threshold, adjusting the parameters of the text information conversion model and returning to Step II to convert again;
  • Step V: when the difference between the converted text information and the standard text information is less than or equal to the preset threshold, completing the training and producing the trained text information conversion model.
  • In the loss function that measures this difference, one operand represents the converted text information, Y represents the standard text information, and the operator between them represents the difference operation.
  • When the value of the loss function is greater than or equal to the preset loss threshold, the converted text information still differs from the standard text information, so the parameters of the text information conversion model are adjusted and the conversion is performed again;
  • when the value of the loss function is less than the loss threshold, the converted text information is considered to match the standard text information, the training is complete, and the trained text information conversion model is produced.
  • The conversion efficiency is thereby improved.
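The control flow of Steps II to V (convert, measure the difference, adjust, repeat until the difference is within the threshold) can be illustrated on a toy problem. The sketch below is not a speech model: it fits a single weight `w` so that `w * x` approximates `y`, uses mean squared error as an assumed loss, and adjusts the weight by gradient descent. The function name, the loss, and the update rule are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def train_until_threshold(
    pairs: List[Tuple[float, float]],
    loss_threshold: float,
    learning_rate: float = 0.05,
    max_rounds: int = 1000,
) -> Tuple[float, float, int]:
    """Return (weight, final_loss, rounds_used) for a one-parameter model."""
    if not pairs:
        raise ValueError("training set is empty")
    w = 0.0
    for round_index in range(1, max_rounds + 1):
        # Step II: run the current model on the training inputs.
        predictions = [w * x for x, _ in pairs]
        # Step III: measure the difference from the standard outputs.
        loss = sum((p - y) ** 2 for p, (_, y) in zip(predictions, pairs)) / len(pairs)
        # Step V: stop once the difference is within the threshold.
        if loss <= loss_threshold:
            return w, loss, round_index
        # Step IV: adjust the parameter, then convert again.
        gradient = sum(2 * (p - y) * x for p, (x, y) in zip(predictions, pairs)) / len(pairs)
        w -= learning_rate * gradient
    raise RuntimeError("loss did not reach the threshold within max_rounds")
```

The `max_rounds` guard is an addition of this sketch; the text does not say what happens if the difference never drops below the threshold.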
  • Performing text similarity matching between the text information and the standard phrases in the standard phrase set, calculating the standard similarity between them, and ranking the standard similarities to determine the unconnected-call type includes:
  • calculating the standard similarity between the text information and each standard phrase in the standard phrase set from their corresponding feature vectors.
  • The standard phrases in the standard phrase set include, but are not limited to: the phone is turned off, cannot be connected, the number is empty, the line is busy, service is suspended, no one answers, the leading 0 was not dialed, and calls are restricted.
  • This embodiment uses cosine similarity to calculate the standard similarity from the feature vectors of the two texts:
  • $$\text{similarity} = \cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
  • where similarity is the standard similarity, A is the feature vector of the text information, B is the feature vector of the standard phrase, $A_i$ is the value of the i-th component of the feature vector of the text information, and $B_i$ is the value of the i-th component of the feature vector of the standard phrase.
  • Each dimension i of a vector holds the number of times the word at that position in the dictionary occurs in the text, or 0 if the word does not occur in the text.
  • For example, the similarities between the text information and the standard phrases "the phone is turned off", "cannot be connected", and "the number is empty" are calculated as 0.6, 0.9, and 0.4. Ranking these similarities, the largest value is 0.9, which corresponds to "cannot be connected",
  • so the unconnected-call type is recorded as "cannot be connected". If instead the calculated similarities are 0.4, 0.3, and 0.2 and the preset text threshold is 0.5, every similarity is below the preset text threshold and the unconnected-call type is recorded as unanswered.
  • In short, the standard phrase with the highest standard similarity is used as the unconnected-call type; if that standard similarity is less than the preset text threshold, the call is classified as unanswered.
  • Calculating the text similarity with the cosine similarity formula improves the accuracy of the subsequent classification of the unconnected-call type.
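The matching procedure above can be sketched with word-count vectors and cosine similarity. This is an illustration under assumptions: text is split on whitespace (Chinese text would first need word segmentation), the default threshold of 0.5 is taken from the worked example, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def bag_of_words(text: str) -> Counter:
    """Word-count feature vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_standard_phrase(
    text: str,
    standard_phrases: Sequence[str],
    text_threshold: float = 0.5,
) -> Tuple[str, float]:
    """Return (unconnected-call type, best similarity) for recognized text."""
    if not standard_phrases:
        raise ValueError("standard phrase set is empty")
    text_vector = bag_of_words(text)
    # Rank the standard phrases by similarity, highest first.
    ranked = sorted(
        ((cosine_similarity(text_vector, bag_of_words(p)), p) for p in standard_phrases),
        reverse=True,
    )
    best_similarity, best_phrase = ranked[0]
    if best_similarity < text_threshold:
        return "unanswered", best_similarity
    return best_phrase, best_similarity
```

Words missing from one text simply contribute 0 to the dot product, which matches the rule that a dimension is 0 when the dictionary word does not occur in the text.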
  • S4: when the communication connection request is in the connection stage, acquiring the call voice and analyzing the call voice to obtain the unconnected-call type.
  • ASR (Automatic Speech Recognition) is performed on the call voice
  • to obtain the text of the agent's opening remarks;
  • text similarity matching is then performed between that text and the preset IVR (Interactive Voice Response) phrases to obtain the text similarity. If the text similarity is greater than or equal to the preset threshold, the unconnected-call type is recorded as machine automatic answer; if the text similarity is less than the preset threshold, the unconnected-call type is recorded as called-user hang-up.
  • The agent's opening remarks are the voice information after the call is connected, and the preset IVR phrases include, but are not limited to, the switchboard, "please dial the extension", and number lookup.
  • The ASR processing and the text similarity matching are the same as in S32 and are not repeated here.
  • The machine automatic answer type arises because an automatic voice answering system picked up the call.
  • The automatic voice answering system may play prompts telling the caller to dial an extension, that the call was answered automatically, or that the switchboard must be called, and so on; such a call still counts as not connected.
  • The called-user hang-up type arises when the communication connection request is in the connection stage but the called user hangs up within a preset time for some reason.
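Once the opening remarks have been compared with each preset IVR phrase, the connection-stage decision reduces to a threshold test on the best similarity. The sketch below takes the similarity scores as input; the function name and the default threshold of 0.5 are hypothetical, since the text only speaks of a preset threshold.

```python
from typing import Iterable


def classify_connected_call(ivr_similarities: Iterable[float],
                            threshold: float = 0.5) -> str:
    """Decide the unconnected-call type for a call in the connection stage."""
    scores = list(ivr_similarities)
    if not scores:
        raise ValueError("no IVR phrases were compared")
    # The opening remarks count as IVR speech if they are close enough
    # to at least one preset IVR phrase.
    if max(scores) >= threshold:
        return "machine automatic answer"
    return "called-user hang-up"
```

Using the maximum over all preset phrases is a design choice of this sketch: a single close match is treated as enough evidence that an automatic voice answering system picked up.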
  • FIG. 2 is a schematic block diagram of the device for distinguishing the type of an unconnected call of the present application.
  • The device 100 for distinguishing the type of an unconnected call described in this application may be installed in an electronic device.
  • The device 100 may include a request stage judgment module 101, a call stage judgment module 102, an early media stage judgment module 103, and a connection stage judgment module 104.
  • The modules described in this application may also be called units: series of computer program segments that are stored in the memory of the electronic device, can be executed by its processor, and perform fixed functions.
  • The functions of each module/unit are as follows:
  • the request stage judgment module 101 is configured to initiate a communication connection request and determine the stage of the communication connection request according to the response of the called user terminal;
  • the call stage judgment module 102 is configured to record the unconnected-call type as a line-side fault when the communication connection request is in the call stage;
  • the early media stage judgment module 103 is configured to acquire the audio file of the early media stage when the communication connection request is in the early media stage, and to obtain the corresponding unconnected-call type by analyzing the audio file;
  • the connection stage judgment module 104 is configured to acquire the call voice when the communication connection request is in the connection stage, and to obtain the corresponding unconnected-call type by analyzing the call voice.
  • The device 100 implements the following method for distinguishing the type of an unconnected call through the above modules:
  • Step 1: the request stage judgment module 101 initiates a communication connection request and determines the stage of the communication connection request according to the response of the called user terminal.
  • The communication connection request may be a telephone call request; the calling party generates it by sending an INVITE message to the called user.
  • The communication connection request can result in two states: a not-connected state and a connected state.
  • The not-connected state means that the called user has not answered the call; a status code is then returned to the calling party.
  • The not-connected state further comprises a call stage and an early media stage.
  • The call stage means that the communication connection request has not yet reached the called user.
  • The early media stage means that the communication connection request has reached the called user but the called user has not answered, so only a provisional response is given.
  • Provisional responses include, but are not limited to, ringing, call forwarding, and color ring back tones.
  • The connected state is the connection stage: the called user answers the call, and the call channel between the calling party and the called user is established.
  • The request stage judgment module 101 determines the stage of the communication connection request according to the response of the called user terminal by:
  • determining whether the communication connection request is in the call stage or the early media stage according to the identifier in the received status code.
  • The status code is a 3-digit code that classifies the reason the call was not connected; its identifiers usually take the form 18X or 4XX.
  • When the identifier in the received status code is 4XX, the communication connection request is determined to be in the call stage; when the identifier is 18X, it is determined to be in the early media stage.
  • Step 2: when the communication connection request is in the call stage, the call stage judgment module 102 records the unconnected-call type as a line-side fault.
  • That is, the call stage judgment module 102 determines the unconnected-call type to be a line-side fault.
  • Step 3: when the communication connection request is in the early media stage, the early media stage judgment module 103 acquires the audio file of the early media stage, analyzes the audio file, and obtains the corresponding unconnected-call type.
  • When the identifier in the received status code is 18X, the communication connection request is determined to be in the early media stage.
  • The "X" in the status code 18X can usually be any digit from 0 to 9.
  • The early media stage judgment module 103 obtains the corresponding unconnected-call type as follows:
  • Step A: parsing the status code to obtain the identifier in the status code;
  • Step B: if the identifier is the first identifier, recording the unconnected-call type as unanswered.
  • The first identifier is 18X where X is neither 0 nor 3, for example 181 or 182.
  • Step C: if the identifier is the second identifier, further performing the following steps:
  • Step c1: performing dynamic programming on the audio file to obtain the cumulative path distance;
  • Step c2: if the cumulative path distance is greater than or equal to the preset distance threshold, recording the unconnected-call type as the called party being on another call;
  • Step c3: if the cumulative path distance is less than the preset distance threshold, recording the unconnected-call type as unanswered.
  • The second identifier is 180. The sound heard at this point is a ringback tone, and the early media carried as the ringback tone is recorded as the audio file.
  • Performing dynamic programming on the audio file includes:
  • searching for a target path in the audio network and calculating the cumulative path distance of the target path.
  • The frame number is the time-sequence label of an audio frame in the audio file, and the preset reference audio file is a preset standard audio file.
  • N is the largest frame number of the audio file, and M is the largest frame number of the reference audio file.
  • The target path passes through several grid points of the audio network, where each grid point pairs a frame number of the audio file with a frame number of the reference audio file.
  • The path is not chosen arbitrarily: the speaking rate in an audio file may vary, but the order of its parts cannot change, so the selected path must start at the lower-left corner of the preset two-dimensional Cartesian coordinate system and end at the upper-right corner.
  • The slope of the path can be constrained to the range 0.5 to 2.
  • The next grid point (n, m) that the path passes through may be one of three cases.
  • In the cumulative distance, D[(n,m)] is the cumulative path distance, T is the audio file, R is the reference audio file, T(n) is the speech feature vector of the nth frame in the audio file, R(m) is the speech feature vector of the mth frame in the reference audio file, and d[T(n), R(m)] is the distance between T(n) and R(m).
  • This embodiment compares the calculated cumulative path distance D[(n,m)] with a preset second threshold: if D[(n,m)] is greater than or equal to the preset second threshold, the unconnected-call type is considered to be the called party being on another call; if D[(n,m)] is less than the preset second threshold, the unconnected-call type is considered to be unanswered.
  • Performing dynamic programming on the audio file resolves the mismatch between speech segments of different lengths, so the cumulative path distance can be calculated accurately without additional computation.
  • Step D: if the identifier is the third identifier, further performing the following steps:
  • Step d1: determining whether the audio file contains a human voice;
  • Step d2: if there is no human voice, recording the unconnected-call type as unanswered;
  • Step d3: if there is a human voice, performing ASR (Automatic Speech Recognition) on the human voice to obtain the corresponding text;
  • Step d4: performing text similarity matching between the corresponding text and the standard phrases in the standard phrase set to obtain the standard similarities;
  • Step d5: taking the standard phrase with the highest standard similarity as the unconnected-call type; if that standard similarity is less than the preset text threshold, recording the unconnected-call type as unanswered.
  • The third identifier is 183. At this point music or a color ring back tone (CRBT) is played, and the played music or CRBT is recorded as the audio file.
  • This embodiment determines whether the audio file contains a human voice by performing frequency identification on the audio file.
  • Specifically, it checks whether the audio file contains human voice frequencies (between 65 Hz and 1500 Hz). If no human voice frequency is identified, the unconnected-call type is recorded as unanswered; if a human voice frequency is identified, a human voice is considered present and the voice information in the audio file is converted.
  • Specifically, converting the voice information in the audio file includes:
  • building a text information conversion model and training it;
  • converting the voice information into text information by using the trained text information conversion model.
  • Optionally, building and training the text information conversion model includes:
  • Step I: randomly generating a training voice dialogue set and the standard text information corresponding to the training voice dialogue set;
  • Step II: converting the training voice dialogue set with the text information conversion model to obtain converted text information;
  • Step III: comparing the converted text information with the standard text information to obtain the difference between them;
  • Step IV: when the difference between the converted text information and the standard text information is greater than a preset threshold, adjusting the parameters of the text information conversion model and returning to Step II to continue the conversion;
  • Step V: when the difference between the converted text information and the standard text information is less than or equal to the preset threshold, completing the training and generating the trained text information conversion model.
  • Further, the difference between the converted text information and the standard text information is calculated with a preset loss function. The formula is given only as an image in the published text; in it, Y denotes the standard text information, and further symbols denote the converted text information and the difference operation.
  • When the loss function is greater than or equal to a preset loss threshold, the converted text information differs from the standard text information, so the parameters of the text information conversion model are adjusted and the conversion is executed again;
  • When the loss function is less than the loss threshold, the converted text information does not differ from the standard text information, so the training is completed and the trained text information conversion model is generated.
  • Using the trained text information conversion model to convert the voice information in the audio file into text information improves conversion efficiency.
  • Further, performing text similarity matching between the text information and the standard phrases in the standard phrase set, calculating the standard similarity between the text information and each standard phrase, ranking the standard similarities, and determining the unconnected-call type includes:
  • converting the text information and the standard phrases in the standard phrase set into corresponding feature vectors;
  • calculating the standard similarity between the text information and each standard phrase in the standard phrase set by using the corresponding feature vectors.
  • The standard phrases in the standard phrase set include, but are not limited to: phone switched off, cannot be connected, number does not exist, line busy, service suspended, no answer, missing leading 0, and call restricted.
  • Further, this embodiment uses cosine similarity to calculate the standard similarity of two texts from their feature vectors:
  • $$\text{similarity} = \cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$$
  • where similarity is the standard similarity, A is the feature vector of the text information, and B is the feature vector of the standard phrase;
  • ||A|| is the norm of the feature vector of the text information, and ||B|| is the norm of the feature vector of the standard phrase;
  • i indexes the dimensions of the vectors; each dimension holds the number of times the dictionary word at that position occurs in the text, or 0 if the word does not occur in the text.
  • For example, matching the text information against the phrases "phone switched off", "cannot be connected", and "number does not exist" gives similarities of 0.6, 0.9, and 0.4. After ranking, the largest similarity is 0.9, whose phrase is "cannot be connected", so the unconnected-call type is recorded as "cannot be connected".
  • If instead the calculated similarities are 0.4, 0.3, and 0.2 and the preset text threshold is 0.5, all similarities are below the threshold and the unconnected-call type is recorded as no answer.
  • In short, if the highest standard similarity is greater than or equal to the preset text threshold, the corresponding standard phrase is used as the unconnected-call type; if it is less than the preset text threshold, the call is classified as no answer.
  • the text similarity is calculated by using the cosine similarity calculation formula, which improves the accuracy of the subsequent classification and judgment of the type of unconnected calls.
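  • The matching rule above can be sketched as follows. The sketch assumes whitespace-separated words as the dictionary (Chinese text would first need a word segmenter, which is outside this illustration); the function names and the English phrase labels are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts using word-count feature vectors."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def pick_unconnected_type(similarities, text_threshold=0.5):
    """Pick the phrase with the highest similarity, or fall back to no answer.

    `similarities` maps each standard phrase to its standard similarity and
    is assumed to be non-empty.
    """
    best_phrase = max(similarities, key=similarities.get)
    if similarities[best_phrase] >= text_threshold:
        return best_phrase
    return "no answer"


def classify_text(text, standard_phrases, text_threshold=0.5):
    """Score `text` against every standard phrase and apply the rule above."""
    scores = {p: cosine_similarity(text, p) for p in standard_phrases}
    return pick_unconnected_type(scores, text_threshold)
```

  • With the similarities 0.6, 0.9, and 0.4 from the worked example, `pick_unconnected_type` selects "cannot be connected"; with 0.4, 0.3, and 0.2 it falls back to "no answer".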
  • Step 4: when the communication connection request is in the connected stage, the connection stage judging module 104 acquires the call voice and obtains the unconnected-call type by analyzing the call voice.
  • In this embodiment, if the call is hung up after being connected but before the agent's opening remarks have finished playing, the request is in the connected stage. The connection stage judging module 104 performs ASR (Automatic Speech Recognition) on the call voice to obtain the corresponding text, and performs text similarity matching between that text and preset IVR (Interactive Voice Response) phrases to obtain a text similarity. If the text similarity is greater than or equal to a preset threshold, the unconnected-call type is recorded as the machine auto-answer type; if it is less than the preset threshold, the unconnected-call type is recorded as the callee hang-up type.
  • The agent's opening remarks are the voice information played after the call is connected. The preset IVR phrases include, but are not limited to, switchboard announcements, prompts to dial an extension, and directory enquiries.
  • Specifically, the ASR processing and the text similarity matching are the same as in step 3 and are not repeated here.
  • The machine auto-answer type arises when an automatic voice answering system picks up the call and responds to the caller automatically while the communication connection request is in the connected stage. The system may, for example, prompt the caller to dial an extension or to call the switchboard. This counts as one kind of call that has not actually been connected;
  • The callee hang-up type arises when the communication connection request is in the connected stage but the called user hangs up within a preset time for some reason.
  • FIG. 3 is a schematic structural diagram of an electronic device implementing the method for distinguishing unconnected-call types according to the present application.
  • The electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as a program 12 for distinguishing unconnected-call types.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (for example: SD or DX memory, etc.), magnetic memory, magnetic disk, CD etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a mobile hard disk of the electronic device 1 .
  • In other embodiments the memory 11 may also be an external storage device of the electronic device 1, such as a pluggable mobile hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash memory card (Flash Card) equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • The memory 11 can be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the program 12 for distinguishing unconnected-call types, but also to temporarily store data that has been output or will be output.
  • In some embodiments the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
  • The processor 10 is the control core (Control Unit) of the electronic device. It connects the components of the entire electronic device through various interfaces and lines, and executes the functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example, the program for distinguishing unconnected-call types) and calling the data stored in the memory 11.
  • The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 3 only shows an electronic device with components. Those skilled in the art can understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 1, and may include fewer or more components than those shown in the figure. components, or a combination of certain components, or a different arrangement of components.
  • For example, although not shown, the electronic device 1 may also include a power supply (such as a battery) for powering the various components. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power consumption management.
  • the power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components.
  • the electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • Further, the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may further include a user interface, and the user interface may be a display (Display), an input unit (eg, a keyboard (Keyboard)), optionally, the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • The program 12 for distinguishing unconnected-call types stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run on the processor 10, can implement:
  • initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal;
  • when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault;
  • when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type;
  • when the communication connection request is in the connected stage, acquiring the call voice and analyzing the call voice to obtain the corresponding unconnected-call type.
  • Further, if the modules/units integrated in the electronic device 1 are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium, which may be volatile or non-volatile.
  • the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory) ).
  • Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created according to the use of a blockchain node, and the like.
  • A computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the following steps are implemented:
  • initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal;
  • when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault;
  • when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type;
  • when the communication connection request is in the connected stage, acquiring the call voice and analyzing the call voice to obtain the corresponding unconnected-call type.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

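This application relates to artificial intelligence and data processing technology and discloses a method for distinguishing unconnected-call types, including: determining the stage of a communication connection request according to the response of the called user terminal; when the request is in the call stage, recording the unconnected-call type as a line-side fault; when the request is in the early media stage, acquiring the audio file of the early media stage and analyzing it to obtain the corresponding unconnected-call type; and when the request is in the connected stage, acquiring the call voice and analyzing it to obtain the corresponding unconnected-call type. This application also relates to blockchain technology: the unconnected-call type may be stored in a blockchain. This application further discloses an apparatus for distinguishing unconnected-call types, an electronic device, and a storage medium. The application can improve the accuracy of distinguishing unconnected-call types.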

Description

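Method, apparatus, electronic device, and storage medium for distinguishing unconnected-call types
This application claims priority to Chinese patent application No. CN202011073539.1, entitled "Method, apparatus, electronic device, and storage medium for distinguishing unconnected-call types", filed with the China Patent Office on October 9, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the fields of artificial intelligence and data processing, and in particular to a method, an apparatus, an electronic device, and a computer-readable storage medium for distinguishing unconnected-call types.
Background
With the development of communication technology, more and more enterprises rely on outbound calls to carry out their business. In outbound-call scenarios, calls very frequently fail to connect, and the reasons vary from call to call, so enterprises need to handle the different kinds of unconnected calls differently.
The inventor realized that, in automatic outbound-call scenarios, the outbound-call system can only roughly distinguish the reason from the not-connected code returned by the telephone line provider's interface, so statistics and analysis of unconnected-call types may be inaccurate.
Summary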
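This application provides a method for distinguishing unconnected-call types, including:
initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal;
when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault;
when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type;
when the communication connection request is in the connected stage, acquiring the call voice and analyzing the call voice to obtain the corresponding unconnected-call type.
This application also provides an apparatus for distinguishing unconnected-call types, the apparatus including:
a request stage determination module, configured to initiate a communication connection request and determine the stage of the communication connection request according to the response of the called user terminal;
a call stage determination module, configured to record the unconnected-call type as a line-side fault when the communication connection request is in the call stage;
an early media stage determination module, configured to, when the communication connection request is in the early media stage, acquire the audio file of the early media stage and analyze the audio file to obtain the corresponding unconnected-call type;
a connected stage determination module, configured to, when the communication connection request is in the connected stage, acquire the call voice and analyze the call voice to obtain the corresponding unconnected-call type.
This application also provides an electronic device, the electronic device including:
at least one processor; and
a memory communicatively connected to the at least one processor, wherein:
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the following steps:
initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal;
when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault;
when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type;
when the communication connection request is in the connected stage, acquiring the call voice and analyzing the call voice to obtain the corresponding unconnected-call type.
This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal;
when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault;
when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type;
when the communication connection request is in the connected stage, acquiring the call voice and analyzing the call voice to obtain the corresponding unconnected-call type.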
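Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for distinguishing unconnected-call types according to an embodiment of this application;
FIG. 2 is a schematic module diagram of an apparatus for distinguishing unconnected-call types according to an embodiment of this application;
FIG. 3 is a schematic diagram of the internal structure of an electronic device implementing the method for distinguishing unconnected-call types according to an embodiment of this application.
The realization of the objectives, the functional characteristics, and the advantages of this application are further described with reference to the embodiments and the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described here are only intended to explain this application and not to limit it.
The embodiments of this application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating and interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
The method for distinguishing unconnected-call types provided by the embodiments of this application may be executed by at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method. In other words, the method may be executed by software or hardware installed on a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster.
FIG. 1 is a schematic flowchart of the method for distinguishing unconnected-call types provided by an embodiment of this application. In this embodiment, the method includes: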
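S1. Initiate a communication connection request, and determine the stage of the communication connection request according to the response of the called user terminal.
In this embodiment, the communication connection request may be a telephone call request; the calling party generates it by sending an INVITE message to the called user.
Further, the communication connection request can be in one of two states: not connected and connected. The not-connected state means that the called user has not yet answered the call, in which case a status code is returned to the calling party. The not-connected state in turn comprises the call stage and the early media stage. The call stage means that the communication connection request has not yet reached the called user. The early media stage means that the request has reached the called user but the called user has not yet answered, so the call is in a provisional-response situation. Provisional responses include, but are not limited to, ringing, call being forwarded, and color ring-back tones.
The connected state, that is, the connected stage, means that the called user has answered the call and the voice channel between the calling party and the called user is open.
Preferably, determining the stage of the communication connection request according to the response of the called user terminal includes:
determining whether a status code returned by the called user terminal has been received;
when a status code returned by the called user terminal has been received, determining from the identifier in the received status code whether the communication connection request is in the call stage or the early media stage;
when no status code returned by the called user terminal has been received, determining that the communication connection request is in the connected stage.
The status code is a three-digit numeric code that describes the category of the reason the call was not connected; its identifier usually takes one of two forms, 18X or 4XX. When the identifier in the received status code is 4XX, the communication connection request is determined to be in the call stage; when it is 18X, the request is determined to be in the early media stage.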
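The stage decision described above can be sketched as a small lookup. This is an illustration only: the `Stage` enum and the function name are hypothetical, and the rule that a missing status code means the connected stage is taken directly from this description rather than from any signalling library.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    CALL = "call stage"
    EARLY_MEDIA = "early media stage"
    CONNECTED = "connected stage"


def stage_from_status(status_code: Optional[int]) -> Stage:
    """Map the status code returned by the called terminal to a stage.

    None stands for "no status code received", which this description
    treats as the connected stage.
    """
    if status_code is None:
        return Stage.CONNECTED
    if 400 <= status_code <= 499:   # identifier of the form 4XX
        return Stage.CALL
    if 180 <= status_code <= 189:   # identifier of the form 18X
        return Stage.EARLY_MEDIA
    raise ValueError(f"status code {status_code} is not covered by this description")
```

Codes outside 18X and 4XX are not discussed in this description, so the sketch rejects them instead of guessing a stage.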
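S2. When the communication connection request is in the call stage, record the unconnected-call type as a line-side fault.
As stated above, when the identifier in the received status code is 4XX, the communication connection request is determined to be in the call stage. The first digit "4" of the status code 4XX indicates the response class, and the last two digits indicate the specific response; "4XX" covers the range 400 to 499. 4XX means that the request failed: the request message contains a syntax error, or the server cannot fulfil the client's request. Therefore, when the recognized status code is 4XX, the unconnected-call type is determined to be a line-side fault.
S3. When the communication connection request is in the early media stage, acquire the audio file of the early media stage and analyze the audio file to obtain the corresponding unconnected-call type.
As stated above, when the identifier in the received status code is 18X, the communication connection request is determined to be in the early media stage. The "X" in status code 18X can usually be any value from 0 to 9.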
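In detail, S3 includes:
Step A: parse the status code to obtain the identifier in the status code;
Step B: if the identifier is the first identifier, record the unconnected-call type as no answer;
In one embodiment of this application, the first identifier is any 18X in which X is neither 0 nor 3, for example 181 or 182.
Step C: if the identifier is the second identifier, further perform the following steps:
Step c1: apply dynamic programming to the audio file to obtain a cumulative path distance;
Step c2: if the cumulative path distance is greater than or equal to a preset distance threshold, record the unconnected-call type as called party busy on another call;
Step c3: if the cumulative path distance is less than the preset distance threshold, record the unconnected-call type as no answer;
In one embodiment of this application, the second identifier is 180. In this case the sound heard is a ring-back tone, and the early media presented in the form of that ring-back tone is recorded as the audio file.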
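In detail, applying dynamic programming to the audio file includes:
computing the position of each frame number n = 1, ..., N of the audio file on the horizontal axis of a preset two-dimensional rectangular coordinate system, and the position of each frame number m = 1, ..., M of a preset reference audio file on the vertical axis of the same coordinate system; drawing horizontal and vertical lines through these integer frame coordinates forms the audio grid;
searching for the target path in the audio grid and computing the cumulative path distance of the target path.
Here a frame number is the temporal index of an audio frame in an audio file; the preset reference audio file is a standard audio file set in advance; N is the largest frame number of the audio file, and M is the largest frame number of the reference audio file.
Specifically, the target path is a path through a number of grid points of the audio grid, each grid point being a pair of frame numbers, one from the audio file and one from the reference audio file. The path is not chosen arbitrarily: the speaking rate of the speech in any audio file may vary, but the order of its parts cannot change, so the chosen path must start at the lower-left corner of the coordinate system and end at the upper-right corner.
In detail, suppose the grid points traversed by the path are, in order, (n1,m1), ..., (ni,mj), ..., (nN,mM), where (n1,m1) = (1,1) and (nN,mM) = (N,M).
Preferably, to keep the path from tilting too far, its slope may be constrained to the range 0.5 to 2. For example, if the path has passed through grid point (n,m), the next grid point (n′,m′) is one of the following three:
(n′,m′)=(n+1,m)
(n′,m′)=(n+1,m+1)
(n′,m′)=(n,m+1)
The cumulative path distance is then:
D[(n,m)]=d[T(n),R(m)]+min{D(n-1,m),D(n-1,m-1),D(n,m-1)}
where D[(n,m)] is the cumulative path distance, T is the audio file, R is the reference audio file, T(n) is the speech feature vector of the n-th frame of the audio file, R(m) is the speech feature vector of the m-th frame of the reference audio file, and d[T(n),R(m)] is the distance between T(n) and R(m).
This embodiment compares the calculated cumulative path distance D[(n,m)] with a preset second threshold. If D[(n,m)] is greater than or equal to the preset second threshold, the type is determined to be called party busy on another call; if D[(n,m)] is less than the preset second threshold, the type is determined to be no answer.
Applying dynamic programming to the audio file solves the problem of matching speech segments of different lengths, so the cumulative path distance can be computed accurately without additional calculation.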
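Step D: if the identifier is the third identifier, further perform the following steps:
Step d1: determine whether the audio file contains a human voice;
Step d2: if there is no human voice, record the unconnected-call type as no answer;
Step d3: if there is a human voice, perform ASR (Automatic Speech Recognition) on the human voice to obtain the corresponding text;
Step d4: perform text similarity matching between the corresponding text and the standard phrases in the standard phrase set to obtain standard similarities;
Step d5: take the standard phrase with the highest standard similarity as the unconnected-call type; if all standard similarities are less than the preset text threshold, record the unconnected-call type as no answer.
In one embodiment of this application, the third identifier is 183. In this case music or a color ring-back tone is played, and the played music or color ring-back tone is taken as the audio file.
This embodiment determines whether the audio file contains a human voice by performing frequency identification on the audio file. In detail, the embodiment checks whether the audio file contains human voice frequencies (65 Hz to 1500 Hz). If none are found, the unconnected-call type is recorded as no answer. If human voice frequencies are found, a human voice is considered present and the voice information in the audio file is converted to text.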
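One way to read the frequency check is as a test on how much of the signal energy falls between 65 Hz and 1500 Hz. The sketch below is an assumption-laden illustration, not the application's method: it uses a naive discrete Fourier transform over one short frame, and the energy-ratio rule, the `min_ratio` value, and the function names are all hypothetical.

```python
import cmath
import math


def band_energy_ratio(samples, sample_rate, low_hz=65.0, high_hz=1500.0):
    """Fraction of spectral energy (DC excluded) inside [low_hz, high_hz]."""
    n = len(samples)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        # Naive DFT coefficient for bin k; fine for one short frame.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def has_voice_band_energy(samples, sample_rate, min_ratio=0.5):
    """Hypothetical decision rule: enough energy inside the voice band."""
    return band_energy_ratio(samples, sample_rate) >= min_ratio
```

Music and ring-back tones also carry energy in this band, so a check of this kind is at best a coarse pre-filter ahead of speech recognition.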
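In detail, converting the voice information in the audio file includes:
building a text information conversion model and training it;
converting the voice information into text information by using the trained text information conversion model.
Optionally, building and training the text information conversion model includes:
Step I: randomly generate a training voice dialogue set and the standard text information corresponding to the training voice dialogue set;
Step II: convert the training voice dialogue set with the text information conversion model to obtain converted text information;
Step III: compare the converted text information with the standard text information to obtain the difference between them;
Step IV: when the difference between the converted text information and the standard text information is greater than a preset threshold, adjust the parameters of the text information conversion model and return to Step II to continue the conversion;
Step V: when the difference between the converted text information and the standard text information is less than or equal to the preset threshold, complete the training and generate the trained text information conversion model.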
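The control flow of Steps II to V (convert, measure the difference, adjust, repeat until the difference is within the threshold) can be sketched with a deliberately trivial stand-in model. Everything here is a placeholder assumption: a one-parameter linear model replaces the speech-to-text model, mean squared error replaces the loss function, and gradient descent replaces the unspecified parameter adjustment.

```python
def train_until_threshold(inputs, targets, loss_threshold=1e-6,
                          learning_rate=0.05, max_rounds=10_000):
    """Repeat convert / compare / adjust until the difference is small enough.

    Returns the trained weight and the final difference (loss).
    """
    weight = 0.0  # the only parameter of the stand-in model
    for _ in range(max_rounds):
        # Step II: run the current model on the training set.
        outputs = [weight * x for x in inputs]
        # Step III: measure the difference from the reference outputs.
        loss = sum((o - y) ** 2 for o, y in zip(outputs, targets)) / len(inputs)
        # Step V: stop once the difference is within the threshold.
        if loss <= loss_threshold:
            return weight, loss
        # Step IV: adjust the parameter, then go back to Step II.
        gradient = sum(2 * (o - y) * x
                       for o, y, x in zip(outputs, targets, inputs)) / len(inputs)
        weight -= learning_rate * gradient
    raise RuntimeError("difference did not fall below the threshold")
```

The `max_rounds` guard is an addition of this sketch; the steps as described would loop until the threshold is met.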
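Further, the difference between the converted text information and the standard text information is calculated with a preset loss function, which the published text gives only as images (PCTCN2021122834-appb-000002 and PCTCN2021122834-appb-000003). In that formula, Y denotes the standard text information, the symbol in image PCTCN2021122834-appb-000004 denotes the converted text information, the operator in image PCTCN2021122834-appb-000005 denotes the difference operation, and the expression in image PCTCN2021122834-appb-000006 denotes the difference between the converted text information and the standard text information.
Further, when the loss function is greater than or equal to a preset loss threshold, the converted text information differs from the standard text information, so the parameters of the text information conversion model are adjusted and the conversion is executed again;
When the loss function is less than the loss threshold, the converted text information does not differ from the standard text information, so the training is completed and the trained text information conversion model is generated.
Specifically, the trained text information conversion model is used to convert the voice information in the audio file into text information, which improves conversion efficiency.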
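Further, performing text similarity matching between the text information and the standard phrases in the standard phrase set, calculating the standard similarity between the text information and each standard phrase, ranking the standard similarities, and determining the unconnected-call type includes:
converting the text information and the standard phrases in the standard phrase set into corresponding feature vectors;
calculating the standard similarity between the text information and each standard phrase in the standard phrase set by using the corresponding feature vectors.
The standard phrases in the standard phrase set include, but are not limited to: phone switched off, cannot be connected, number does not exist, line busy, service suspended, no answer, missing leading 0, and call restricted.
Further, this embodiment uses cosine similarity to calculate the standard similarity of two texts from their feature vectors:
$$\text{similarity} = \cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$$
where similarity is the standard similarity, A is the feature vector of the text information, and B is the feature vector of the standard phrase. ||A|| is the norm of the feature vector of the text information and ||B|| is the norm of the feature vector of the standard phrase. The index i runs over the dimensions of the vectors; each dimension holds the number of times the dictionary word at that position occurs in the text, or 0 if the word does not occur.
Matching the text information against the standard phrases in the standard phrase set yields standard similarity A, standard similarity B, standard similarity C, and so on. These are ranked from high to low, and the unconnected-call type is determined from the standard phrase with the highest standard similarity. When standard similarities A, B, and C are all below the preset text threshold, the unconnected-call type is recorded as no answer.
For example, matching the text information against the phrases "phone switched off", "cannot be connected", and "number does not exist" gives similarities of 0.6, 0.9, and 0.4. After ranking, the largest similarity is 0.9, whose phrase is "cannot be connected", so the unconnected-call type is recorded as "cannot be connected". If instead the calculated similarities are 0.4, 0.3, and 0.2 and the preset text threshold is 0.5, all similarities are below the threshold and the unconnected-call type is recorded as no answer.
If the calculated standard similarity is greater than or equal to the preset text threshold, the standard phrase corresponding to that similarity is used as the unconnected-call type; if it is less than the preset text threshold, the call is classified as no answer.
Calculating text similarity with the cosine similarity formula improves the accuracy of the subsequent classification of unconnected-call types.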
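S4. When the communication connection request is in the connected stage, acquire the call voice and obtain the unconnected-call type by analyzing the call voice.
In this embodiment, if the call is hung up after being connected but before the agent's opening remarks have finished playing, the request is in the connected stage. ASR (Automatic Speech Recognition) is performed on the call voice to obtain the corresponding text, and text similarity matching is performed between that text and preset IVR (Interactive Voice Response) phrases to obtain a text similarity. If the text similarity is greater than or equal to a preset threshold, the unconnected-call type is recorded as the machine auto-answer type; if it is less than the preset threshold, the unconnected-call type is recorded as the callee hang-up type.
The agent's opening remarks are the voice information played after the call is connected. The preset IVR phrases include, but are not limited to, switchboard announcements, prompts to dial an extension, and directory enquiries.
Specifically, the ASR processing and the text similarity matching are the same as in S3 and are not repeated here.
The machine auto-answer type arises when an automatic voice answering system picks up the call and responds to the caller automatically while the communication connection request is in the connected stage. The system may, for example, prompt the caller to dial an extension or to call the switchboard; this counts as one kind of call that has not actually been connected. The callee hang-up type arises when the communication connection request is in the connected stage but the called user hangs up within a preset time for some reason.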
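FIG. 2 is a schematic module diagram of the apparatus for distinguishing unconnected-call types of this application.
The apparatus 100 for distinguishing unconnected-call types can be installed in an electronic device. According to the functions implemented, the apparatus 100 may include a request stage determination module 101, a call stage determination module 102, an early media stage determination module 103, and a connected stage determination module 104. A module in this application, which may also be called a unit, is a series of computer program segments that can be executed by the processor of the electronic device to perform a fixed function and that are stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:
the request stage determination module 101 is configured to initiate a communication connection request and determine the stage of the communication connection request according to the response of the called user terminal;
the call stage determination module 102 is configured to record the unconnected-call type as a line-side fault when the communication connection request is in the call stage;
the early media stage determination module 103 is configured to, when the communication connection request is in the early media stage, acquire the audio file of the early media stage and obtain the corresponding unconnected-call type by analyzing the audio file;
the connected stage determination module 104 is configured to, when the communication connection request is in the connected stage, acquire the call voice and obtain the corresponding unconnected-call type by analyzing the call voice.
In detail, the apparatus 100 implements the following method for distinguishing unconnected-call types through the above modules: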
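Step 1. The request stage determination module 101 initiates a communication connection request and determines the stage of the communication connection request according to the response of the called user terminal.
In this embodiment, the communication connection request may be a telephone call request; the calling party generates it by sending an INVITE message to the called user.
Further, the communication connection request can be in one of two states: not connected and connected. The not-connected state means that the called user has not yet answered the call, in which case a status code is returned to the calling party. The not-connected state in turn comprises the call stage and the early media stage. The call stage means that the communication connection request has not yet reached the called user. The early media stage means that the request has reached the called user but the called user has not yet answered, so the call is in a provisional-response situation. Provisional responses include, but are not limited to, ringing, call being forwarded, and color ring-back tones.
The connected state, that is, the connected stage, means that the called user has answered the call and the voice channel between the calling party and the called user is open.
Preferably, the request stage determination module 101 determines the stage of the communication connection request according to the response of the called user terminal as follows:
determining whether a status code returned by the called user terminal has been received;
when a status code returned by the called user terminal has been received, determining from the identifier in the received status code whether the communication connection request is in the call stage or the early media stage;
when no status code returned by the called user terminal has been received, determining that the communication connection request is in the connected stage.
The status code is a three-digit numeric code that describes the category of the reason the call was not connected; its identifier usually takes one of two forms, 18X or 4XX. When the identifier in the received status code is 4XX, the communication connection request is determined to be in the call stage; when it is 18X, the request is determined to be in the early media stage.
Step 2. When the communication connection request is in the call stage, the call stage determination module 102 records the unconnected-call type as a line-side fault.
As stated above, when the identifier in the received status code is 4XX, the communication connection request is determined to be in the call stage. The first digit "4" of the status code 4XX indicates the response class, and the last two digits indicate the specific response; "4XX" covers the range 400 to 499. 4XX means that the request failed: the request message contains a syntax error, or the server cannot fulfil the client's request. Therefore, when the recognized status code is 4XX, the call stage determination module 102 determines the unconnected-call type to be a line-side fault.
Step 3. When the communication connection request is in the early media stage, the early media stage determination module 103 acquires the audio file of the early media stage and analyzes the audio file to obtain the corresponding unconnected-call type.
As stated above, when the identifier in the received status code is 18X, the communication connection request is determined to be in the early media stage. The "X" in status code 18X can usually be any value from 0 to 9.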
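In detail, the early media stage determination module 103 obtains the corresponding unconnected-call type as follows:
Step A: parse the status code to obtain the identifier in the status code;
Step B: if the identifier is the first identifier, record the unconnected-call type as no answer.
In one embodiment of this application, the first identifier is any 18X in which X is neither 0 nor 3, for example 181 or 182.
Step C: if the identifier is the second identifier, further perform the following steps:
Step c1: apply dynamic programming to the audio file to obtain a cumulative path distance;
Step c2: if the cumulative path distance is greater than or equal to a preset distance threshold, record the unconnected-call type as called party busy on another call;
Step c3: if the cumulative path distance is less than the preset distance threshold, record the unconnected-call type as no answer.
In one embodiment of this application, the second identifier is 180. In this case the sound heard is a ring-back tone, and the early media presented in the form of that ring-back tone is recorded as the audio file.
In detail, applying dynamic programming to the audio file includes:
computing the position of each frame number n = 1, ..., N of the audio file on the horizontal axis of a preset two-dimensional rectangular coordinate system, and the position of each frame number m = 1, ..., M of a preset reference audio file on the vertical axis of the same coordinate system; drawing horizontal and vertical lines through these integer frame coordinates forms the audio grid;
searching for the target path in the audio grid and computing the cumulative path distance of the target path.
Here a frame number is the temporal index of an audio frame in an audio file; the preset reference audio file is a standard audio file set in advance; N is the largest frame number of the audio file, and M is the largest frame number of the reference audio file.
Specifically, the target path is a path through a number of grid points of the audio grid, each grid point being a pair of frame numbers, one from the audio file and one from the reference audio file. The path is not chosen arbitrarily: the speaking rate of the speech in any audio file may vary, but the order of its parts cannot change, so the chosen path must start at the lower-left corner of the coordinate system and end at the upper-right corner.
In detail, suppose the grid points traversed by the path are, in order, (n1,m1), ..., (ni,mj), ..., (nN,mM), where (n1,m1) = (1,1) and (nN,mM) = (N,M).
Preferably, to keep the path from tilting too far, its slope may be constrained to the range 0.5 to 2. For example, if the path has passed through grid point (n,m), the next grid point (n′,m′) is one of the following three:
(n′,m′)=(n+1,m)
(n′,m′)=(n+1,m+1)
(n′,m′)=(n,m+1)
The cumulative path distance is then:
D[(n,m)]=d[T(n),R(m)]+min{D(n-1,m),D(n-1,m-1),D(n,m-1)}
where D[(n,m)] is the cumulative path distance, T is the audio file, R is the reference audio file, T(n) is the speech feature vector of the n-th frame of the audio file, R(m) is the speech feature vector of the m-th frame of the reference audio file, and d[T(n),R(m)] is the distance between T(n) and R(m).
This embodiment compares the calculated cumulative path distance D[(n,m)] with a preset second threshold. If D[(n,m)] is greater than or equal to the preset second threshold, the type is determined to be called party busy on another call; if D[(n,m)] is less than the preset second threshold, the type is determined to be no answer.
Applying dynamic programming to the audio file solves the problem of matching speech segments of different lengths, so the cumulative path distance can be computed accurately without additional calculation.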
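Step D: if the identifier is the third identifier, further perform the following steps:
Step d1: determine whether the audio file contains a human voice;
Step d2: if there is no human voice, record the unconnected-call type as no answer;
Step d3: if there is a human voice, perform ASR (Automatic Speech Recognition) on the human voice to obtain the corresponding text;
Step d4: perform text similarity matching between the corresponding text and the standard phrases in the standard phrase set to obtain standard similarities;
Step d5: take the standard phrase with the highest standard similarity as the unconnected-call type; if all standard similarities are less than the preset text threshold, record the unconnected-call type as no answer.
In one embodiment of this application, the third identifier is 183. In this case music or a color ring-back tone is played, and the played music or color ring-back tone is taken as the audio file.
This embodiment determines whether the audio file contains a human voice by performing frequency identification on the audio file. In detail, the embodiment checks whether the audio file contains human voice frequencies (65 Hz to 1500 Hz). If none are found, the unconnected-call type is recorded as no answer. If human voice frequencies are found, a human voice is considered present and the voice information in the audio file is converted to text.
In detail, converting the voice information in the audio file includes:
building a text information conversion model and training it;
converting the voice information into text information by using the trained text information conversion model.
Optionally, building and training the text information conversion model includes:
Step I: randomly generate a training voice dialogue set and the standard text information corresponding to the training voice dialogue set;
Step II: convert the training voice dialogue set with the text information conversion model to obtain converted text information;
Step III: compare the converted text information with the standard text information to obtain the difference between them;
Step IV: when the difference between the converted text information and the standard text information is greater than a preset threshold, adjust the parameters of the text information conversion model and return to Step II to continue the conversion;
Step V: when the difference between the converted text information and the standard text information is less than or equal to the preset threshold, complete the training and generate the trained text information conversion model.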
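Further, the difference between the converted text information and the standard text information is calculated with a preset loss function, which the published text gives only as images (PCTCN2021122834-appb-000011 and PCTCN2021122834-appb-000012). In that formula, Y denotes the standard text information, the symbol in image PCTCN2021122834-appb-000013 denotes the converted text information, the operator in image PCTCN2021122834-appb-000014 denotes the difference operation, and the expression in image PCTCN2021122834-appb-000015 denotes the difference between the converted text information and the standard text information.
Further, when the loss function is greater than or equal to a preset loss threshold, the converted text information differs from the standard text information, so the parameters of the text information conversion model are adjusted and the conversion is executed again;
When the loss function is less than the loss threshold, the converted text information does not differ from the standard text information, so the training is completed and the trained text information conversion model is generated.
Specifically, the trained text information conversion model is used to convert the voice information in the audio file into text information, which improves conversion efficiency.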
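Further, performing text similarity matching between the text information and the standard phrases in the standard phrase set, calculating the standard similarity between the text information and each standard phrase, ranking the standard similarities, and determining the unconnected-call type includes:
converting the text information and the standard phrases in the standard phrase set into corresponding feature vectors;
calculating the standard similarity between the text information and each standard phrase in the standard phrase set by using the corresponding feature vectors.
The standard phrases in the standard phrase set include, but are not limited to: phone switched off, cannot be connected, number does not exist, line busy, service suspended, no answer, missing leading 0, and call restricted.
Further, this embodiment uses cosine similarity to calculate the standard similarity of two texts from their feature vectors:
$$\text{similarity} = \cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$$
where similarity is the standard similarity, A is the feature vector of the text information, and B is the feature vector of the standard phrase. ||A|| is the norm of the feature vector of the text information and ||B|| is the norm of the feature vector of the standard phrase. The index i runs over the dimensions of the vectors; each dimension holds the number of times the dictionary word at that position occurs in the text, or 0 if the word does not occur.
Matching the text information against the standard phrases in the standard phrase set yields standard similarity A, standard similarity B, standard similarity C, and so on. These are ranked from high to low, and the unconnected-call type is determined from the standard phrase with the highest standard similarity. When standard similarities A, B, and C are all below the preset text threshold, the unconnected-call type is recorded as no answer.
For example, matching the text information against the phrases "phone switched off", "cannot be connected", and "number does not exist" gives similarities of 0.6, 0.9, and 0.4. After ranking, the largest similarity is 0.9, whose phrase is "cannot be connected", so the unconnected-call type is recorded as "cannot be connected". If instead the calculated similarities are 0.4, 0.3, and 0.2 and the preset text threshold is 0.5, all similarities are below the threshold and the unconnected-call type is recorded as no answer.
If the calculated standard similarity is greater than or equal to the preset text threshold, the standard phrase corresponding to that similarity is used as the unconnected-call type; if it is less than the preset text threshold, the call is classified as no answer.
Calculating text similarity with the cosine similarity formula improves the accuracy of the subsequent classification of unconnected-call types.
Step 4. When the communication connection request is in the connected stage, the connected stage determination module 104 acquires the call voice and obtains the unconnected-call type by analyzing the call voice.
In this embodiment, if the call is hung up after being connected but before the agent's opening remarks have finished playing, the request is in the connected stage. The connected stage determination module 104 performs ASR (Automatic Speech Recognition) on the call voice to obtain the corresponding text, and performs text similarity matching between that text and preset IVR (Interactive Voice Response) phrases to obtain a text similarity. If the text similarity is greater than or equal to a preset threshold, the unconnected-call type is recorded as the machine auto-answer type; if it is less than the preset threshold, the unconnected-call type is recorded as the callee hang-up type.
The agent's opening remarks are the voice information played after the call is connected. The preset IVR phrases include, but are not limited to, switchboard announcements, prompts to dial an extension, and directory enquiries.
Specifically, the ASR processing and the text similarity matching are the same as in Step 3 and are not repeated here.
The machine auto-answer type arises when an automatic voice answering system picks up the call and responds to the caller automatically while the communication connection request is in the connected stage. The system may, for example, prompt the caller to dial an extension or to call the switchboard; this counts as one kind of call that has not actually been connected. The callee hang-up type arises when the communication connection request is in the connected stage but the called user hangs up within a preset time for some reason.
FIG. 3 is a schematic structural diagram of an electronic device implementing the method for distinguishing unconnected-call types of this application.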
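The electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as a program 12 for distinguishing unconnected-call types.
The memory 11 includes at least one type of readable storage medium, including flash memory, mobile hard disk, multimedia card, card-type memory (for example SD or DX memory), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments the memory 11 may be an internal storage unit of the electronic device 1, such as a mobile hard disk of the electronic device 1. In other embodiments the memory 11 may be an external storage device of the electronic device 1, such as a pluggable mobile hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash memory card (Flash Card) equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the program 12 for distinguishing unconnected-call types, but also to temporarily store data that has been output or will be output.
In some embodiments the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control core (Control Unit) of the electronic device. It connects the components of the entire electronic device through various interfaces and lines, and executes the functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example, the program for distinguishing unconnected-call types) and calling the data stored in the memory 11.
The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to implement connection and communication between the memory 11 and the at least one processor 10.
FIG. 3 shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in FIG. 3 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may also include a power supply (such as a battery) for powering the various components. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power consumption management. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other such components. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described here.
Further, the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may also include a user interface, which may be a display (Display) or an input unit (such as a keyboard (Keyboard)); optionally, the user interface may also be a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be called a display screen or display unit, and is used to display information processed in the electronic device 1 and to display a visual user interface.
It should be understood that the embodiments described are for illustration only and do not limit the scope of the patent application to this structure.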
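The program 12 for distinguishing unconnected-call types stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run on the processor 10, can implement:
initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal;
when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault;
when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type;
when the communication connection request is in the connected stage, acquiring the call voice and analyzing the call voice to obtain the corresponding unconnected-call type.
Further, if the modules/units integrated in the electronic device 1 are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium, which may be volatile or non-volatile. The computer-readable storage medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM, Read-Only Memory).
Further, the computer-readable storage medium may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created according to the use of a blockchain node, and the like.
A computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the following steps are implemented:
initiating a communication connection request, and determining the stage of the communication connection request according to the response of the called user terminal;
when the communication connection request is in the call stage, recording the unconnected-call type as a line-side fault;
when the communication connection request is in the early media stage, acquiring the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected-call type;
when the communication connection request is in the connected stage, acquiring the call voice and analyzing the call voice to obtain the corresponding unconnected-call type.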
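In the several embodiments provided in this application, it should be understood that the disclosed device, apparatus, and method can be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that this application is not limited to the details of the above exemplary embodiments, and that this application can be implemented in other specific forms without departing from its spirit or essential characteristics.
Therefore, the embodiments are to be regarded in all respects as exemplary and non-restrictive. The scope of this application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by this application. No reference sign in a claim shall be construed as limiting the claim concerned.
In addition, the word "including" obviously does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses recited in a system claim may also be implemented by one unit or apparatus through software or hardware. Terms such as "first" and "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of this application and not to limit them. Although this application has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application may be modified or equivalently replaced without departing from their spirit and scope.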

Claims (20)

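  1. A method for distinguishing unconnected call types, wherein the method comprises:
    initiating a communication connection request, and determining, according to a response of a called user terminal, the stage that the communication connection request is in;
    when the stage that the communication connection request is in is a calling stage, recording the unconnected call type as a line-side fault;
    when the stage that the communication connection request is in is an early media stage, acquiring an audio file of the early media stage and analyzing the audio file to obtain a corresponding unconnected call type;
    when the stage that the communication connection request is in is a connected stage, acquiring call voice and analyzing the call voice to obtain a corresponding unconnected call type.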
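  2. The method for distinguishing unconnected call types according to claim 1, wherein the determining, according to the response of the called user terminal, the stage that the communication connection request is in comprises:
    determining whether a status code returned by the called user terminal is received;
    when the status code returned by the called user terminal is received, determining, according to an identifier in the received status code, that the stage that the communication connection request is in is the calling stage or the early media stage;
    when no status code returned by the called user terminal is received, determining that the stage that the communication connection request is in is the connected stage.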
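  3. The method for distinguishing unconnected call types according to claim 2, wherein the acquiring, when the stage that the communication connection request is in is the early media stage, the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected call type comprises:
    parsing the status code to obtain the identifier in the status code;
    if the identifier is a first identifier, recording the unconnected call type as no answer;
    if the identifier is a second identifier, performing dynamic programming processing on the audio file to obtain a path cumulative distance; if the path cumulative distance is greater than or equal to a preset distance threshold, recording the unconnected call type as the called party being on another call; if the path cumulative distance is less than the preset distance threshold, recording the unconnected call type as no answer;
    if the identifier is a third identifier, determining whether the audio file contains a human voice; if there is no human voice, recording the unconnected call type as no answer; if there is a human voice, performing recognition processing on the human voice to obtain text corresponding to the human voice; performing text similarity matching between the text and the standard scripts in a preset standard script set to obtain standard similarities; taking the standard script with the greatest standard similarity as the unconnected call type; and, if all the standard similarities are less than a preset text threshold, recording the unconnected call type as no answer.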
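  4. The method for distinguishing unconnected call types according to claim 3, wherein the performing dynamic programming processing on the audio file to obtain the path cumulative distance comprises:
    computing the positions of the frame numbers of the audio file on the horizontal axis of a preset two-dimensional rectangular coordinate system, and computing the positions of the frame numbers of a preset reference audio file on the vertical axis of the preset two-dimensional rectangular coordinate system, to form an audio network;
    searching for a target path in the audio network, and computing the path cumulative distance of the target path.
  5. The method for distinguishing unconnected call types according to claim 4, wherein the target path is a path through several grid points in the audio network, and the grid points are the frame numbers in the audio file and the reference audio file.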
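  6. The method for distinguishing unconnected call types according to claim 4, wherein the path cumulative distance is:
    D[(n,m)]=d[T(n),R(m)]+min{D(n-1,m),D(n-1,m-1),D(n,m-1)}
    Figure PCTCN2021122834-appb-100001
    wherein D[(n,m)] is the path cumulative distance, T is the audio file, R is the reference audio file, T(n) is the speech feature vector of the n-th frame of the audio file, R(m) is the speech feature vector of the m-th frame of the reference audio file, and d[T(n),R(m)] denotes the distance between T(n) and R(m).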
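  7. The method for distinguishing unconnected call types according to any one of claims 1 to 6, wherein the acquiring, when the stage that the communication connection request is in is the connected stage, the call voice and analyzing the call voice to obtain the unconnected call type comprises:
    performing recognition processing on the call voice to obtain text corresponding to the call voice;
    performing text similarity matching between the text and a preset interactive voice response script to obtain a text similarity;
    if the text similarity is greater than or equal to a preset threshold, recording the unconnected call type as an automatic machine answer type;
    if the text similarity is less than the preset threshold, recording the unconnected call type as a hung-up-by-called-user type.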
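  8. An apparatus for distinguishing unconnected call types, wherein the apparatus comprises:
    a request stage determination module, configured to initiate a communication connection request and determine, according to a response of a called user terminal, the stage that the communication connection request is in;
    a calling stage determination module, configured to record the unconnected call type as a line-side fault when the stage that the communication connection request is in is a calling stage;
    an early media stage determination module, configured to, when the stage that the communication connection request is in is an early media stage, acquire an audio file of the early media stage and analyze the audio file to obtain a corresponding unconnected call type;
    a connected stage determination module, configured to, when the stage that the communication connection request is in is a connected stage, acquire call voice and analyze the call voice to obtain a corresponding unconnected call type.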
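  9. An electronic device, wherein the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following steps:
    initiating a communication connection request, and determining, according to a response of a called user terminal, the stage that the communication connection request is in;
    when the stage that the communication connection request is in is a calling stage, recording the unconnected call type as a line-side fault;
    when the stage that the communication connection request is in is an early media stage, acquiring an audio file of the early media stage and analyzing the audio file to obtain a corresponding unconnected call type;
    when the stage that the communication connection request is in is a connected stage, acquiring call voice and analyzing the call voice to obtain a corresponding unconnected call type.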
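  10. The electronic device according to claim 9, wherein the determining, according to the response of the called user terminal, the stage that the communication connection request is in comprises:
    determining whether a status code returned by the called user terminal is received;
    when the status code returned by the called user terminal is received, determining, according to an identifier in the received status code, that the stage that the communication connection request is in is the calling stage or the early media stage;
    when no status code returned by the called user terminal is received, determining that the stage that the communication connection request is in is the connected stage.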
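  11. The electronic device according to claim 10, wherein the acquiring, when the stage that the communication connection request is in is the early media stage, the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected call type comprises:
    parsing the status code to obtain the identifier in the status code;
    if the identifier is a first identifier, recording the unconnected call type as no answer;
    if the identifier is a second identifier, performing dynamic programming processing on the audio file to obtain a path cumulative distance; if the path cumulative distance is greater than or equal to a preset distance threshold, recording the unconnected call type as the called party being on another call; if the path cumulative distance is less than the preset distance threshold, recording the unconnected call type as no answer;
    if the identifier is a third identifier, determining whether the audio file contains a human voice; if there is no human voice, recording the unconnected call type as no answer; if there is a human voice, performing recognition processing on the human voice to obtain text corresponding to the human voice; performing text similarity matching between the text and the standard scripts in a preset standard script set to obtain standard similarities; taking the standard script with the greatest standard similarity as the unconnected call type; and, if all the standard similarities are less than a preset text threshold, recording the unconnected call type as no answer.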
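  12. The electronic device according to claim 11, wherein the performing dynamic programming processing on the audio file to obtain the path cumulative distance comprises:
    computing the positions of the frame numbers of the audio file on the horizontal axis of a preset two-dimensional rectangular coordinate system, and computing the positions of the frame numbers of a preset reference audio file on the vertical axis of the preset two-dimensional rectangular coordinate system, to form an audio network;
    searching for a target path in the audio network, and computing the path cumulative distance of the target path.
  13. The electronic device according to claim 12, wherein the target path is a path through several grid points in the audio network, and the grid points are the frame numbers in the audio file and the reference audio file.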
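  14. The electronic device according to claim 12, wherein the path cumulative distance is:
    D[(n,m)]=d[T(n),R(m)]+min{D(n-1,m),D(n-1,m-1),D(n,m-1)}
    Figure PCTCN2021122834-appb-100002
    wherein D[(n,m)] is the path cumulative distance, T is the audio file, R is the reference audio file, T(n) is the speech feature vector of the n-th frame of the audio file, R(m) is the speech feature vector of the m-th frame of the reference audio file, and d[T(n),R(m)] denotes the distance between T(n) and R(m).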
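  15. The electronic device according to any one of claims 9 to 14, wherein the acquiring, when the stage that the communication connection request is in is the connected stage, the call voice and analyzing the call voice to obtain the unconnected call type comprises:
    performing recognition processing on the call voice to obtain text corresponding to the call voice;
    performing text similarity matching between the text and a preset interactive voice response script to obtain a text similarity;
    if the text similarity is greater than or equal to a preset threshold, recording the unconnected call type as an automatic machine answer type;
    if the text similarity is less than the preset threshold, recording the unconnected call type as a hung-up-by-called-user type.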
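  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    initiating a communication connection request, and determining, according to a response of a called user terminal, the stage that the communication connection request is in;
    when the stage that the communication connection request is in is a calling stage, recording the unconnected call type as a line-side fault;
    when the stage that the communication connection request is in is an early media stage, acquiring an audio file of the early media stage and analyzing the audio file to obtain a corresponding unconnected call type;
    when the stage that the communication connection request is in is a connected stage, acquiring call voice and analyzing the call voice to obtain a corresponding unconnected call type.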
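  17. The computer-readable storage medium according to claim 16, wherein the determining, according to the response of the called user terminal, the stage that the communication connection request is in comprises:
    determining whether a status code returned by the called user terminal is received;
    when the status code returned by the called user terminal is received, determining, according to an identifier in the received status code, that the stage that the communication connection request is in is the calling stage or the early media stage;
    when no status code returned by the called user terminal is received, determining that the stage that the communication connection request is in is the connected stage.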
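  18. The computer-readable storage medium according to claim 17, wherein the acquiring, when the stage that the communication connection request is in is the early media stage, the audio file of the early media stage and analyzing the audio file to obtain the corresponding unconnected call type comprises:
    parsing the status code to obtain the identifier in the status code;
    if the identifier is a first identifier, recording the unconnected call type as no answer;
    if the identifier is a second identifier, performing dynamic programming processing on the audio file to obtain a path cumulative distance; if the path cumulative distance is greater than or equal to a preset distance threshold, recording the unconnected call type as the called party being on another call; if the path cumulative distance is less than the preset distance threshold, recording the unconnected call type as no answer;
    if the identifier is a third identifier, determining whether the audio file contains a human voice; if there is no human voice, recording the unconnected call type as no answer; if there is a human voice, performing recognition processing on the human voice to obtain text corresponding to the human voice; performing text similarity matching between the text and the standard scripts in a preset standard script set to obtain standard similarities; taking the standard script with the greatest standard similarity as the unconnected call type; and, if all the standard similarities are less than a preset text threshold, recording the unconnected call type as no answer.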
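  19. The computer-readable storage medium according to claim 18, wherein the performing dynamic programming processing on the audio file to obtain the path cumulative distance comprises:
    computing the positions of the frame numbers of the audio file on the horizontal axis of a preset two-dimensional rectangular coordinate system, and computing the positions of the frame numbers of a preset reference audio file on the vertical axis of the preset two-dimensional rectangular coordinate system, to form an audio network;
    searching for a target path in the audio network, and computing the path cumulative distance of the target path.
  20. The computer-readable storage medium according to claim 19, wherein the target path is a path through several grid points in the audio network, and the grid points are the frame numbers in the audio file and the reference audio file.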
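The recurrence recited in claims 6 and 14, D(n,m) = d[T(n),R(m)] + min{D(n-1,m), D(n-1,m-1), D(n,m-1)}, can be evaluated by filling a grid whose axes are the frame numbers of the audio file and of the reference audio file. The following is a minimal sketch under stated assumptions, not the applicant's implementation: the frame distance d is taken to be Euclidean, the boundary condition D(0,0) = 0 is assumed, feature extraction is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def frame_distance(a: Vector, b: Vector) -> float:
    """d[T(n), R(m)]; Euclidean distance is an assumption."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def path_cumulative_distance(T: Sequence[Vector], R: Sequence[Vector]) -> float:
    """Cumulative distance of the best path through the frame grid.

    T holds the per-frame feature vectors of the audio file (horizontal axis),
    R those of the reference audio file (vertical axis).
    """
    if not T or not R:
        raise ValueError("both frame sequences must be non-empty")
    n_frames, m_frames = len(T), len(R)
    # D[n][m] uses 1-based frame indices; row 0 and column 0 are the boundary.
    D: List[List[float]] = [[math.inf] * (m_frames + 1) for _ in range(n_frames + 1)]
    D[0][0] = 0.0  # assumed boundary condition
    for n in range(1, n_frames + 1):
        for m in range(1, m_frames + 1):
            d = frame_distance(T[n - 1], R[m - 1])
            # The recurrence from the claims: extend the cheapest predecessor.
            D[n][m] = d + min(D[n - 1][m], D[n - 1][m - 1], D[n][m - 1])
    return D[n_frames][m_frames]
```

The returned value is what claims 3, 11, and 18 compare against the preset distance threshold: at or above the threshold the call is recorded as the called party being on another call, and below it as no answer.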
PCT/CN2021/122834 2020-10-09 2021-10-09 Method and apparatus for distinguishing unconnected call types, electronic device, and storage medium WO2022073507A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011073539.1A CN112235467B (zh) 2020-10-09 2020-10-09 Method and apparatus for distinguishing unconnected call types, electronic device, and storage medium
CN202011073539.1 2020-10-09

Publications (1)

Publication Number Publication Date
WO2022073507A1 true WO2022073507A1 (zh) 2022-04-14

Family

ID=74120103

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122834 WO2022073507A1 (zh) 2020-10-09 2021-10-09 Method and apparatus for distinguishing unconnected call types, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN112235467B (zh)
WO (1) WO2022073507A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098323A (zh) * 2022-06-16 2022-09-23 广州市企德友诚美信息技术开发有限公司 Signal access method based on big data
CN117476011A (zh) * 2023-12-28 2024-01-30 杭州度言软件有限公司 Method and system for identifying debt collection targets based on voice signals

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6160995A (en) * 1998-09-22 2000-12-12 Iridium Ip Llc Method and system for uniform call termination treatment in a global communications network
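CN106534529A (zh) * 2016-10-31 2017-03-22 努比亚技术有限公司 Call prompting apparatus and method
CN109218249A (zh) * 2017-06-29 2019-01-15 北京京东尚科信息技术有限公司 Method and apparatus for detecting call status
CN109658939A (zh) * 2019-01-26 2019-04-19 北京灵伴即时智能科技有限公司 Method for identifying from a telephone recording the reason a call was not connected
CN110830417A (zh) * 2018-08-08 2020-02-21 中兴通讯股份有限公司 Call result acquisition method, system, IVR device, and computer-readable storage medium
CN110995938A (zh) * 2019-12-13 2020-04-10 上海优扬新媒信息技术有限公司 Data processing method and apparatus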

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
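CN105959497B (zh) * 2016-06-15 2019-12-10 阿里巴巴集团控股有限公司 Communication method and apparatus
CN109151220A (zh) * 2018-09-11 2019-01-04 中国—东盟信息港股份有限公司 System for analyzing call failure scenarios on communication voice channels
CN111294469B (zh) * 2018-12-07 2021-07-23 中国移动通信集团陕西有限公司 Fault analysis method, apparatus, and device for call connection problems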

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
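CN115098323A (zh) * 2022-06-16 2022-09-23 广州市企德友诚美信息技术开发有限公司 Signal access method based on big data
CN117476011A (zh) * 2023-12-28 2024-01-30 杭州度言软件有限公司 Method and system for identifying debt collection targets based on voice signals
CN117476011B (zh) * 2023-12-28 2024-03-01 杭州度言软件有限公司 Method and system for identifying debt collection targets based on voice signals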

Also Published As

Publication number Publication date
CN112235467A (zh) 2021-01-15
CN112235467B (zh) 2022-09-02

Similar Documents

Publication Publication Date Title
US9525767B2 (en) System and method for answering a communication notification
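WO2022073507A1 (zh) Method and apparatus for distinguishing unconnected call types, electronic device, and storage medium
CN107580149B (zh) Method and apparatus for identifying the cause of outbound call failure, electronic device, and storage medium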
US10354677B2 (en) System and method for identification of intent segment(s) in caller-agent conversations
CN108682420B (zh) Dialect recognition method for audio and video calls, and terminal device
WO2020238209A1 (zh) Audio processing method, system, and related device
US20090097634A1 (en) Method and System for Call Processing
US11762629B2 (en) System and method for providing a response to a user query using a visual assistant
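CN111696556A (zh) Method, system, device, and storage medium for analyzing user conversation emotion
WO2019196238A1 (zh) Speech recognition method, terminal device, and computer-readable storage medium
JP2007074175A (ja) Telephone operations inspection system and program therefor
CN114760387A (zh) Method and apparatus for managing hold
JP6254504B2 (ja) Search server and search method
WO2023207212A1 (zh) Voice dialogue detection method and apparatus
WO2021107208A1 (ko) Chatbot integrated agent platform system for integrating chatbot channel linkage, and service method thereof
JP3761158B2 (ja) Telephone response support apparatus and method
WO2023090380A1 (ja) Program, information processing system, and information processing method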
US10540966B2 (en) System and method for parameterization of speech recognition grammar specification (SRGS) grammars
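CN110740212A (zh) Call answering method and apparatus based on intelligent voice technology, and electronic device
JP2016225740A (ja) Call distribution system, call control device, and program
CN110708418B (zh) Method and apparatus for identifying caller attributes
CN113163059A (zh) IPPBX performance detection method, terminal device, and storage medium
CN110798566A (zh) Call information recording method, apparatus, and related device
CN109788128A (zh) Incoming call prompting method, incoming call prompting apparatus, and terminal device
CN117828045A (zh) Intelligent conversation response method and apparatus, electronic device, and storage medium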

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21877024

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 31/08/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21877024

Country of ref document: EP

Kind code of ref document: A1