CN116614574A

CN116614574A - Call recording control method and mobile communication terminal

Info

Publication number: CN116614574A
Application number: CN202310880256.5A
Authority: CN
Inventors: 王川; 欧阳俊; 钟剑伟
Original assignee: Unimaxcomm Ltd
Current assignee: Unimaxcomm Ltd
Priority date: 2023-07-18
Filing date: 2023-07-18
Publication date: 2023-08-18
Anticipated expiration: 2043-07-18
Also published as: CN116614574B

Abstract

The application provides a call recording control method and a mobile communication terminal. After an incoming or outgoing call is detected to be connected, call voice data are acquired and cached, and a voice recognition engine is called to convert the call voice data into call text content in real time. Session topics are extracted from the call text content session by session, and the number of utterances of each call party is recorded in real time during the call. When the total number of utterances of all call parties exceeds a preset utterance count threshold, cluster analysis is performed on the session topics to obtain a plurality of session categories, and a category topic is determined for each session category. Each session category whose number of sessions exceeds a preset session count threshold is determined to be a valid session category. When the category topic of any valid session category correlates with any recording topic in a recording topic library, call recording is started. In this way, calls that need to be recorded can be accurately identified and the call recording function can be started automatically, which can effectively improve the user experience.

Description

Call recording control method and mobile communication terminal

Technical Field

The present application relates to the field of mobile communications technologies, and in particular, to a call recording control method and a mobile communications terminal.

Background

The development and popularization of mobile communication technology allow people to contact others in different places at any time, which brings great convenience to work and life. In particular, as mobile communication terminals such as smartphones are continuously updated, their functions have become richer and they have become more closely tied to people's work and life, making them one of the indispensable items that people carry. The call function is one of the basic functions of a mobile communication terminal and remains an important, frequently used function in daily life. In most cases the call content does not need to be recorded. If all call voice is stored indiscriminately, a large amount of the phone's storage space is occupied on the one hand, and on the other hand the large number of call recordings causes unnecessary trouble for the user.

There are two common ways of operating call recording. In the first, a call recording button is displayed on the call interface, and the user starts recording manually when he or she considers that the call needs to be recorded; however, users easily forget to start the recording function at the beginning of an important call and thus lose important call data. In the second, the telephone numbers that need to be recorded are configured in advance, and the smartphone automatically starts the call recording function when such a number calls or is called. This avoids the situation where the user forgets to start recording, but the configuration is inconvenient and the conditions under which a call needs to be recorded are complex: whether a call needs to be recorded is not fully tied to the telephone number, and in some cases even calls from unfamiliar numbers need to be recorded. Frequent user intervention is therefore required, and the user experience is very poor.

Disclosure of Invention

Based on the above problems, the application provides a call recording control method and a mobile communication terminal, which can accurately identify a call requiring recording and automatically start a call recording function, and can effectively improve user experience.

In view of this, a first aspect of the present application provides a call recording control method, including:

after detecting that an incoming call or an outgoing call is connected, acquiring and caching call voice data;

calling a voice recognition engine to convert the call voice data into call text content in real time;

extracting session topics from the call text content session by session, wherein a session is the set of utterances of the call parties that relate to the same topic;

recording the number of utterances of each call party in real time during the call;

when the sum of the numbers of utterances of all call parties is greater than a preset utterance count threshold, performing cluster analysis on the session topics to obtain a plurality of session categories;

determining a category topic of each session category based on the session topics;

acquiring the number of sessions in each session category;

determining each session category whose number of sessions is greater than a preset session count threshold as a valid session category;

performing correlation analysis between the category topics of the valid session categories and the call recording topics in a recording topic library;

and when the category topic of any valid session category correlates with any recording topic, starting call recording.
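The decision made by the last four steps can be illustrated with a minimal Python sketch. It assumes the session categories have already been produced by the cluster analysis; the function name, the dictionary layout and the `is_correlated` callable are hypothetical and are not taken from the application.

```python
def should_start_recording(session_categories, recording_topics, is_correlated,
                           total_utterances, utterance_threshold,
                           session_threshold):
    """Return True when a valid session category matches a recording topic.

    session_categories: dict mapping a category topic to its list of sessions
                        (hypothetical layout).
    is_correlated: hypothetical callable (category_topic, recording_topic) -> bool.
    """
    # Cluster results are only evaluated once enough utterances have accumulated.
    if total_utterances <= utterance_threshold:
        return False
    # A category is valid only if it holds more sessions than the threshold.
    valid_topics = [topic for topic, sessions in session_categories.items()
                    if len(sessions) > session_threshold]
    # Recording starts if any valid category topic correlates with any recording topic.
    return any(is_correlated(topic, rec)
               for topic in valid_topics
               for rec in recording_topics)
```

In this sketch exact string equality could stand in for the correlation analysis; a real terminal would presumably use a semantic measure instead.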

Further, in the above call recording control method, the step of extracting session topics from the call text content session by session specifically includes:

constructing a speaking topic list;

after any call party finishes an utterance, extracting a current speaking topic from the call text content of that utterance;

when the speaking topic list is not empty, acquiring the last speaking topic from the speaking topic list;

performing semantic similarity analysis on the current speaking topic and the last speaking topic;

if the semantic similarity between the current speaking topic and the last speaking topic is greater than or equal to a preset semantic similarity threshold, merging the utterance of the other call party corresponding to the last speaking topic and the current utterance of this call party into the same session;

writing the current speaking topic to the tail of the speaking topic list;

repeating the steps from extracting the current speaking topic from the call text content of the call party's current utterance through writing the current speaking topic to the tail of the speaking topic list, until the semantic similarity between the current speaking topic and the last speaking topic is smaller than the preset semantic similarity threshold;

and determining the session topic of the current session from the speaking topic list.
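The segmentation loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions: each utterance is assumed to have already been reduced to a single speaking topic, and `similarity` is a hypothetical callable standing in for the semantic similarity analysis.

```python
def split_into_sessions(speaking_topics, similarity, threshold):
    """Group consecutive utterances into sessions by topic similarity.

    speaking_topics: one extracted topic per utterance, in call order.
    similarity: hypothetical callable (topic_a, topic_b) -> float.
    Returns a list of sessions, each a list of speaking topics.
    """
    sessions = []
    topic_list = []  # speaking topic list of the session being built
    for topic in speaking_topics:
        if topic_list and similarity(topic, topic_list[-1]) < threshold:
            # Similarity fell below the threshold: close the current session
            # and construct a fresh speaking topic list.
            sessions.append(topic_list)
            topic_list = []
        topic_list.append(topic)  # write to the tail of the list
    if topic_list:
        sessions.append(topic_list)
    return sessions
```

Comparing only against the tail of the list keeps the check constant-time per utterance, which suits processing during a live call.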

Further, in the above call recording control method, the step of determining the session topic of the current session from the speaking topic list specifically includes:

constructing a word vector $x^{i}$ for each speaking topic in the speaking topic list, where $i$ is a positive integer from 1 to $n$, $n$ is the number of speaking topics in the speaking topic list, and the superscript $i$ indicates the speaking topic that the word vector represents;

calculating the semantic similarity between every two speaking topics in the speaking topic list:

$$p_{ij} = \frac{\sum_{k=1}^{K} x^{i}_{k}\, x^{j}_{k}}{\sqrt{\sum_{k=1}^{K} \left(x^{i}_{k}\right)^{2}}\; \sqrt{\sum_{k=1}^{K} \left(x^{j}_{k}\right)^{2}}},$$

where $j$ is a positive integer from 1 to $n$, $k$ is a positive integer from 1 to $K$, $K$ is the dimension of each word vector, $x^{i}_{k}$ is the value of the $k$-th dimension of the word vector $x^{i}$ of the $i$-th speaking topic in the speaking topic list, $x^{j}_{k}$ is the value of the $k$-th dimension of the word vector $x^{j}$ of the $j$-th speaking topic in the speaking topic list, and $p_{ij}$ is the semantic similarity between the $i$-th speaking topic and the $j$-th speaking topic in the speaking topic list;

calculating the sum of the semantic similarities between the word vector of the $i$-th speaking topic in the speaking topic list and the word vectors of the other speaking topics:

$$P_{i} = \sum_{j=1,\; j \neq i}^{n} p_{ij};$$

determining a value $i^{*}$ between 1 and $n$ such that the sum of the semantic similarities between the word vector of the $i^{*}$-th speaking topic and the word vectors of the other speaking topics satisfies:

$$P_{i^{*}} = \max\left(P_{1}, P_{2}, \ldots, P_{n}\right);$$

determining the $i^{*}$-th speaking topic in the speaking topic list as the session topic of the current session.
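As an illustration of this selection step, the following Python sketch picks the speaking topic whose word vector is, in total, most similar to the others. It assumes cosine similarity as the pairwise measure (the description mentions the cosine of the angle between word vectors as one option) and assumes the word vectors are already available as plain lists of floats; the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def most_central_index(vectors):
    """Index of the vector with the largest summed similarity to the others.

    Raises ValueError when `vectors` is empty.
    """
    sums = [
        sum(cosine_similarity(v, w) for j, w in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]
    return max(range(len(vectors)), key=sums.__getitem__)
```

With three topic vectors `[1, 0]`, `[0.9, 0.1]` and `[0, 1]`, the second is returned because it lies between the other two.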

Further, in the above call recording control method, the step of performing cluster analysis on the session topics to obtain a plurality of session categories specifically includes:

constructing a word vector $y^{i}$ for each session topic, where $i$ is a positive integer from 1 to $m$, $m$ is the number of session topics, and the superscript $i$ indicates the session topic that the word vector represents;

calculating the semantic similarity between every two session topics:

$$q_{ij} = \frac{\sum_{k=1}^{K} y^{i}_{k}\, y^{j}_{k}}{\sqrt{\sum_{k=1}^{K} \left(y^{i}_{k}\right)^{2}}\; \sqrt{\sum_{k=1}^{K} \left(y^{j}_{k}\right)^{2}}},$$

where $j$ is a positive integer from 1 to $m$, $y^{i}_{k}$ is the value of the $k$-th dimension of the word vector $y^{i}$ of the $i$-th session topic, $y^{j}_{k}$ is the value of the $k$-th dimension of the word vector $y^{j}$ of the $j$-th session topic, and $q_{ij}$ is the semantic similarity between the $i$-th session topic and the $j$-th session topic;

calculating a clustering parameter $\varepsilon$ based on the semantic similarity $q_{ij}$ between every two session topics;

configuring the density $\rho_{i}$ of each session topic according to the clustering parameter $\varepsilon$;

determining core topics from the session topics according to the density $\rho_{i}$ of each session topic;

clustering the session topics with the core topics as centers to obtain a plurality of session categories.

Further, in the above call recording control method, the step of calculating the clustering parameter $\varepsilon$ based on the semantic similarity $q_{ij}$ between every two session topics specifically includes:

converting the semantic similarity $q_{ij}$ between every two session topics into a clustering distance $d_{ij}$ using a preset clustering distance coefficient $\alpha$;

calculating the average clustering distance $\bar{d}$ of the session topics;

calculating the standard deviation $\sigma$ of the clustering distances of the session topics;

calculating the clustering parameter:

$$\varepsilon = \bar{d} + \sigma.$$
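A minimal Python sketch of the clustering parameter calculation is given below. It rests on assumptions that are not confirmed by the text: the conversion from similarity to distance is taken to be `alpha * (1 - similarity)`, and the mean and population standard deviation are taken over all unordered pairs of session topics. Only the final step, adding the standard deviation to the average distance, follows the description directly.

```python
import statistics
from itertools import combinations


def clustering_parameter(similarity, alpha=1.0):
    """Return (distance matrix, clustering parameter).

    similarity: symmetric matrix (list of lists) of pairwise similarities
                for at least two session topics.
    ASSUMPTION: distance = alpha * (1 - similarity); the actual conversion
    formula is not recoverable from the text.
    """
    m = len(similarity)
    dist = [[alpha * (1.0 - similarity[i][j]) for j in range(m)]
            for i in range(m)]
    # ASSUMPTION: statistics are taken over unordered pairs i < j.
    pair_distances = [dist[i][j] for i, j in combinations(range(m), 2)]
    mean = statistics.fmean(pair_distances)
    std = statistics.pstdev(pair_distances)
    # Clustering parameter = average distance + standard deviation.
    return dist, mean + std
```

Because the parameter grows with the spread of the distances, a call whose session topics are widely scattered gets a larger neighbourhood radius than one whose topics are tightly grouped.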

Further, in the above call recording control method, the step of configuring the density $\rho_{i}$ of each session topic according to the clustering parameter $\varepsilon$ specifically includes:

initializing the density $\rho_{i}$ of each session topic to 0;

for the density $\rho_{i}$ of the $i$-th session topic, traversing the clustering distances $d_{ij}$ between the $i$-th session topic and the other session topics, where $j$ is a positive integer from 1 to $m$, $m$ is the number of session topics, and $j \neq i$;

when $d_{ij} \le \varepsilon$, letting $\rho_{i} = \rho_{i} + 1$.
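The density configuration amounts to counting, for each session topic, how many other session topics lie within the clustering parameter. A short Python sketch, assuming the clustering distances are held in a square matrix and that the comparison is "less than or equal to" (as in the definition of the cluster range later in the description):

```python
def session_topic_densities(dist, epsilon):
    """Count, for each session topic, the other topics within epsilon.

    dist: square matrix (list of lists) of clustering distances.
    epsilon: the clustering parameter.
    """
    m = len(dist)
    density = [0] * m  # every density starts at 0
    for i in range(m):
        for j in range(m):
            # Skip the topic itself; count neighbours within epsilon.
            if j != i and dist[i][j] <= epsilon:
                density[i] += 1
    return density
```

For the distance matrix `[[0, 0.2, 0.8], [0.2, 0, 0.6], [0.8, 0.6, 0]]` and a clustering parameter of 0.7, the densities are `[1, 2, 1]`.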

Further, in the above call recording control method, before the step of extracting session topics from the call text content session by session, the method further includes:

identifying pleasantries in the call text content;

judging whether an utterance in the call text content contains only pleasantries;

when an utterance in the call text content contains only pleasantries, determining that utterance to be an invalid utterance;

and removing the invalid utterances from the call text content.
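The filtering step can be illustrated with the following Python sketch. The fixed phrase set is a hypothetical stand-in: the description identifies pleasantries from expression form, content and tone features, which a lookup table cannot capture.

```python
import re

# Hypothetical pleasantry list, for illustration only.
PLEASANTRIES = {"hello", "hi", "how are you", "nice weather today"}


def _normalize(text):
    """Strip punctuation and surrounding whitespace, lower-case the rest."""
    return re.sub(r"[^\w\s]", "", text).strip().lower()


def drop_invalid_utterances(utterances):
    """Remove utterances that consist only of pleasantries."""
    kept = []
    for text in utterances:
        # Split on sentence punctuation and inspect every non-empty phrase.
        phrases = [_normalize(p) for p in re.split(r"[.!?,;]", text)]
        phrases = [p for p in phrases if p]
        if phrases and all(p in PLEASANTRIES for p in phrases):
            continue  # only pleasantries: invalid utterance
        kept.append(text)
    return kept
```

An utterance such as "Hi, about the contract price." is kept because only part of it is a pleasantry.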

Further, in the above call recording control method, after the step of performing correlation analysis between the category topics of the valid session categories and the call recording topics in the recording topic library, the method further includes:

when none of the category topics of the valid session categories correlates with any recording topic, controlling a loudspeaker to emit a call recording prompt tone;

outputting call recording prompt information on the call interface;

monitoring the user's operations on the call interface;

when it is detected during the call that the user taps the call recording button on the call interface, starting call recording;

and when no tap on the call recording button is detected during the call, deleting the cached call voice data after the call ends.

Further, in the above call recording control method, the step of starting call recording specifically includes:

displaying a call recording indicator on the call interface;

creating a call recording file for the current call in the storage space of the current device;

writing the call voice data already stored in the cache space into the call recording file;

continuing to cache call voice data during the call;

and writing the cached call voice data into the call recording file according to a preset rule.

A second aspect of the present application proposes a mobile communication terminal, including a memory and a processor, the processor executing a computer program stored in the memory to implement the call recording control method according to any one of the first aspect of the present application.

Drawings

Fig. 1 is a flowchart of a call recording control method according to an embodiment of the present application.

Detailed Description

In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced otherwise than as described herein, and therefore the scope of the present application is not limited to the specific embodiments disclosed below.

In the description of the present application, the term "plurality" means two or more unless explicitly defined otherwise. The orientation or positional relationships indicated by terms such as "upper" and "lower" are based on the orientation or positional relationships shown in the drawings; they are used merely for convenience and simplicity of description, do not indicate or imply that the apparatus or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. The terms "coupled", "mounted", "secured" and the like are to be construed broadly; for example, a connection may be fixed, detachable or integral, and may be direct or indirect through an intermediate medium. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances. Furthermore, the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features.

In the description of this specification, the terms "one embodiment," "some implementations," "particular embodiments," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

A call recording control method and a mobile communication terminal according to some embodiments of the present application are described below with reference to the accompanying drawings.

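As shown in Fig. 1, a first aspect of the present application proposes a call recording control method, including:

after detecting that an incoming call or an outgoing call is connected, acquiring and caching call voice data;

calling a voice recognition engine to convert the call voice data into call text content in real time;

extracting session topics from the call text content session by session, wherein a session is the set of utterances of the call parties that relate to the same topic;

recording the number of utterances of each call party in real time during the call;

when the sum of the numbers of utterances of all call parties is greater than a preset utterance count threshold, performing cluster analysis on the session topics to obtain a plurality of session categories;

determining a category topic of each session category based on the session topics;

acquiring the number of sessions in each session category;

determining each session category whose number of sessions is greater than a preset session count threshold as a valid session category;

performing correlation analysis between the category topics of the valid session categories and the call recording topics in a recording topic library;

and when the category topic of any valid session category correlates with any recording topic, starting call recording.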

Specifically, the call parties are all users participating in the call. For example, if the current call is a two-party call, the call parties are the user of the local terminal and the user of the opposite terminal. It should be noted that the number of call parties depends on the number of call participants rather than on the number of call devices. For example, if, after a call is connected between two smartphones, one person takes part in the call at one smartphone and two people take part at the other, the call has three parties rather than two.

In the technical scheme of the application, one utterance is the call content corresponding to one complete turn of speaking by any call party; that is, the call content immediately before and immediately after the utterance is spoken by another call party. The current utterance is the most recently completed utterance in the ongoing call.

Further, the above call recording control method further includes:

periodically searching the call recording files stored in the storage space of the current device;

when a call recording file whose parsing state is "unparsed" exists, parsing the call recording file to obtain one or more call topics;

saving the call topics to the recording topic library;

and setting the parsing state of the call recording file to "parsed".
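The periodic maintenance of the recording topic library can be sketched as a single pass over the stored recordings. The record layout (a dictionary with `path` and `state` keys) and the `parse_topics` callable are hypothetical; the scheduling of the periodic search is left out.

```python
def update_recording_topic_library(recordings, topic_library, parse_topics):
    """Parse every unparsed recording once and add its topics to the library.

    recordings: list of dicts with 'path' and 'state' keys (hypothetical layout).
    topic_library: a set of recording topics, updated in place.
    parse_topics: hypothetical callable path -> iterable of call topics.
    """
    for rec in recordings:
        if rec["state"] != "unparsed":
            continue  # already handled in an earlier pass
        topic_library.update(parse_topics(rec["path"]))
        rec["state"] = "parsed"  # never parse the same file twice
    return topic_library
```

Marking each file as parsed keeps repeated passes cheap, since only recordings added since the previous pass are processed.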

In some embodiments of the present application, the step of extracting a current speaking topic from the call text content of a call party's utterance specifically consists of processing the call text content and then inputting it into an LDA (Latent Dirichlet Allocation) model to extract topic words, where the LDA model is a topic word extraction model trained using the call recording data of the local terminal as a corpus. In other embodiments of the present application, a word frequency statistics method or the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm may also be used to extract the current speaking topic of the call party's utterance.

In some embodiments of the present application, the semantic similarity analysis on the current speaking topic and the last speaking topic specifically consists of mapping the current speaking topic and the last speaking topic into the vector space of a word vector model such as Word2vec (Word to Vector) or GloVe (Global Vectors for Word Representation), and obtaining the semantic similarity of the words by calculating a measure such as the distance between the word vectors or the cosine of the angle between them.

In the technical scheme of the application, a corresponding speaking topic list is constructed for each session. If the semantic similarity between the current speaking topic and the last speaking topic is smaller than the preset semantic similarity threshold, the step of constructing a speaking topic list is executed again to obtain the speaking topic list of the current session.

In other embodiments of the present application, the speaking topic with the highest word frequency in the speaking topic list may also be determined as the session topic.

It should be appreciated that, for ease of presentation, this embodiment reuses $i$ and $j$ as counting variables when calculating the semantic similarity between every two session topics, where they range over $[1, m]$ and $m$ is the number of session topics, whereas when calculating the semantic similarity between every two speaking topics in the speaking topic list they range over $[1, n]$, where $n$ is the number of speaking topics in the list. The use of the same counting variable symbols does not mean that the values they represent here are related in any way to the values used when calculating the similarity between speaking topics in the speaking topic list. In this embodiment, the word vectors are constructed with the same word vector model as in the previous embodiment and therefore have the same number of dimensions $K$. The word vectors of the speaking topics and of the session topics may also be constructed with different word vector models, in which case their numbers of dimensions may differ; the application is not limited in this respect.

In the technical solution of the foregoing embodiment, the sum of the average clustering distance of the session topics and the standard deviation of the clustering distances of the session topics is taken as the clustering parameter $\varepsilon$. The distribution of semantic similarity among the session topics is therefore also taken into account in the cluster analysis, which makes the clustering result more accurate.

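Further, in the above call recording control method, the step of configuring the density $\rho_{i}$ of each session topic according to the clustering parameter $\varepsilon$ specifically includes:

initializing the density $\rho_{i}$ of each session topic to 0;

for the density $\rho_{i}$ of the $i$-th session topic, traversing the clustering distances $d_{ij}$ between the $i$-th session topic and the other session topics, where $j$ is a positive integer from 1 to $m$, $m$ is the number of session topics, and $j \neq i$;

when $d_{ij} \le \varepsilon$, letting $\rho_{i} = \rho_{i} + 1$.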

In the technical solution of the above embodiment, the step of determining core topics from the session topics according to the density $\rho_{i}$ of each session topic specifically includes:

acquiring a preconfigured density threshold $\rho_{0}$;

determining each session topic whose density $\rho_{i}$ meets the density threshold $\rho_{0}$ as a core topic.

The step of clustering the session topics with the core topics as centers to obtain a plurality of session categories specifically includes:

for each core topic, determining the range within which the clustering distance from the core topic is less than or equal to the clustering parameter $\varepsilon$ as the cluster range of that core topic;

for any two or more core topics, when the cluster range of the core topics has intersection, adding the two or more core topics into the same cluster;

determining the union of the cluster ranges of all core topics of the same cluster as the cluster range of the cluster;

adding session topics of non-core topics falling into the cluster range of the cluster to the cluster;

the included core topics and non-core topics of the same cluster are determined as a session category.
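The cluster formation steps above resemble density-based clustering and can be sketched in Python as follows. The sketch assumes that a session topic is a core topic when its density is greater than or equal to the threshold (the exact comparison is not stated), that merging of core topics with intersecting cluster ranges is transitive, and that topics are identified by their index in the distance matrix; the function name is hypothetical.

```python
def build_session_categories(dist, epsilon, density, density_threshold):
    """Group session topic indices into session categories.

    dist: square matrix of clustering distances.
    density: per-topic densities, e.g. neighbour counts within epsilon.
    """
    m = len(dist)
    # ASSUMPTION: a topic is core when its density is >= the threshold.
    cores = [i for i in range(m) if density[i] >= density_threshold]
    # Cluster range of a core topic: all topics within epsilon of it.
    ranges = {c: {j for j in range(m) if dist[c][j] <= epsilon} for c in cores}

    parent = {c: c for c in cores}  # union-find over core topics

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Core topics whose cluster ranges intersect join the same cluster.
    for a in cores:
        for b in cores:
            if a < b and ranges[a] & ranges[b]:
                parent[find(a)] = find(b)

    # The cluster range is the union of its core topics' ranges, so
    # non-core topics falling inside it are included automatically.
    clusters = {}
    for c in cores:
        clusters.setdefault(find(c), set()).update(ranges[c] | {c})
    return sorted(sorted(members) for members in clusters.values())
```

Session topics that are neither core topics nor inside any cluster range are left out of every category, which matches treating them as noise.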

In the technical solution of the foregoing embodiment, the step of determining a category topic of each session category based on the session topics specifically includes:

obtaining the semantic similarity $q_{ij}$ between the session topics in the session category, where $i$ and $j$ are positive integers from 1 to $m_{c}$, and $m_{c}$ is the number of session topics in the session category;

calculating the sum of the semantic similarities between the word vector of the $i$-th session topic in the session category and the word vectors of the other session topics:

$$Q_{i} = \sum_{j=1,\; j \neq i}^{m_{c}} q_{ij};$$

determining a value $i^{*}$ between 1 and $m_{c}$ such that the sum of the semantic similarities between the word vector of the $i^{*}$-th session topic and the word vectors of the other session topics satisfies:

$$Q_{i^{*}} = \max\left(Q_{1}, Q_{2}, \ldots, Q_{m_{c}}\right);$$

determining the $i^{*}$-th session topic in the session category as the category topic of the session category.

Specifically, pleasantries are the polite, conventional small-talk expressions that people use in communication to show respect to others, to greet them or ask after them, or to express a degree of friendliness and closeness, for example "Hello", "How has your health been lately?" and "The weather is really nice today". Pleasantries have a fixed form of expression, carry no specific information, and are spoken in a relatively casual tone. The pleasantries in the call text content can therefore be identified from the expression form features, expression content features and tone features of the call text content and the corresponding call voice data.

It should be noted that not every utterance involving such expressions is an invalid utterance. When the call content of the call parties revolves around "health" or "weather", that is, when the purpose or key topic of the call relates to "health" or "weather", these utterances are not determined to be invalid utterances. In other embodiments of the present application, after the step of judging whether an utterance in the call text content contains only pleasantries, when the utterance does contain only pleasantries, the semantic similarity between that utterance and the subsequent utterances is analyzed, and when that utterance is merged with subsequent utterances into the same session, the utterance is determined to be a valid utterance.

In some embodiments of the present application, after detecting that an incoming call or an outgoing call is connected, the method further includes:

displaying a call voice analysis indicator on the call interface;

displaying the session topics and/or the category topics on the call interface according to the user's operation on the call voice analysis indicator;

and when the call interface displays a category topic, receiving the user's operation on the category topic so as to configure the category topic as a call recording topic.

Specifically, the current device is the mobile communication terminal on which the call is currently taking place. Writing the cached call voice data into the call recording file according to a preset rule specifically includes:

detecting the size of the cached call voice data in real time;

and when the size of the cached call voice data is larger than a preset cache threshold, writing the cached call voice data into the call recording file.
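The caching and writing behaviour can be illustrated with a small Python sketch. The class name, the `sink` object with a `write` method and the byte-count threshold are hypothetical; clearing the cache after each write is likewise an assumption, since the description only states when the data are written.

```python
class CallRecorder:
    """Caches call voice data and writes it out once recording has started."""

    def __init__(self, sink, cache_threshold):
        self.sink = sink                      # hypothetical file-like object
        self.cache_threshold = cache_threshold
        self.cache = bytearray()
        self.recording = False

    def on_voice_data(self, chunk):
        """Cache incoming voice data; flush when the cache grows too large."""
        self.cache.extend(chunk)
        if self.recording and len(self.cache) > self.cache_threshold:
            self._flush()

    def start_recording(self):
        """Start recording and write everything cached so far."""
        self.recording = True
        self._flush()

    def _flush(self):
        self.sink.write(bytes(self.cache))
        # ASSUMPTION: the cache is emptied once its content has been written.
        self.cache.clear()
```

Because voice data are cached from the moment the call is connected, the part of the call that took place before the recording decision is not lost.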

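Further, in the above call recording control method, the step of parsing the call recording file to obtain one or more call topics specifically includes:

calling a voice recognition engine to convert the call voice data in the call recording file into call text content;

extracting session topics from the call text content session by session;

performing cluster analysis on the session topics to obtain a plurality of session categories;

determining a category topic of each session category based on the session topics;

acquiring the number of sessions in each session category;

determining each session category whose number of sessions is greater than the preset session count threshold as a valid session category;

and determining the category topics of the valid session categories as the call topics of the call recording file.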

It should be noted that in this document relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Embodiments in accordance with the present application, as described above, are not intended to be exhaustive or to limit the application to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best utilize the application and various modifications as are suited to the particular use contemplated. The application is limited only by the claims and the full scope and equivalents thereof.

Claims

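1. A call recording control method, characterized by comprising the following steps:

after detecting that an incoming call or an outgoing call is connected, acquiring and caching call voice data;

calling a voice recognition engine to convert the call voice data into call text content in real time;

extracting session topics from the call text content session by session, wherein a session is the set of utterances of the call parties that relate to the same topic;

recording the number of utterances of each call party in real time during the call;

when the sum of the numbers of utterances of all call parties is greater than a preset utterance count threshold, performing cluster analysis on the session topics to obtain a plurality of session categories;

determining a category topic of each session category based on the session topics;

acquiring the number of sessions in each session category;

determining each session category whose number of sessions is greater than a preset session count threshold as a valid session category;

performing correlation analysis between the category topics of the valid session categories and the call recording topics in a recording topic library;

and when the category topic of any valid session category correlates with any recording topic, starting call recording.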

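2. The call recording control method according to claim 1, wherein the step of extracting session topics from the call text content session by session specifically comprises:

constructing a speaking topic list;

after any call party finishes an utterance, extracting a current speaking topic from the call text content of that utterance;

when the speaking topic list is not empty, acquiring the last speaking topic from the speaking topic list;

performing semantic similarity analysis on the current speaking topic and the last speaking topic;

if the semantic similarity between the current speaking topic and the last speaking topic is greater than or equal to a preset semantic similarity threshold, merging the utterance of the other call party corresponding to the last speaking topic and the current utterance of this call party into the same session;

writing the current speaking topic to the tail of the speaking topic list;

repeating the steps from extracting the current speaking topic from the call text content of the call party's current utterance through writing the current speaking topic to the tail of the speaking topic list, until the semantic similarity between the current speaking topic and the last speaking topic is smaller than the preset semantic similarity threshold;

and determining the session topic of the current session from the speaking topic list.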

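3. The call recording control method according to claim 2, wherein the step of determining the session topic of the current session from the speaking topic list specifically comprises:

constructing a word vector $x^{i}$ for each speaking topic in the speaking topic list, where $i$ is a positive integer from 1 to $n$, $n$ is the number of speaking topics in the speaking topic list, and the superscript $i$ indicates the speaking topic that the word vector represents;

calculating the semantic similarity between every two speaking topics in the speaking topic list:

$$p_{ij} = \frac{\sum_{k=1}^{K} x^{i}_{k}\, x^{j}_{k}}{\sqrt{\sum_{k=1}^{K} \left(x^{i}_{k}\right)^{2}}\; \sqrt{\sum_{k=1}^{K} \left(x^{j}_{k}\right)^{2}}},$$

where $j$ is a positive integer from 1 to $n$, $k$ is a positive integer from 1 to $K$, $K$ is the dimension of each word vector, $x^{i}_{k}$ is the value of the $k$-th dimension of the word vector $x^{i}$ of the $i$-th speaking topic in the speaking topic list, $x^{j}_{k}$ is the value of the $k$-th dimension of the word vector $x^{j}$ of the $j$-th speaking topic in the speaking topic list, and $p_{ij}$ is the semantic similarity between the $i$-th speaking topic and the $j$-th speaking topic in the speaking topic list;

calculating the sum of the semantic similarities between the word vector of the $i$-th speaking topic in the speaking topic list and the word vectors of the other speaking topics:

$$P_{i} = \sum_{j=1,\; j \neq i}^{n} p_{ij};$$

determining a value $i^{*}$ between 1 and $n$ such that the sum of the semantic similarities between the word vector of the $i^{*}$-th speaking topic and the word vectors of the other speaking topics satisfies:

$$P_{i^{*}} = \max\left(P_{1}, P_{2}, \ldots, P_{n}\right);$$

determining the $i^{*}$-th speaking topic in the speaking topic list as the session topic of the current session.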

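4. The call recording control method according to claim 3, wherein the step of performing cluster analysis on the session topics to obtain a plurality of session categories specifically comprises:

constructing a word vector $y^{i}$ for each session topic, where $i$ is a positive integer from 1 to $m$, $m$ is the number of session topics, and the superscript $i$ indicates the session topic that the word vector represents;

calculating the semantic similarity between every two session topics:

$$q_{ij} = \frac{\sum_{k=1}^{K} y^{i}_{k}\, y^{j}_{k}}{\sqrt{\sum_{k=1}^{K} \left(y^{i}_{k}\right)^{2}}\; \sqrt{\sum_{k=1}^{K} \left(y^{j}_{k}\right)^{2}}},$$

where $j$ is a positive integer from 1 to $m$, $y^{i}_{k}$ is the value of the $k$-th dimension of the word vector $y^{i}$ of the $i$-th session topic, $y^{j}_{k}$ is the value of the $k$-th dimension of the word vector $y^{j}$ of the $j$-th session topic, and $q_{ij}$ is the semantic similarity between the $i$-th session topic and the $j$-th session topic;

calculating a clustering parameter $\varepsilon$ based on the semantic similarity $q_{ij}$ between every two session topics;

configuring the density $\rho_{i}$ of each session topic according to the clustering parameter $\varepsilon$;

determining core topics from the session topics according to the density $\rho_{i}$ of each session topic;

clustering the session topics with the core topics as centers to obtain a plurality of session categories.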

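5. The call recording control method according to claim 4, wherein the step of calculating the clustering parameter $\varepsilon$ based on the semantic similarity $q_{ij}$ between every two session topics specifically comprises:

converting the semantic similarity $q_{ij}$ between every two session topics into a clustering distance $d_{ij}$ using a preset clustering distance coefficient $\alpha$;

calculating the average clustering distance $\bar{d}$ of the session topics;

calculating the standard deviation $\sigma$ of the clustering distances of the session topics;

calculating the clustering parameter:

$$\varepsilon = \bar{d} + \sigma.$$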

6. The call recording control method according to claim 5, wherein the step of configuring the density $\rho_{i}$ of each session topic according to the clustering parameter $\varepsilon$ specifically comprises:

initializing the density $\rho_{i}$ of each session topic to 0;

for the density $\rho_{i}$ of the $i$-th session topic, traversing the clustering distances $d_{ij}$ between the $i$-th session topic and the other session topics, where $j$ is a positive integer from 1 to $m$, $m$ is the number of session topics, and $j \neq i$;

when $d_{ij} \le \varepsilon$, letting $\rho_{i} = \rho_{i} + 1$.

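7. The call recording control method according to claim 1, characterized by further comprising, before the step of extracting session topics from the call text content session by session:

identifying pleasantries in the call text content;

judging whether an utterance in the call text content contains only pleasantries;

when an utterance in the call text content contains only pleasantries, determining that utterance to be an invalid utterance;

and removing the invalid utterances from the call text content.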

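8. The call recording control method according to claim 1, further comprising, after the step of performing correlation analysis between the category topics of the valid session categories and the call recording topics in the recording topic library:

when none of the category topics of the valid session categories correlates with any recording topic, controlling a loudspeaker to emit a call recording prompt tone;

outputting call recording prompt information on the call interface;

monitoring the user's operations on the call interface;

when it is detected during the call that the user taps the call recording button on the call interface, starting call recording;

and when no tap on the call recording button is detected during the call, deleting the cached call voice data after the call ends.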

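9. The call recording control method according to claim 1, wherein the step of starting call recording specifically comprises:

displaying a call recording indicator on the call interface;

creating a call recording file for the current call in the storage space of the current device;

writing the call voice data already stored in the cache space into the call recording file;

continuing to cache call voice data during the call;

and writing the cached call voice data into the call recording file according to a preset rule.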

10. A mobile communication terminal comprising a memory and a processor executing a computer program stored in the memory to implement the call recording control method according to any one of claims 1 to 9.