CN115695622A - Communication information record generation method and device in multimedia communication - Google Patents


Info

Publication number
CN115695622A
Authority
CN
China
Prior art keywords
electronic device
information
key information
content
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110850370.4A
Other languages
Chinese (zh)
Inventor
王雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202110850370.4A priority Critical patent/CN115695622A/en
Publication of CN115695622A publication Critical patent/CN115695622A/en
Pending legal-status Critical Current


Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The present application provides a method and an apparatus for generating a communication information record in multimedia communication. By implementing the method, an electronic device such as a mobile phone, a tablet computer, or a smart large-screen device can determine key information, such as keywords describing a video call, according to the call content. During the video call, the electronic device can determine the target record content of the call according to the key information, and then generate a communication information record from that content. After the call ends, the user can consult the communication information record and quickly grasp the main content of the video call.

Description

Communication information record generation method and device in multimedia communication
Technical Field
The present application relates to the field of terminals, and in particular, to a method and an apparatus for generating a communication information record in multimedia communication.
Background
Currently, multimedia communication is widely used, for example in video calls, video conferences, and live webcasts. In multimedia communication, users often need to record important content. For example, in a video conference, a user needs to record content related to the conference topic; in a live webcast, especially in a live online-class scenario, a user needs to record the key points of the lesson.
To avoid the drawbacks of manual note-taking, the electronic device can record the entire communication process. However, the communication information records obtained this way are highly redundant and costly to store, and they make it hard for the user to locate the key content when reviewing them.
Disclosure of Invention
The present application provides a method and an apparatus for generating a communication information record in multimedia communication. The method can be applied to electronic devices, such as mobile phones and tablet computers, that provide multimedia communication services. By implementing the method, the electronic device can determine, from context information, key information indicating the target record content of a multimedia call, then identify the target record content that appears in the call, and generate the record.
In a first aspect, the present application provides a communication information record generation method, applied to a first electronic device, including: enabling a multimedia communication information recording function; determining key information for identifying the content of the multimedia communication information record; acquiring call data generated by the multimedia communication; identifying target record content in the call data according to the key information; and generating a communication information record from the target record content.
By implementing the method provided in the first aspect, the electronic device can use the key information to locate the important content in the call data generated by the multimedia call, and generate a record containing that content. After the call ends, the user can consult the record at any time to review the important information of the call.
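The five steps of the first aspect can be sketched end to end. The sketch below is a minimal illustration rather than the patent's implementation: `call_data` is assumed to be already-transcribed text segments, and plain substring matching stands in for the semantic matching described later.

```python
def generate_communication_record(call_data, key_info):
    """Identify target record content in call data by key information,
    then assemble the communication information record.
    call_data: list of transcribed text segments (assumed input form).
    key_info: list of keywords (assumed already determined in S102)."""
    target = [seg for seg in call_data if any(k in seg for k in key_info)]
    return {"key_info": key_info, "record": target}
```

For example, with `key_info=["beijing"]`, only call segments mentioning "beijing" enter the record; everything else is discarded, which is the redundancy reduction the background section calls for.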
With reference to the embodiments of the first aspect, in some embodiments, the determining key information for identifying the content of the multimedia communication information record specifically includes: determining the key information using the call context information and/or the auxiliary data; the call context information includes: one or more of an application used before the call, the content of the application used, the time of the call, and the location; the auxiliary data includes: one or more of message records, web browsing records, mail information, calendars, electronic tickets, ticketing information, order information, search information, weather, the address book, photo-name tags in albums, or avatars of social contacts.
By implementing the method of the above embodiment, the electronic device can determine the key information for identifying the content of the multimedia communication information record using context information such as the application used before the call and the content of that application, together with auxiliary data such as message records and web browsing records. Through the key information, the electronic device can determine which of the many pieces of call data generated by the multimedia call are important to the user and worth recording for later review.
With reference to the embodiments of the first aspect, in some embodiments, the determining key information for identifying the content of the multimedia communication information record using the call context information and/or the auxiliary data specifically includes: performing semantic analysis on the call context information and/or the auxiliary data, and extracting one or more pieces of first key information describing the content of the context information; and determining the first key information as the key information for identifying the content of the multimedia communication information record.
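The patent leaves the semantic-analysis model unspecified, so as a toy stand-in, frequency ranking over the context text can play the role of extracting first key information; the stop-word list and `top_n` cutoff are illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a full one.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "i", "you", "we", "in", "on"}

def extract_first_key_info(context_texts, top_n=5):
    """Return the top-N content words across context records (chat logs,
    short messages) as the 'first key information'."""
    words = []
    for text in context_texts:
        words += [w for w in re.findall(r"[a-z]+", text.lower())
                  if w not in STOP_WORDS]
    return [w for w, _ in Counter(words).most_common(top_n)]
```

A production implementation would replace the frequency counter with the deep-learning semantic analysis module described in the detailed description.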
With reference to the embodiments of the first aspect, in some embodiments, after the step of performing semantic analysis on the call context information and/or the auxiliary data to extract one or more pieces of first key information describing the content of the context information, the method further includes: performing association analysis on the first key information, and determining one or more pieces of key information strongly associated with the first key information as second key information; and determining the second key information as key information for identifying the content of the multimedia communication information record.
By implementing the method of this embodiment, the electronic device can determine key information from the obtained context information and/or auxiliary data, and on that basis perform association analysis to determine other information strongly related to it, thereby enriching the key information and improving the richness and accuracy with which the electronic device locates the target record content.
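The association analysis of this embodiment could be sketched with a hand-written association table; the table entries below are illustrative, and a real system might instead use a knowledge graph or word embeddings, which the patent does not prescribe.

```python
# Toy association table standing in for the patent's association analysis.
ASSOCIATIONS = {
    "beijing": ["roast duck", "beijing tourist attractions", "forbidden city"],
    "hotel": ["hotel reservation", "check-in"],
}

def expand_key_info(first_key_info, table=ASSOCIATIONS):
    """Derive second key information strongly associated with the first,
    and return the combined, de-duplicated key information set."""
    second = []
    for kw in first_key_info:
        second += table.get(kw, [])
    # dict.fromkeys preserves order while removing duplicates
    return list(dict.fromkeys(first_key_info + second))
```

The combined list is then used as the key information set for locating target record content during the call.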
With reference to an embodiment of the first aspect, in some embodiments, the call data includes audio data and image data.
With reference to the embodiments of the first aspect, in some embodiments, the target record content in the call data includes target recorded audio and target recorded images, and the identifying the target record content in the call data according to the key information specifically includes: determining content in the audio data that matches the key information as target recorded audio; upon detecting that the camera is switched or the image subject changes, acquiring the images captured after the switch or change; and determining one or more image frames from those images as target recorded images.
By implementing the method of this embodiment, the electronic device can locate the target recorded audio and the target recorded images separately, that is, locate important speech and text during the call from the audio data, and locate important images and video from the video data.
With reference to the embodiments of the first aspect, in some embodiments, the target record content in the call data includes target recorded audio and target recorded images, and the identifying the target record content in the call data according to the key information specifically includes: determining content in the audio data that matches the key information as target recorded audio; and determining one or more image frames from the image data within a time period associated with the target recorded audio as target recorded images.
By implementing the method of this embodiment, based on the correlation between the video picture and the conversation during the call, the electronic device may first locate the target recorded audio, then take the video data of the same period, and determine the target recorded images from it. In this way, the electronic device avoids analyzing every image frame, locates the target recorded images more precisely, and reduces computation cost.
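A minimal sketch of selecting target recorded images from the time period of the target recorded audio; the timestamped-frame representation and the cap of three frames are illustrative assumptions, not values from the patent.

```python
def select_target_images(image_frames, audio_start, audio_end, max_frames=3):
    """Pick up to max_frames frames whose timestamps fall inside the
    target-recorded-audio window [audio_start, audio_end].
    image_frames: list of (timestamp, frame) tuples."""
    in_window = [(t, f) for t, f in image_frames if audio_start <= t <= audio_end]
    if not in_window:
        return []
    # Sample evenly across the window instead of analyzing every frame
    step = max(1, len(in_window) // max_frames)
    return in_window[::step][:max_frames]
```

Only frames inside the audio window are ever examined, which is the computation saving this embodiment describes.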
With reference to the embodiments of the first aspect, in some embodiments, determining content in the audio data that matches the key information as target recorded audio specifically includes: determining that the audio data at a first moment matches the key information; acquiring n frames of audio data before the first moment from a cache; determining m frames of audio data after the first moment according to the audio data at the first moment, the n frames of audio data, and the key information; and taking the audio data at the first moment, the n frames of audio data, and the m frames of audio data together as the target recorded audio.
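The n-before/m-after framing in this embodiment can be sketched with a bounded cache. Two simplifying assumptions: audio frames are modeled as transcribed text, and the matching rule is plain substring search rather than the patent's semantic matching.

```python
from collections import deque

class AudioMatcher:
    """Keep the n most recent frames in a cache; once a frame matches the
    key information, that frame, the cached n frames, and the next m frames
    together form the target recorded audio."""
    def __init__(self, key_words, n=2, m=2):
        self.key_words = key_words
        self.cache = deque(maxlen=n)   # frames before the first moment
        self.m = m
        self.remaining = 0             # m-frame countdown after a match
        self.target = []

    def feed(self, frame_text):
        if self.remaining > 0:         # collecting the m frames after a match
            self.target.append(frame_text)
            self.remaining -= 1
        elif any(k in frame_text for k in self.key_words):
            self.target += list(self.cache) + [frame_text]
            self.remaining = self.m
            self.cache.clear()
        else:
            self.cache.append(frame_text)
```

The `deque(maxlen=n)` is the cache referred to in the claim: it lets the device backtrack to content spoken just before the matching moment.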
In combination with an embodiment of the first aspect, in some embodiments, the key information comprises: one or more of a keyword, a key image, a key voice.
With reference to an embodiment of the first aspect, in some embodiments, after the step of generating a record from the target record content, the method further includes: displaying a first interface, the first interface including a first control; and displaying the record content in response to an operation on the first control.
By implementing the method of this embodiment, after the call ends the user can access the communication information record generated for the call through a channel provided by the electronic device, and can browse the important content of the call at any time.
In a second aspect, the present application provides an electronic device comprising one or more processors and one or more memories; wherein the one or more memories are coupled to the one or more processors for storing computer program code comprising computer instructions which, when executed by the one or more processors, cause the electronic device to perform the method as described in the first aspect and any possible implementation of the first aspect.
In a third aspect, the present application provides a computer-readable storage medium, which includes instructions that, when executed on an electronic device, cause the electronic device to perform the method described in the first aspect and any possible implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer program product containing instructions that, when run on an electronic device, cause the electronic device to perform the method as described in the first aspect and any possible implementation manner of the first aspect.
It is understood that the electronic device of the second aspect, the computer storage medium of the third aspect, and the computer program product of the fourth aspect are all used to execute the method provided by the present application. The beneficial effects they achieve are therefore the same as those of the corresponding method and are not repeated here.
Drawings
FIG. 1 is a flowchart provided by an embodiment of the present application;
FIG. 2 is a flowchart provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a cache provided by an embodiment of the present application;
FIG. 4 is a flowchart provided by an embodiment of the present application;
FIG. 5 is a hardware structure diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The terminology used in the following embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application.
To address the inefficiency and redundancy of existing methods for automatically generating communication information records, and to improve the richness and readability of the recorded content, an embodiment of the present application provides a communication information record generation method in multimedia communication. The method can be applied to an electronic device (electronic device A), such as a mobile phone or a tablet computer, that provides multimedia communication services.
By implementing the method provided by the embodiment of the present application, electronic device A can obtain context information and auxiliary data, and determine key information related to the target record content from them. Then, while the multimedia call is in progress, electronic device A can identify the target record content appearing in the call according to the key information, generate a record from it, and present the record to the user. The content of the record includes text, images, audio, and/or video clips. Through the record, the user can review the target record content of the multimedia communication at any time.
The context information refers to the state data of electronic device A at, or shortly before, the moment the information record is made. In an embodiment of the present application, the context information includes: applications used before the call, the content of those applications, the time of the call, the location, and so on.
The auxiliary data includes: one or more of message records, web browsing records, mail information, schedules, calendars, electronic tickets, ticketing information, order information, search information, weather, the address book, photo-name tags in albums, or avatars of social contacts. The message records include short message records, multimedia message records, and instant message records.
The key information refers to data extracted from the context information and/or the auxiliary data that can express the user's intention. It is a set that may include one or more of keywords, key images, and key voices. The key information serves as the basis for locating important content in the multimedia call; that is, through the key information, electronic device A can determine which content in the call is important.
Taking the multimedia communication scenario of a video call as an example, after detecting that a video call has started, electronic device A may obtain context information and/or auxiliary data, and determine from them a set of key information (including one or more keywords, key images, and/or key voices) expressing the user's intention.
During the video call, the call data generated by the call includes image data and audio data. Based on the key information, electronic device A can locate and mark the call content (including image data and audio data during the call) that is consistent with or related to the key information. The marked audio data may be called target recorded audio, and the marked image data may be called target recorded images. Together they are the target record content of the multimedia communication, that is, the content to be included in the communication information record generated after the call ends.
The method is not limited to the video call scenario; it can also be used in multimedia communication scenarios such as video conferences and live online classes. The embodiments of the present application do not limit this.
Electronic device A includes, but is not limited to, a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a smart TV, a smart wearable device, an augmented reality (AR) device, a virtual reality (VR) device, and the like. Electronic device A may also be another portable electronic device such as a laptop computer. It should also be understood that in other embodiments, electronic device A may instead be a non-portable device such as a desktop computer.
Fig. 1 illustrates the flow in which electronic device A automatically generates a record. Taking a video call scenario as an example, the record generation method in multimedia communication according to an embodiment of the present application is described below with reference to the flowchart shown in fig. 1.
S101: electronic device a detects that the user is engaged in a video call.
The user can make a video call through an application program installed on the electronic device a, which provides a communication service. The communication services include voice calls, video calls, and the like.
The method for generating the communication information record in the multimedia communication can be used as a system-level application to be called by the application program, so that the function of automatically generating the communication information record after the video call is completed is realized. In other optional implementation manners, the method may also be embedded in the application program, so as to implement a function of automatically generating a communication information record after completing a video call. The embodiment of the present application does not limit this.
The electronic device a may detect an operation of the user to perform a video call using the real-time communication application, and in response to the operation, the electronic device a may display a user interface to perform the video call. At this time, the electronic apparatus a confirms that the user is currently engaged in a video call.
For example, in the course of a video call service, electronic device a may display a video call icon in the interface. When a user operation acting on the video call icon is detected, the electronic device a may display a user interface for the video call in response to the operation. While displaying the user interface for the video call, the electronic apparatus a confirms that the user is performing the video call.
Electronic device A may also determine that it is currently in a video call from the bearer type negotiated in the video call protocol (for example, the bearer type includes video), or from hardware state such as the camera being invoked by the communication application.
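A sketch of that detection heuristic; the function and field names are illustrative, since real bearer negotiation and camera-state checks happen inside the platform's call protocol stack.

```python
def in_video_call(negotiated_bearers, camera_active):
    """Treat the device as being in a video call when the negotiated
    bearer set includes video, or when the communication app currently
    holds the camera (hypothetical inputs for illustration)."""
    return "video" in negotiated_bearers or camera_active
```

Either signal alone suffices, matching the "or" in the description above.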
After confirming that the user has started a video call, electronic device A may first determine key information for identifying the call content (S102), so that it can identify the target record content during the call.
It can be understood that the communication information record generation function provided by the embodiment of the present application is optional. That is, before making a video call, the user may choose to turn the function on or off. When the function is on, electronic device A automatically generates a communication information record after the video call is completed. When it is off, electronic device A performs an ordinary video call (after the video communication ends, no communication information record is generated).
The communication information record generation method in multimedia communication provided by the embodiment of the present application is not limited to the video call scenario; it can also be applied to other communication services, such as voice calls. Specifically, after the user completes a voice call (telephone or internet phone), electronic device A may generate a communication information record describing the important content of that call.
S102: the electronic device a obtains the context information and/or the auxiliary data and determines key information for identifying the video call information record.
The key information refers to data extracted from the context information and/or the auxiliary data and capable of being used for identifying the target recorded content, and includes one or more of a keyword, a key image and a key voice. Meanwhile, the key information is an information set. That is, the key information may include one or more keywords, key images, and/or key voices.
The electronic device a can identify the target record content in the multimedia communication service such as video call by the key information, so as to generate the communication information record describing the multimedia communication content.
Fig. 2 illustrates the process in which electronic device A determines key information from the context information and/or the auxiliary data.
First, S201 and S202 describe how electronic device A determines key information from the context information.
S201: electronic device A obtains the context information.
With reference to the foregoing description, the context information includes: applications used before the call, content of the applications used, time of the call, location, etc.
In some embodiments, electronic device A may also obtain, from the application providing the video call, context information related to that application.
Specifically, when the application through which user A (the user of electronic device A) makes a video call with user B is a third-party application providing communication services, electronic device A can determine the context in which the video call occurs by acquiring the network chat records stored in that application. For example, after user A and user B open a video call, electronic device A may obtain the chat record of user A and user B. The chat record is context information from before the video call was opened.
When user A and user B use the "Phone" application installed on electronic device A, electronic device A can determine the context in which the video or voice call occurs by reading the short messages. Here, the short messages are context information from before the call was opened.
When user A and user B hold a video conference using their electronic devices, electronic device A can obtain the mails exchanged between user A and user B. The communication data contained in the mails is context information.
Optionally, electronic device A may obtain the short messages, network chat records, and emails at the same time to determine the user's intention in making the video call, and thereby determine key information describing the call.
S202: electronic device A performs semantic analysis on the context information to determine key information indicating the context.
Electronic device A includes a semantic analysis module built on a deep-learning algorithm. Using the semantic analysis module, after obtaining the context (short messages, network chat records), electronic device A can determine the key information describing the content of the context. The deep-learning algorithm used by the semantic analysis module is conventional and is not described here.
The key information includes keywords, key images, and key voices. Taking keywords as an example, the keywords include: keywords directly extracted from text-type context, and strongly associated words derived from the directly extracted keywords. For example, if the keywords extracted from the context include "Beijing", the strongly associated words derived from "Beijing" may include "roast duck", "Beijing tourist attractions", and so on.
For example, user A has had multiple communications with user B about a "Beijing tour". The communications include: user A informing user B of arriving in Beijing in 2 days and touring Beijing for 4 days; user A asking user B to help book a hotel; user A asking user B to help make a travel plan for the Beijing tour; and so on.
On the basis of these communications, user A and user B hold a video call. After the video call starts, electronic device A (the device user A uses for the call) can obtain the network chat record, whose content describes the context of the video call. From that context, electronic device A can determine key information describing the main content of the video call, such as the keywords "Beijing", "travel", "2 days later", "4 days later", "hotel reservation", and "travel plan". Keywords derived from these may further include: "hotels in Beijing", "tickets for Beijing scenic spots", "the Forbidden City", "the Water Cube", and so on.
Likewise, electronic device A can extract key images and key voices from the context. For example, after obtaining a series of voice-type network chat records, electronic device A can determine key voices among them. Here, the key voices include: voices indicating a time, place, or item, voices that occur with high frequency, and so on.
Key images are determined analogously to the keywords and key voices described above and are not detailed here.
Next, S301 and S302 describe how electronic device A determines key information from the auxiliary data.
S301: electronic device A obtains the auxiliary data.
The auxiliary data is preset. With reference to the foregoing description, in the embodiments of the present application, the auxiliary data may include message records, web browsing records, mail information, schedules, calendars, electronic tickets, ticketing information, order information, search information, weather, the address book, photo-name tags in albums, or avatars of social contacts, and so on.
S302: electronic device A performs semantic analysis on the auxiliary data and determines key information reflecting the user's state or intention.
After electronic device A detects that the user has started the video call, it may obtain the preset auxiliary data.
Taking high-speed rail ticket booking information as an example, after the video call starts, electronic device A may read the short messages stored on the device and find a ticketing service message (a notification message) among them. From the ticket message, electronic device A can determine one or more pieces of key information.
For example, upon finding the notification for an air ticket from Shenzhen to Beijing ordered by user A, electronic device A may determine the keyword "Beijing". Optionally, electronic device A may further determine related keywords such as "Shenzhen", "air ticket", "international airport", "Subway Line 10", "hotel", and so on.
The auxiliary data may also include mail. Electronic device A can therefore also read mail and determine one or more pieces of key information from its content. For example, user B sends user A a file, "exhibition hall initial design plan", by mail. Electronic device A can obtain the file by reading the mail and determine key information from it: keywords such as "exhibition hall", "design plan", "plan view", "effect", "construction time", "exhibition hall address", and "material", as well as key pictures in the document (the plan view, the effect view of view 1, and the like).
Electronic device A can then locate the important content in the video call according to the key information and generate a concise record containing the important call content.
Preferably, electronic device A obtains the context information and the auxiliary data at the same time, so as to determine key information that is richer or locates the target record content more accurately.
When multiple users participate in the video call, electronic device A may also determine the participants in advance according to the auxiliary data.
For example, electronic device A may read the user's schedule, which contains, for example, the entry "scheme discussion with Li Gong, Zhang Gong, Xiao Zhao, 2021.7.1, 16:00". Then, when a video conference starts around 16:00 on 2021.7.1, electronic device A can determine in advance that the participants include Li Gong, Zhang Gong, and Xiao Zhao.
Electronic device A can then use voice-feature (voiceprint) information to mark speech from the same speaker with the same label. Further, from the conversation during the conference, electronic device A can determine the speaker names, and according to the conference agenda, electronic device A can determine which user each speaker ID specifically corresponds to, for example, which of the scheduled attendees (Li Gong, Zhang Gong, Xiao Zhao) each marked speaker is.
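A simplified stand-in for this voiceprint-to-attendee labeling. Two assumptions for illustration: an upstream voiceprint model (not shown) already yields a stable ID per speaker, and attendees first speak in agenda order; the patent's actual matching via the conversation content is not reproduced here.

```python
def label_speakers(utterances, agenda_names):
    """Assign agenda names to voiceprint IDs in order of first appearance.
    utterances: list of (voiceprint_id, text) pairs.
    Speakers beyond the agenda fall back to a generic label."""
    id_to_name, labeled = {}, []
    for vid, text in utterances:
        if vid not in id_to_name and len(id_to_name) < len(agenda_names):
            id_to_name[vid] = agenda_names[len(id_to_name)]
        labeled.append((id_to_name.get(vid, f"speaker-{vid}"), text))
    return labeled
```

Once labeled, each target recorded audio segment can carry its speaker's name in the communication information record.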
S103: the electronic device a acquires audio data and image data in a video call.
After the user starts the video call, the camera of the electronic device A can collect images and display the images on the display screen. The images displayed on the display screen also include images sent by the user B and received by the electronic device a. Meanwhile, a microphone of the electronic device a can collect the sound of the user a, and a speaker, a receiver or an earphone connected to the electronic device a of the electronic device a can play the audio sent by the electronic device used by the user B.
In the process of displaying images and playing conversation voice, the electronic device a can locate important audio data and image data, namely, target recorded audio and target recorded images, according to predetermined key information.
At this time, electronic device A may buffer the image data (the images captured by electronic device A and the images received from user B's electronic device) and the audio data (the sound of user A captured by electronic device A and the audio received from user B's electronic device).
Optionally, electronic device A may locate content in the image data and audio data in real time. Specifically, taking audio data as an example, when the audio stream is sent to a device such as a speaker or receiver for playback, electronic device A may perform a similarity analysis on the audio stream and determine whether the currently played audio is consistent with or related to the predetermined key information. If it is, electronic device A may mark the current audio, and the audio content related to it, as target recorded audio.
The cache can be used for backtracking, that is, for determining the audio content that precedes the current audio and is related to it.
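The locate-and-backtrack idea can be sketched as follows, assuming (for illustration only) that the audio arrives as already-transcribed text chunks and that "related" simply means the few chunks cached just before a keyword hit:

```python
from collections import deque

def mark_target_audio(chunks, key_info, backtrack=2):
    """Scan chunks as they are 'played'; when one matches key information,
    mark it and up to `backtrack` cached preceding chunks as target
    recorded audio. Substring matching stands in for similarity analysis."""
    recent = deque(maxlen=backtrack)   # the cache used for backtracking
    marked = set()
    for idx, text in enumerate(chunks):
        if any(k in text for k in key_info):
            marked.update(recent)      # related earlier chunks
            marked.add(idx)
        recent.append(idx)
    return sorted(marked)
```

Here the chunk just before a hit (e.g., the question that prompted the answer) is swept into the target recorded audio along with the hit itself.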
In the above process, the process of playing the audio stream to the specified device and the process of locating the audio stream to determine the important content are performed simultaneously by the electronic device a.
In other embodiments, electronic device A may also locate important content in the video call in a non-real-time manner. Again taking audio data as the example, electronic device A may segment the real-time audio stream into segments of a preset window length. Then, electronic device A may analyze each buffered segment of audio data to determine whether the audio data generated during the video call is consistent with or related to the predetermined key information.
Fig. 3 shows a schematic diagram of the electronic device a caching in segments according to a preset duration. As shown in fig. 3, X1 may represent a time axis of a video call. T0 may represent the time at which the video call is started. T1, T2, T3 and T4 are all certain time nodes in the video call process. Wherein the time length between T0-T1, T1-T2, T2-T3 and T3-T4 is a preset time length T.
For example, electronic device A may buffer the image data and audio data generated by the user's video call in 3-minute segments, that is, T = 3 min. After the call starts (T0), electronic device A may begin buffering the image data and audio data generated by the call and, at the same time, start timing. When the timed duration reaches the preset 3 minutes, electronic device A may treat the image data and audio data within those 3 minutes as one buffer, referring to X21 in fig. 3. Electronic device A may then continue to buffer the data generated by the video call in 3-minute segments, e.g., X22, X23, X24, until the call ends. It is understood that, after the call ends, the last buffer is not required to reach the preset duration.
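A minimal sketch of this segmented buffering, assuming the call data is available as (timestamp-in-seconds, payload) pairs; the last buffer is naturally allowed to be shorter than the window:

```python
def segment_buffers(items, window_s=180):
    """Group (timestamp_s, payload) items generated during the call into
    consecutive buffers of `window_s` seconds (3 min by default),
    mirroring X21..X24 in Fig. 3."""
    buffers = {}
    for ts, payload in items:
        buffers.setdefault(int(ts // window_s), []).append(payload)
    return [buffers[k] for k in sorted(buffers)]
```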
In this way, during a video call, electronic device a may obtain multiple caches of recorded call content. After determining a cache, the electronic device a may analyze the call content according to the key information determined in S102, determine important information in the call process, and further generate a record including the important information.
S104: electronic device A locates the target recorded audio and the target recorded image in the video call according to the key information.
Since conversation in a video call is free-flowing, the audio data and image data of the video call collected by electronic device A contain a great deal of redundant information. To improve the quality of the generated record, the audio data and image data must be screened to determine the data reflecting important call content, from which a communication information record with the redundancy removed is generated.
After the key information is determined and the audio data and image data of the video call are acquired, electronic device A may determine and mark the target recorded audio and target recorded image in the video call according to the key information. The marked target recorded audio and target recorded image are the content of the communication information record generated after the video call.
Alternatively, the electronic device a may first locate the audio data and then locate the image data according to the timestamp of the audio data. In another alternative embodiment, electronic device A may simultaneously locate audio data and image data, respectively.
Fig. 4 exemplarily shows a flow that the electronic device a first locates the audio data and then locates the image data according to the time stamp of the audio data.
S401: electronic device A locates the target recorded audio.
First, electronic device A may convert the buffered audio data into text data using a speech recognition algorithm. The text data is a preliminary record. At this point, the preliminary record is no different from an ordinary record produced by speech-to-text conversion, and therefore contains a large amount of redundant data.
Examples of the speech recognition algorithm include a Dynamic Time Warping (DTW) algorithm, a Vector Quantization (VQ) algorithm, a Hidden Markov Model (HMM), an Artificial Neural Network (ANN), and a Deep Neural Network (DNN). The embodiment of the present application does not limit this.
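Of the listed techniques, dynamic time warping is the simplest to illustrate. Below is a textbook DTW distance between two 1-D feature sequences, shown only to make the concept concrete; a real recognizer would apply it to acoustic feature frames, not raw numbers.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.
    Sequences that differ only in speaking rate (stretched segments)
    score near zero, which is why DTW suits speech matching."""
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

For example, `[1, 2, 3]` and its time-stretched variant `[1, 2, 2, 3]` have DTW distance 0.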
After obtaining the preliminary record, electronic device A may perform semantic analysis on the text. First, electronic device A may use a matching algorithm to locate the key information in the record. Such matching algorithms are existing technology and are not expanded on here. Then, through syntactic parsing, electronic device A can determine the sentence in which the key information is located, together with the context sentences associated with it, so as to obtain the complete expression. The sentence in which the key information is located, and/or its context sentences, may be referred to as target sentences.
Syntactic parsing refers to determining the syntactic structure of a sentence or the dependency relationships between the words in the sentence. Syntactic parsing includes statistical parsing based on a probabilistic context-free grammar (PCFG), and the like. Syntactic parsing is part of the semantic analysis module and is not described further here.
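The locate-plus-context step can be sketched crudely as follows. Substring matching over already-split sentences stands in for the matching algorithm and syntactic parsing, which the patent leaves abstract; the one-sentence context window is an assumption.

```python
def target_sentences(sentences, key_info, context=1):
    """Mark sentences containing key information, plus `context` adjacent
    sentences on each side, as target sentences."""
    hits = set()
    for i, s in enumerate(sentences):
        if any(k in s for k in key_info):
            lo, hi = max(0, i - context), min(len(sentences), i + context + 1)
            hits.update(range(lo, hi))
    return [sentences[i] for i in sorted(hits)]
```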
For example, in a video call, electronic device A recognizes the conversation between user A and user B and obtains content such as: "Once everyone has arrived, we can eat roast duck at 12:00 noon. At 14:00 in the afternoon we can go to the Summer Palace. Then at 19:00 in the evening there is a performance at the Water Cube we can watch." The previously determined key information ("roast duck", "Water Cube", etc.) is included in this content, and therefore the content can be marked as target sentences.
In other embodiments, the audio data of the call may not be converted into text data; that is, the electronic device may directly locate the target recorded audio and target recorded image based on the audio data. Specifically, in this case, the electronic device determines the audio data matching the key information as key audio. Furthermore, the important audio data can be displayed directly in the generated record.
Optionally, after the target sentences are determined, electronic device A may further obtain real-time auxiliary data such as the time and location. The real-time auxiliary data can be used to expand the key information. For example, when the current location indicates that the user is near a landmark building, electronic device A may add the landmark building to the key information. That is, the specific content of the key information varies and is not limited to what is obtained from the context. The electronic device may also determine one or more pieces of key information during the video call.
The electronic device a can also periodically acquire real-time auxiliary data such as time, position and the like during the video call. Therefore, the real-time auxiliary data can be used for identifying and judging the key information, and the accuracy is improved.
In some embodiments, the target sentences are the target recorded audio, that is, the audio to be displayed in the communication information record generated by electronic device A, or the audio converted into text for display.
Preferably, electronic device A may further process the target sentences through pragmatic analysis, so as to make them simpler and more formal in written style. Pragmatic analysis is part of the semantic analysis module. Pragmatic analysis refers to analyzing the context, language background, and situation of the target sentences, that is, extracting additional information such as imagery and interpersonal relationships from the structure of the text, thereby forming a dynamic ideographic structure.
Referring to the above example, through pragmatic analysis, the target sentences can be simplified as: "12:00 noon: eat roast duck once everyone has arrived", "14:00: go to the Summer Palace", "19:00: watch the performance at the Water Cube", and the like. In this way, colloquial expressions are removed from the recorded content, making the record more concise and refined. Thus, when browsing the record, the user can more intuitively and quickly grasp the target recorded audio and target recorded images of the conversation.
S402, the electronic device A determines a target recorded image according to the target recorded audio.
After marking the target recorded audio, electronic device A may determine the important image data according to the timestamps of the marked target recorded audio. In particular, in the localization method that converts audio data into text data, the important image data is determined from the timestamps of the marked important text data; that is, the image data corresponding to those timestamps is determined as the target recorded image.
After determining the target recorded audio, the electronic device a may determine a target recorded image from the target recorded audio. Specifically, the electronic device a may determine an image stream corresponding to the target recorded audio according to a timestamp of the target recorded audio. By filtering the image stream, the electronic apparatus a can specify a target recording image (image data in a recording generated by the electronic apparatus a).
The timestamps of the target record include a start timestamp and an end timestamp. When electronic device A determines the target recorded audio from the preliminary record, electronic device A may mark the timestamp of the starting point of the target recorded audio as the start timestamp, and the timestamp of the end point of the target recorded audio as the end timestamp.
From the start timestamp and the end timestamp, electronic device A may determine the image stream corresponding to the target recorded audio. The image stream includes a series of consecutive image frames. Because of the constraints the finally generated record places on image data, the record neither can nor needs to present this entire series of consecutive frames. Therefore, electronic device A needs to determine one or more representative frames from the series of consecutive image frames as the target recorded image.
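The timestamp-driven selection of the image stream can be sketched as below, assuming frames and audio chunks are available as (timestamp, content) pairs; this is only the interval selection, not the representative-frame screening that follows.

```python
def audio_span(target_audio):
    """Start and end timestamps of a marked target-recorded-audio segment,
    given its (timestamp, chunk) pairs in order."""
    return target_audio[0][0], target_audio[-1][0]

def frames_for_audio(frames, start_ts, end_ts):
    """Image frames (timestamp, frame) falling inside the target recorded
    audio's [start, end] interval -- the corresponding image stream."""
    return [f for ts, f in frames if start_ts <= ts <= end_ts]
```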
In some embodiments, electronic device A may determine one or more image frames with clear content from the image stream as the target recorded image. Clear content means that the motion blur, compression blur, and Gaussian blur in the image are within a preset acceptable range.
Further, in other embodiments, electronic device A may perform image recognition on the selected image frames with clear content, determine their association with the corresponding target recorded audio, and then select fewer, more representative frames from them as the target recorded images.
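One common clarity proxy is the variance of a Laplacian filter (blurry frames have weak edges, hence low variance). The patent does not specify how blur is scored, so the sketch below is one plausible choice, operating on a grayscale image given as a list of rows:

```python
def sharpness(img):
    """Variance of a 4-neighbour Laplacian over a grayscale image
    (list of pixel rows); higher means sharper."""
    vals = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            vals.append(lap)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def clearest(frames):
    """Return the frame with the highest sharpness score."""
    return max(frames, key=sharpness)
```

A uniformly flat frame scores 0, so any frame with visible edge content is preferred over it.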
Of course, the electronic apparatus a may directly determine the image stream corresponding to the target recorded audio as the target recorded image. In this way, the generated communication information record can directly display the image stream, i.e. the video-type file.
Next, a method by which the electronic apparatus a can simultaneously separately locate the audio data and the image data will be described.
In this method, electronic device a can simultaneously record audio based on audio stream targeting and record images based on image stream targeting. The process of the electronic device a recording audio according to the audio stream positioning target can refer to the foregoing description, and is not described herein again.
In the process of locating the target recorded image from the image stream, electronic device A can determine the target recorded image according to camera switching. Specifically, electronic device A may detect whether the images sent by user B's device were captured by a front-facing or a rear-facing camera. When the images sent by user B's device were captured by the rear camera, electronic device A may mark them as target recorded images. This is often because, during a video call, an electronic device generally captures images with the front camera; when one user needs to show a certain event or object to the other user, the user switches from the front camera to the rear camera.
Determining the target recorded image is not limited to switching from the front camera to the rear camera; in other communication scenarios, switching from the rear camera to the front camera may also trigger electronic device A to determine the target recorded image.
For example, when the target recorded audio includes "OK, then we will meet later by this tree on the east side", the user will typically switch to the rear camera to capture the image. Electronic device A may determine one or more clear image frames from the image stream captured by the rear camera after the switch. Further, electronic device A may identify whether the selected clear image frames contain a tree, and can thus select an image frame containing a tree from those frames as the target recorded image.
Thus, the record may include the image containing the tree. The user can open the record generated from the video call at any time, and thereby use the image containing the tree to confirm the meeting place agreed with the friend.
In other embodiments, electronic device a may determine the target recording image based on identifying a change in the subject of the image content. For example, when the electronic apparatus a recognizes that the main person in the image is turned from the person a to the person B, the electronic apparatus a may determine that the image containing the person B is the target recorded image.
S105: electronic device a generates a record of the video call.
After determining the target recorded audio and the target recorded image, the electronic apparatus a may combine the target recorded audio and the target recorded image in chronological order, thereby generating a final recording.
Wherein the presentation form of the target recorded audio in the generated communication information record comprises: audio presented directly, and/or audio content converted to a textual display. When the target recording image is a continuous image stream, the communication information recording can directly display the image stream in the form of a video or a moving image. When the target recorded image is a plurality of independent image frames, and the image contents reflected by the image frames are greatly different, the communication information record can show the target recorded image in the form of pictures.
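The chronological combination of S105 can be sketched as a simple timestamp merge; the (timestamp, content) pair format and the "audio"/"image" tags are assumptions made for the sketch:

```python
def build_record(target_audio, target_images):
    """Merge marked audio items and image items, each given as
    (timestamp, content) pairs, into one chronologically ordered
    communication information record of (timestamp, kind, content)."""
    entries = [(ts, "audio", c) for ts, c in target_audio]
    entries += [(ts, "image", c) for ts, c in target_images]
    return sorted(entries)
```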
The electronic device a may then display the final record. For example, the electronic device a may display an icon of a record generated according to the video call after displaying a dialog box of the video call. The electronic device a can detect the operation of the user clicking the icon of the record, and in response to the operation, the electronic device a can display the record, that is, the important call content and the video image marked by the electronic device a in the video call process.
Fig. 5 shows a schematic structural diagram of the electronic device a.
The electronic device a may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR) device, a Virtual Reality (VR) device, an Artificial Intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, and/or a smart city device, and the specific type of the electronic device is not particularly limited by the embodiment of the present application.
The electronic device a may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like. Wherein the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the electronic device a. In other embodiments of the present application, electronic device a may include more or fewer components than illustrated, or some components may be combined, or some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface, thereby implementing a touch function of the electronic device a.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus, enabling communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through the I2S interface, so as to implement a function of receiving a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device a. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device a.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, and the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device a, and may also be used to transmit data between the electronic device a and a peripheral device. And the method can also be used for connecting a headset and playing audio through the headset. The interface may also be used to connect other electronic devices, such as AR devices and the like.
It should be understood that the connection relationship between the modules according to the embodiment of the present invention is only an exemplary illustration, and is not a structural limitation of the electronic device a. In other embodiments of the present application, the electronic device a may also adopt different interface connection manners or a combination of multiple interface connection manners in the foregoing embodiments.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device a. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In other embodiments, the power management module 141 may be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the electronic device a may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in electronic device a may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including wireless communication of 2G/3G/4G/5G, etc. applied to the electronic device a. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, and may perform filtering, amplification, and other processing on the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then passed to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide a solution for wireless communication applied to the electronic device a, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (BT), global Navigation Satellite System (GNSS), frequency Modulation (FM), near Field Communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, the antenna 1 of the electronic device a is coupled with the mobile communication module 150 and the antenna 2 is coupled with the wireless communication module 160 so that the electronic device a can communicate with a network and other devices through wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general Packet Radio Service (GPRS), code division multiple access (code division multiple access, CDMA), wideband Code Division Multiple Access (WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc. The GNSS may include a Global Positioning System (GPS), a global navigation satellite system (GLONASS), a beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a Satellite Based Augmentation System (SBAS).
In the embodiment of the present application, the communication modules such as the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor support the electronic device a to provide multimedia communication services, such as video calls and video conferences. In addition, some data acquired by the electronic device a to help determine the key information also depends on the communication module, for example, the electronic device a acquires communication data such as short messages, network chat records, mails to help determine the key information.
The electronic device a implements a display function through the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini LED, a micro LED, a micro OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the electronic device A may include 1 or N display screens 194, N being a positive integer greater than 1.
In the embodiment of the present application, the display modules such as the GPU, the display screen 194, and the application processor may support the electronic device a to provide display services, including displaying a user interface and video data in multimedia communication.
The electronic device a may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, and the application processor.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye. The ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image. The ISP can also optimize parameters such as the exposure and color temperature of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV and other formats. In some embodiments, electronic device a may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used to process digital signals; it can process digital image signals as well as other digital signals. For example, when the electronic device A selects a frequency point, the digital signal processor is used to perform a Fourier transform or the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. Electronic device a may support one or more video codecs. Thus, the electronic device a can play or record videos in a plurality of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
In the embodiment of the present application, the image capturing and processing modules such as the ISP, the camera 193, the video codec, the GPU, the display screen 194, and the application processor can support the electronic device a to provide the shooting service. In the multimedia communication process, such as a video call and a video conference process, the related video data can be obtained through the image acquisition and processing module.
The NPU is a neural-network (NN) computing processor. By borrowing the structure of biological neural networks, for example the transfer mode between neurons of the human brain, it processes input information quickly and can also continuously perform self-learning. The NPU can realize applications such as intelligent cognition of the electronic device A, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The internal memory 121 may include one or more Random Access Memories (RAMs) and one or more non-volatile memories (NVMs).
The random access memory may include static random-access memory (SRAM), dynamic random-access memory (DRAM), synchronous dynamic random-access memory (SDRAM), double data rate synchronous dynamic random-access memory (DDR SDRAM), such as fifth generation DDR SDRAM generally referred to as DDR5 SDRAM, and the like;
the nonvolatile memory may include a magnetic disk storage device, a flash memory (flash memory).
The flash memory may include NOR flash, NAND flash, 3D NAND flash, etc. according to the operating principle; may include single-level cell (SLC), multi-level cell (MLC), triple-level cell (TLC), quad-level cell (QLC), etc. according to the level order of the memory cell; and may include universal flash storage (UFS), embedded multimedia card (eMMC), etc. according to the storage specification.
The random access memory may be read directly by the processor 110, may be used to store executable programs (e.g., machine instructions) for an operating system or other programs in operation, and may also be used to store data for user and application programs, etc.
The nonvolatile memory may also store executable programs, data of users and application programs, and the like, and may be loaded into the random access memory in advance for the processor 110 to directly read and write.
In the embodiment of the present application, the electronic device a may buffer video data and audio data in the multimedia communication process through the internal memory 121.
The external memory interface 120 may be used to connect an external nonvolatile memory, so as to expand the storage capability of the electronic device a. The external non-volatile memory communicates with the processor 110 through the external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are saved in an external nonvolatile memory.
The electronic device a may implement an audio function through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also used to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The electronic apparatus a can listen to music through the speaker 170A, or listen to a handsfree call.
The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the electronic device a receives a phone call or voice information, it can receive voice by placing the receiver 170B close to the ear.
The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can input a sound signal to the microphone 170C by speaking close to it. The electronic device A may be provided with at least one microphone 170C. In other embodiments, the electronic device A may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device A may further be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, and the like.
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
In the embodiment of the present application, the electronic device A may obtain the sound of the user A through the microphone 170C; the electronic device A may play the sound of the user B through the speaker 170A, through a wired earphone connected to the earphone interface 170D, or through a wireless earphone connected via a communication module.
The pressure sensor 180A is used for sensing a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensors, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates of electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device A determines the intensity of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device A detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device A may also calculate the position of the touch from the detection signal of the pressure sensor 180A. In some embodiments, touch operations applied to the same touch position but with different touch intensities may correspond to different operation instructions. For example: when a touch operation with a touch intensity smaller than a first pressure threshold acts on the short message application icon, an instruction for viewing a short message is executed; when a touch operation with a touch intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed.
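The threshold-based dispatch described above can be sketched as follows. This is a minimal illustration only: the threshold value, icon name, and instruction names are assumptions for demonstration, not values from the embodiment.

```python
# Illustrative sketch of pressure-threshold dispatch; the threshold value and
# the instruction names are assumed for demonstration only.
FIRST_PRESSURE_THRESHOLD = 0.5  # assumed normalized touch intensity

def dispatch_touch(icon, intensity):
    """Map a touch on an application icon to an operation instruction by intensity."""
    if icon == "short_message":
        if intensity < FIRST_PRESSURE_THRESHOLD:
            return "view_short_message"       # light press: view the short message
        return "create_new_short_message"     # firm press: create a new short message
    return "open_application"                 # default behavior for other icons
```

The same position thus yields different instructions purely as a function of the detected pressure intensity.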
The gyroscope sensor 180B may be used to determine the motion attitude of the electronic device A. In some embodiments, the angular velocity of the electronic device A about three axes (i.e., the x, y, and z axes) may be determined by the gyroscope sensor 180B. The gyroscope sensor 180B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyroscope sensor 180B detects the shake angle of the electronic device A, calculates the distance that the lens module needs to compensate according to the shake angle, and allows the lens to counteract the shake of the electronic device A through a reverse movement, thereby achieving anti-shake. The gyroscope sensor 180B may also be used in navigation and somatosensory gaming scenarios.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, electronic device a calculates altitude, aiding in positioning and navigation, from barometric pressure values measured by barometric pressure sensor 180C.
The magnetic sensor 180D includes a Hall sensor. The electronic device A may detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the electronic device A is a flip phone, the electronic device A may detect the opening and closing of the flip according to the magnetic sensor 180D. Features such as automatic unlocking upon flip opening can then be set according to the detected open or closed state of the holster or of the flip.
The acceleration sensor 180E can detect the magnitude of acceleration of the electronic device A in various directions (generally, three axes). When the electronic device A is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the attitude of the electronic device, and is applied to landscape/portrait switching, pedometers, and the like.
A distance sensor 180F for measuring a distance. The electronic device a may measure the distance by infrared or laser. In some embodiments, taking a picture of a scene, electronic device a may range using distance sensor 180F to achieve fast focus.
The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device a emits infrared light to the outside through the light emitting diode. The electronic apparatus a detects infrared reflected light from a nearby object using a photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device a. When insufficient reflected light is detected, the electronic apparatus a can determine that there is no object near the electronic apparatus a. The electronic device a can detect that the user holds the electronic device a close to the ear for talking by using the proximity light sensor 180G, so as to automatically turn off the screen and achieve the purpose of saving power. The proximity light sensor 180G may also be used in a holster mode, a pocket mode, automatic unlock and lock screen.
The ambient light sensor 180L is used to sense the ambient light level. The electronic device a may adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device a is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic equipment A can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access to an application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device A implements a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device A reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device A heats the battery 142 to avoid abnormal shutdown of the electronic device A due to low temperature. In still other embodiments, when the temperature is lower than a further threshold, the electronic device A boosts the output voltage of the battery 142 to avoid abnormal shutdown due to low temperature.
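The layered temperature processing strategy above can be sketched as a simple policy function. The three threshold values and action names are assumptions chosen for illustration; the embodiment does not specify concrete figures.

```python
# Hedged sketch of the layered temperature-processing strategy; the three
# thresholds and action names are assumed values, not figures from the embodiment.
T_THROTTLE = 45.0   # above this, reduce the nearby processor's performance
T_HEAT = 0.0        # below this, heat the battery
T_BOOST = -10.0     # below this, additionally boost the battery output voltage

def thermal_actions(temp_c):
    """Return the list of protective actions for a reported temperature."""
    actions = []
    if temp_c > T_THROTTLE:
        actions.append("throttle_processor")      # thermal protection
    if temp_c < T_HEAT:
        actions.append("heat_battery")            # avoid low-temperature shutdown
    if temp_c < T_BOOST:
        actions.append("boost_battery_voltage")   # further low-temperature measure
    return actions
```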
The touch sensor 180K is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor may communicate the detected touch operation to the application processor to determine a touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on the surface of the electronic device a, different from the position of the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of the bone mass vibrated by the human vocal part. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset, integrated into a bone conduction headset. The audio module 170 may parse out a voice signal based on the vibration signal of the bone mass vibrated by the vocal part obtained by the bone conduction sensor 180M, so as to implement a voice function. The application processor can analyze heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to realize a heart rate detection function.
In the embodiment of the present application, the real-time auxiliary information or the non-real-time auxiliary information acquired by the electronic device a may be implemented by one or more sensors described above.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The electronic device a may receive a key input, and generate a key signal input related to user setting and function control of the electronic device a.
The motor 191 may generate a vibration cue. The motor 191 may be used for both an incoming call vibration prompt and a touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects in response to touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, information receiving, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card can be attached to or detached from the electronic device A by being inserted into or pulled out of the SIM card interface 195. The electronic device A can support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The electronic device A realizes functions such as calls and data communication through interaction between the SIM card and the network. In some embodiments, the electronic device A employs an eSIM, namely an embedded SIM card. The eSIM card can be embedded in the electronic device A and cannot be separated from the electronic device A.
In the embodiments of the present application:
in S202 or S302, the key information, such as keywords and key images, extracted from the call context information and/or the auxiliary data may be referred to as first key information, for example, keywords extracted from the chat history: "Beijing", "travel"; derived key information that is strongly associated with the first key information and extracted according to the first key information may be referred to as second key information, for example, the keywords "Beijing sight ticket", "Tiananmen", and "Imperial Palace" derived from the keywords "Beijing" and "travel".
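The first/second key information distinction can be sketched with toy data. The vocabulary and the association table below are hand-written assumptions standing in for the semantic analysis and association analysis that a real implementation would perform.

```python
# Toy sketch of first and second key information. VOCABULARY and ASSOCIATIONS
# are illustrative assumptions, not part of the described embodiment.
VOCABULARY = ["Beijing", "travel"]
ASSOCIATIONS = {
    ("Beijing", "travel"): ["Beijing sight ticket", "Tiananmen", "Imperial Palace"],
}

def extract_first_key_info(chat_history):
    """First key information: keywords extracted from context/auxiliary data."""
    return [w for w in VOCABULARY if w in chat_history]

def derive_second_key_info(first_key_info):
    """Second key information: keywords strongly associated with the first."""
    derived = []
    for triggers, related in ASSOCIATIONS.items():
        if all(t in first_key_info for t in triggers):
            derived.extend(related)
    return derived
```

For example, a chat history mentioning Beijing and travel yields the first key information, from which the three associated keywords are derived as second key information.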
By implementing the method for generating a communication information record in multimedia communication, electronic equipment such as a mobile phone or a tablet computer can determine key information, such as keywords describing the video call, according to the context information and/or auxiliary data of the video call. During the video call, the electronic device can determine the important content (text or images) of the call according to the key information, and then generate a communication information record from that important content. After the user finishes the video call, the user can check the communication information record, so the user can quickly learn the main content of the video call, and the important content of the call is conveniently recorded and available.
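The overall flow summarized above can be sketched end to end: match call content against the key information and keep the matching parts as the record. The segment representation (plain dicts with a time and transcribed text) is an assumption for illustration, not the embodiment's data format.

```python
# Minimal sketch of generating a communication information record: transcribed
# call segments that match any key information become the target record content.
# The segment format is an assumed stand-in for real call data.
def generate_record(segments, key_info):
    """segments: list of {'time': float, 'text': str}; keep matching segments."""
    record = []
    for seg in segments:
        hits = [k for k in key_info if k in seg["text"]]
        if hits:
            record.append({"time": seg["time"],
                           "text": seg["text"],
                           "keywords": hits})
    return record
```

After the call ends, the resulting list is what the user would review to quickly grasp the main content.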
As used in the specification of the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the listed items. As used in the above embodiments, the term "when …" may be interpreted to mean "if …" or "after …" or "in response to determining …" or "in response to detecting …", depending on the context. Similarly, depending on the context, the phrase "when it is determined …" or "if (a stated condition or event) is detected" may be interpreted to mean "if it is determined …" or "in response to determining …" or "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)".
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable devices. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.

Claims (13)

1. A communication information record generation method applied to a first electronic device is characterized by comprising the following steps:
starting a multimedia communication information recording function;
confirming key information for identifying the content of the multimedia communication information record;
acquiring call data generated by the multimedia communication;
identifying target record content in the call data according to the key information;
and generating a communication information record according to the target record content.
2. The method according to claim 1, wherein the confirming key information for identifying the content of the multimedia communication information record comprises:
confirming key information for identifying the multimedia communication information record content by using call context information and/or auxiliary data; the call context information includes: one or more of an application used before the call, content of the application used, time of the call, location; the assistance data comprises: one or more of a message record, a web browsing record, a mail message, a schedule, a calendar, an electronic ticket, ticketing information, order information, search information, weather, an address book, a tag of a name of a photo in an album, or an avatar of a social contact.
3. The method according to claim 2, wherein the confirming key information for identifying the content of the multimedia communication information record by using the call context information and/or the auxiliary data comprises:
performing semantic analysis on the call context information and/or the auxiliary data, and extracting one or more pieces of first key information describing the content of the context information;
and determining the first key information as key information for identifying the content of the multimedia communication information record.
4. The method according to claim 3, wherein after the step of performing semantic analysis on the call context information and/or the auxiliary data to extract one or more first key information describing the context information content, the method further comprises:
performing association analysis on the first key information, and determining one or more key information strongly associated with the first key information as second key information;
and determining the second key information as key information for identifying the recording content of the multimedia communication information.
5. The method according to any one of claims 1-4, wherein the call data comprises audio data and image data.
6. The method according to claim 5, wherein the target recorded content in the call data includes a target recorded audio and a target recorded image, and wherein the identifying the target recorded content in the call data according to the key information specifically includes:
determining the content matched with the key information in the audio data as target recording audio;
detecting that the camera is switched or that the image subject is changed, and acquiring images acquired by the camera after the camera is switched or the image subject is changed;
and determining one or more image frames as target recording images from the images acquired after switching or the image subject is changed.
7. The method according to claim 5, wherein the target recorded content in the call data includes a target recorded audio and a target recorded image, and the identifying the target recorded content in the call data according to the key information specifically includes:
determining the content matched with the key information in the audio data as target recording audio;
determining one or more image frames as target recording images from image data within a time period associated with the target recording audio.
8. The method according to claim 7, wherein the determining that the content in the audio data that matches the key information is a target recorded audio specifically comprises:
determining that the audio data at the first moment is matched with the key information;
acquiring n frames of audio data before the first moment from a buffer;
determining m frames of audio data after the first moment according to the audio data at the first moment, the n frames of audio data and the key information;
and the audio data at the first moment, the n frames of audio data and the m frames of audio data are the target recorded audio.
9. The method according to any of claims 1-8, wherein the key information comprises: one or more of a keyword, a key image, and a key voice.
10. The method of any of claims 1-9, wherein after the step of generating a communication information record according to the target record content, the method further comprises:
displaying a first interface, wherein the first interface comprises a first control;
and responding to the operation acted on the first control, and displaying the recorded content.
11. An electronic device comprising one or more processors and one or more memories; wherein the one or more memories are coupled to the one or more processors for storing computer program code comprising computer instructions which, when executed by the one or more processors, cause performance of the method recited in any of claims 1-10.
12. A computer program product comprising instructions for causing an electronic device to perform the method according to any one of claims 1-10 when the computer program product is run on the electronic device.
13. A computer-readable storage medium comprising instructions that, when executed on an electronic device, cause performance of the method of any of claims 1-10.
CN202110850370.4A 2021-07-27 2021-07-27 Communication information record generation method and device in multimedia communication Pending CN115695622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110850370.4A CN115695622A (en) 2021-07-27 2021-07-27 Communication information record generation method and device in multimedia communication

Publications (1)

Publication Number Publication Date
CN115695622A (en) 2023-02-03

Family

ID=85058031

Country Status (1)

Country Link
CN (1) CN115695622A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination