WO2024078210A1 - Memo reminding method and apparatus, and terminal and storage medium

Info

Publication number: WO2024078210A1
Application number: PCT/CN2023/117394
Authority: WO
Inventors: 曾理; 王立中; 米岚
Original assignee: Oppo广东移动通信有限公司
Priority date: 2022-10-12
Filing date: 2023-09-07
Publication date: 2024-04-18
Also published as: CN115526602A

Abstract

The embodiments of the present application belong to the technical field of human-computer interaction. Disclosed are a memo reminding method and apparatus, and a terminal and a storage medium. The method comprises: when there is a memo record requirement, acquiring memo content (201); on the basis of a content mode corresponding to the memo content, performing information extraction on the memo content, so as to obtain key information (202); storing the memo content and the key information in an associated manner (203); and when it is determined on the basis of the key information that a memo reminding trigger condition is met, performing memo reminding on the basis of the memo content (204). Memory information required by a user often has multi-content modes, and therefore by means of the present solution, information extraction is performed regarding different content modes, and key information is stored, thereby enriching memo content and also improving the efficiency and quality of memo reminding.

Description

Memo reminder method, device, terminal and storage medium

This application claims priority to Chinese patent application No. 202211249686.9, filed on October 12, 2022, and entitled “Memo Reminder Method, Device, Terminal and Storage Medium”, the entire contents of which are incorporated by reference into this application.

Technical Field

The embodiments of the present application relate to the field of human-computer interaction technology, and more particularly to a memo reminder method, device, terminal and storage medium.

Background Art

With the development of society, people's work and life are becoming more and more abundant, and the information that needs to be processed is increasing, which brings about a heavy memory task. Based on the rapid development of digital science, people are becoming accustomed to using the powerful storage function of smart terminals to assist memory and improve work and life efficiency.

In the related art, the terminal takes a note of the text or image input by the user, and the note content is limited to a single mode, which limits the flexibility of the user's note and affects the efficiency of human-computer interaction.

Summary of the invention

The embodiment of the present application provides a memo reminder method, device, terminal and storage medium. The technical solution is as follows:

In one aspect, an embodiment of the present application provides a memo reminder method, which is executed by a terminal and includes:

when there is a need for memo recording, obtaining memo content;

extracting information from the memo content based on the content modality corresponding to the memo content to obtain key information, wherein different information extraction methods are used for different content modalities;

storing the memo content and the key information in association with each other; and

when it is determined based on the key information that a memo reminder triggering condition is met, performing a memo reminder based on the memo content.

In another aspect, an embodiment of the present application provides a memo reminder device, the device comprising:

an acquisition module, configured to obtain memo content when there is a need for memo recording;

an information extraction module, configured to extract information from the memo content based on the content modality corresponding to the memo content to obtain key information, wherein different information extraction methods are used for different content modalities;

a storage module, configured to store the memo content and the key information in association with each other; and

a memo reminder module, configured to perform a memo reminder based on the memo content when it is determined based on the key information that a memo reminder triggering condition is met.

In another aspect, an embodiment of the present application provides a terminal, which includes a processor and a memory; the memory stores at least one program, and the at least one program is executed by the processor to implement the memo reminder method described in the above aspect.

In another aspect, an embodiment of the present application provides a computer-readable storage medium, wherein the storage medium stores at least one program, and the at least one program is executed by a processor to implement the memo reminder method described in the above aspect.

In another aspect, an embodiment of the present application provides a computer program product, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the memo reminder method provided in the above aspect.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a block diagram of a terminal provided by an exemplary embodiment of the present application;

FIG. 2 shows a flow chart of a memo reminder method provided by an exemplary embodiment of the present application;

FIG. 3 is a schematic diagram showing a memo reminder method according to an exemplary embodiment of the present application;

FIG. 4 shows a schematic diagram of actively acquiring memo content provided by an exemplary embodiment of the present application;

FIG. 5 is a schematic diagram showing a method of obtaining an active trigger memo provided by an exemplary embodiment of the present application;

FIG. 6 shows a schematic diagram of obtaining a passive trigger memo provided by an exemplary embodiment of the present application;

FIG. 7 is an exemplary visual modal memo content of the present application;

FIG. 8 is a schematic diagram showing an active trigger reminder provided by an exemplary embodiment of the present application;

FIG. 9 shows a flowchart of a passive trigger reminder provided by an exemplary embodiment of the present application;

FIG. 10 is a schematic diagram showing a memo combining spatiotemporal information provided by an exemplary embodiment of the present application;

FIG. 11 is a schematic diagram showing a method of acquiring extended content provided by an exemplary embodiment of the present application;

FIG. 12 shows a structural block diagram of a memo reminder device provided in one embodiment of the present application.

Detailed Description

In order to make the objectives, technical solutions and advantages of the present application more clear, the implementation methods of the present application will be further described in detail below with reference to the accompanying drawings.

The term "multiple" as used herein refers to two or more than two. "And/or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and/or B can mean: A exists alone, A and B exist at the same time, and B exists alone. The character "/" generally indicates that the related objects are in an "or" relationship.

Please refer to FIG. 1, which shows a block diagram of a terminal provided by an exemplary embodiment of the present application. The terminal 100 may include one or more of the following components: a processor 110 and a memory 120.

The processor 110 may include one or more processing cores. The processor 110 uses various interfaces and lines to connect the parts of the terminal 100, and performs the functions of the terminal 100 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 120 and by calling data stored in the memory 120. Optionally, the processor 110 can be implemented in at least one hardware form of digital signal processing (DSP), field programmable gate array (FPGA), and programmable logic array (PLA). The processor 110 can integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a neural network processor (NPU), and a modem. The CPU mainly handles the operating system, user interface, and application programs; the GPU renders and draws the content to be displayed on the touch display; the NPU implements artificial intelligence (AI) functions; and the modem handles wireless communications. It is understandable that the modem may also be implemented as a separate chip rather than being integrated into the processor 110.

The memory 120 may include a random access memory (RAM) or a read-only memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable storage medium. The memory 120 may be used to store instructions, programs, codes, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing the following method embodiments, etc.; the data storage area may store data created through the use of the terminal 100 (such as audio data, a phone book), etc.

In some embodiments, the terminal may further include a display screen 130 and a microphone 140. The display screen 130 is a component for displaying images. The display screen 130 may be a built-in screen of the terminal, such as the screen of a smartphone, or an external screen of the terminal, such as an external display of a personal computer.

In some embodiments, the display screen 130 has a touch function in addition to the image display function, that is, the display content can be controlled by touching and clicking the display screen 130.

The microphone 140 is a component for collecting external sounds. In the embodiment of the present application, the terminal 100 supports the user to perform human-computer interaction in the memo reminder scenario through voice commands, and the microphone 140 can be used to collect user voice audio information for memo reminder.

In addition, those skilled in the art will appreciate that the structure of the terminal 100 shown in the above figure does not limit the terminal, and the terminal may include more or fewer components than shown, combine certain components, or arrange the components differently. For example, the terminal 100 may also include a camera component, a speaker, a radio frequency circuit, an input unit, sensors (such as acceleration sensors, angular velocity sensors, light sensors, etc.), audio circuits, a Wi-Fi module, a power supply, a Bluetooth module and other components, which are not described here.

Please refer to FIG. 2, which shows a flow chart of a memo reminder method provided by an exemplary embodiment of the present application. The method may include the following steps.

Step 201, when there is a need to record a memo, obtain the memo content.

In response to the user's memo operation, the terminal obtains the corresponding memo content. The memo content can be text, image, video, or audio, or a combination of several of these; that is, the memo content can be multimodal. The memo content is described below in conjunction with some application scenarios. As shown in FIG. 3, in a web browsing scenario, the memo content can be digital content, such as online products, news reports, e-books, and URLs (Uniform Resource Locators); in a travel scenario, the memo content can be map navigation data and screenshots indicating the route, and can also include the real physical coordinates of attractions, restaurants, etc., as well as official websites, introductory pictures and audio, and guide text; in a daily life scenario, the memo content can be schedule information, itinerary information, note text, etc.

In a possible implementation, the need for memo recording may be that the terminal receives a memo recording voice instruction, and accordingly, the terminal obtains the memo content indicated by that instruction. The terminal performs ASR (Automatic Speech Recognition) processing on the voice audio contained in the received memo recording voice instruction, determines the memo content indicated by the instruction, and then performs the acquisition operation that corresponds to the modality of the memo content. For example, when the memo content is text entered by voice, the terminal extracts the corresponding text from the ASR output as the memo content. When the memo recording voice instruction indicates that the memo content is a picture, the terminal obtains the memo content by taking a screenshot, taking a photo, or saving a picture. When the user instructs the terminal to take a memo of the current web page, the terminal obtains the link of that web page as the memo content.

Illustratively, in the case of a memo reminder based on a memo recording voice command, the terminal obtains the user's memo recording voice command "Remind me to catch this flight" in response to the user's awakening voice, and the terminal determines that the memo content of the memo recording voice command is flight information and itinerary information through ASR and other processing methods. Then, the terminal obtains the currently displayed flight information picture through a screenshot tool, and obtains the flight information description text as the itinerary text. The above flight information picture and itinerary text are both memo contents.

Optionally, the need for memo recording may also be that the terminal obtains user behavior information and the user behavior information meets the memo conditions, and accordingly, the terminal determines the memo content based on the user behavior information. In work and life, the terminal actively understands user behavior based on perceived user voice, operation behavior, action behavior, scenario information, etc., and actively obtains the memo content based on the judgment of user needs. The memo can be completed without the user waking up the device, avoiding the user forgetting to perform the memo operation and missing information, and bringing a natural and seamless interactive experience to the user.

Schematically, as shown in FIG. 4, in a scenario where the terminal perceives, based on the motion sensor and positioning component, that the user is jogging outdoors, the terminal actively understands the user's action behavior and situational information, determines that the user needs to store the current jogging route, and then automatically obtains the route information and coordinate information as memo content.

It should be noted that in order to protect the privacy of users, when obtaining the memo content, if the terminal needs to use sensors with permission restrictions such as cameras and recorders, the terminal can trigger the sensor to turn on and obtain the memo content based on the user's memo record voice command, or it can actively ask for the user's permission by displaying a reminder pop-up window, and turn on the sensor and obtain the memo content based on the user's positive feedback.

Step 202, based on the content mode corresponding to the memo content, extract information from the memo content to obtain key information, wherein the information extraction method is different under different content modes.

The key information may include core summaries such as the attributes and themes of the memo content, spatiotemporal information of the memo such as timestamps and climate, and information such as the intent type and triggering method determined by analyzing the memo content.

When the memo content input by the user is a combination of multiple modalities, such as website text + screenshot, user command text + photo, song audio + title text, etc., the terminal extracts information of a single modality for memo contents of different modalities respectively, obtains sub-key information of the memo content of each modality, and then merges them to obtain key information.
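The per-modality extraction and merging described above can be sketched as a simple dispatch table. This is only an illustrative sketch: the extractor functions, the modality labels, and the shape of the merged result are assumptions made for the example, and real extractors would wrap NLP, image recognition, and audio recognition models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-modality extractors (placeholders for real models).
def extract_text(payload: str) -> Dict[str, object]:
    return {"text": payload, "tokens": payload.split()}

def extract_visual(payload: str) -> Dict[str, object]:
    return {"image_ref": payload}

def extract_audio(payload: str) -> Dict[str, object]:
    return {"audio_ref": payload}

# Each content modality maps to its own extraction method.
EXTRACTORS: Dict[str, Callable[[str], Dict[str, object]]] = {
    "text": extract_text,
    "visual": extract_visual,
    "audio": extract_audio,
}

def extract_key_info(parts: List[Tuple[str, str]]) -> Dict[str, Dict[str, object]]:
    """Extract sub-key information per modality, then merge the results."""
    merged: Dict[str, Dict[str, object]] = {}
    for modality, payload in parts:
        sub_info = EXTRACTORS[modality](payload)
        merged.setdefault(modality, {}).update(sub_info)
    return merged

# Example: "song audio + title text" as a two-modality memo.
memo = [("text", "song title: Example"), ("audio", "song.wav")]
key_info = extract_key_info(memo)
```

Keeping the sub-key information grouped by modality is one possible merge strategy; a flat merged dictionary would work equally well for this sketch.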

Step 203: store the memo content and key information in association with each other.

The key information obtained in step 202 includes different data forms, such as text, entity relationship pairs, image region coordinates, pixel values, spectrograms, timestamps, temperature and humidity, altitude, longitude and latitude, etc. Key information in different data forms corresponds to different dimensions of the memo content.

In a possible implementation, as shown in FIG. 3, based on the correspondence between the key information and the memo content, the terminal organizes the obtained key information into a semi-structured key-value data structure, such as a dictionary or a hash map. Table 1 enumerates the basic forms in which key information is stored as semi-structured data.

Table 1
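As a rough illustration of associated storage in a key-value structure, the sketch below builds one record as a Python dictionary and round-trips it through JSON. It does not reproduce Table 1; the field names, the file name, and the timestamp value are hypothetical and chosen only for the example.

```python
import json
from typing import Any, Dict

def build_record(memo_ref: str, key_info: Dict[str, Any]) -> Dict[str, Any]:
    """Associate a memo (referenced by a path or ID) with its key information."""
    return {"memo_content": memo_ref, "key_info": key_info}

record = build_record(
    "memo_0001.txt",  # hypothetical reference to the stored memo content
    {
        "entities": {"time": "3 o'clock tomorrow", "location": "Zhongshan Park"},
        "intent_type": "schedule",
        "trigger_mode": "active",
        "timestamp": "2022-10-12T09:00:00",  # illustrative value only
    },
)

# Semi-structured data serializes naturally to JSON for persistence.
serialized = json.dumps(record, ensure_ascii=False)
restored = json.loads(serialized)
```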

Step 204: When it is determined based on the key information that the memo reminder triggering condition is met, a memo reminder is made based on the memo content.

Depending on the intent type corresponding to the memo content, the memo reminder may be an active reminder, that is, the terminal actively makes a memo reminder once the memo reminder trigger condition is met. The reminder may take the form of a reminder message displayed in a message notification box, a reminder audio, a vibration, or a combination thereof, which is not limited in this application. The terminal can also make a memo reminder in a passive manner, that is, the terminal uses the user's query input as the memo reminder trigger condition, and makes a memo reminder based on the corresponding memo content only after obtaining the user's query input.

The feedback form of the terminal's memo reminder can be the original file stored in the associated storage, such as the user's original voice, or the memo information obtained by the terminal processing the original file, such as text-to-speech or a visual notification displayed on the terminal.

In summary, in the embodiment of the present application, based on the memo recording voice command input by the user, the terminal uses a variety of sensors to obtain multi-modal and multi-dimensional information, such as text, visual, auditory, situational, and spatiotemporal information, to form the memo content. Compatibility with multi-modal data improves the richness of the memo content. After obtaining the memo content, the terminal determines the intent type of the memo content through information extraction, automatically determines the triggering method of the memo reminder, and performs a memo reminder when the memo reminder triggering condition is met. The embodiment of the present application thus extends support for the input and output of multi-modal information, improves the ways in which users are helped to remember, and improves the efficiency and quality of human-computer interaction.

In some embodiments, based on the content modality corresponding to the memo content, information is extracted from the memo content to obtain key information, including:

When the content modality corresponding to the memo content includes a text modality, natural language processing is performed on the memo content to obtain key information of the text;

When the content modality corresponding to the memo content includes a visual modality, performing image recognition processing on the memo content to obtain key image information;

When the content modality corresponding to the memo content includes an auditory modality, audio recognition processing is performed on the memo content to obtain audio key information.

Performing natural language processing on the memo content to obtain key text information includes at least one of the following methods:

Perform named entity recognition on the memo content to obtain entity information;

Extract entity relationships from the memo content to obtain entity relationship information;

Extract the subject summary of the memo content to obtain the subject information;

Perform text intent recognition on the memo content to obtain the intent type, which is used to characterize the intent of recording the memo;

A causal inference analysis is performed on the memo content to obtain trigger mode information, wherein the trigger mode includes active triggering and passive triggering, and when the trigger mode indicated by the trigger mode information is active triggering, the trigger mode information includes active triggering conditions.

In some embodiments, the key information includes text key information, and the text key information includes the trigger mode information;

When the memo reminder triggering condition is met based on the key information, a memo reminder is made based on the memo content, including:

When the trigger mode indicated by the trigger mode information is active triggering and the active triggering conditions are met, an active memo reminder is performed based on the memo content;

When the trigger mode indicated by the trigger mode information is a passive trigger and the key information matches the reminder instruction, a passive memo reminder is performed based on the memo content.
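The two trigger branches above can be sketched as a single decision function. This is a minimal sketch under stated assumptions: the `Memo` class, the `context` dictionary, and the keyword-overlap test used to decide whether key information "matches" a reminder instruction are all hypothetical simplifications, and the URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Memo:
    content: str
    trigger_mode: str  # "active" or "passive"
    # Predicate over the current context; only used for active triggering.
    active_condition: Optional[Callable[[dict], bool]] = None
    # Key-information keywords; only used for passive triggering.
    keywords: frozenset = frozenset()

def should_remind(memo: Memo, context: dict, query: Optional[str] = None) -> bool:
    """Return True when the memo reminder triggering condition is met."""
    if memo.trigger_mode == "active":
        return memo.active_condition is not None and memo.active_condition(context)
    if memo.trigger_mode == "passive" and query is not None:
        # Assumed matching rule: any keyword appears in the reminder instruction.
        return bool(set(query.lower().split()) & memo.keywords)
    return False

traffic_memo = Memo(
    "Ride an electric bike to work",
    "active",
    active_condition=lambda ctx: ctx.get("changjiang_road") == "congested",
)
skirt_memo = Memo(
    "https://example.com/skirt",  # placeholder product link
    "passive",
    keywords=frozenset({"skirt"}),
)
```

An active memo is checked against the current context without user input, whereas a passive memo is only checked when a reminder instruction arrives.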

In some embodiments, image recognition processing is performed on the memo content to obtain key image information, including:

When the memo content is a picture, optical character recognition is performed on the picture to obtain picture text; and/or, image natural language description processing is performed on the picture to obtain picture description text;

When the memo content is a video, the video is understood to obtain a video description text;

The method further includes:

Natural language processing is performed on at least one of the picture text, the picture description text, and the video description text to obtain text key information.

In some embodiments, audio recognition and extraction are performed on the memo content to obtain audio key information, including:

Performing automatic speech recognition on the memo content to obtain an audio text; and/or, performing audio feature extraction on the memo content to obtain an audio fingerprint;

The method further includes:

Perform natural language processing on audio text to obtain key information of the text.

In some embodiments, after obtaining the memo content, the method further includes:

Get the extended content corresponding to the memo content;

The associated storage of the memo content and key information includes:

The memo content, key information and extended content are stored in an associated manner.

In some embodiments, obtaining extended content corresponding to the memo content includes:

When the content mode of the memo content is a text mode and the amount of information of the key information is less than an information amount threshold, obtaining at least one of auditory extended content or visual extended content;

When it is determined based on the key information that the memo reminder triggering condition is met, a memo reminder is performed based on the memo content, including:

When it is determined based on the key information that the memo reminder triggering condition is met, a memo reminder is performed based on the memo content and the extended content.

When the memo content is time-limited, obtain the extended content corresponding to the memo content;

When it is determined based on the key information that the memo reminder triggering condition is met, a memo reminder is made based on the memo content, including:

When it is determined based on the key information that the memo reminder triggering condition is met and the memo content is valid, a memo reminder is made based on the memo content;

The method further includes:

When it is determined based on the key information that the memo reminder triggering condition is met and the memo content is invalid, a memo reminder is performed based on the extended content.
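The fallback from time-limited memo content to extended content can be sketched as follows. The function name, the use of a `valid_until` timestamp to model validity, and the example strings and dates are assumptions for illustration; the source does not specify how validity is determined.

```python
from datetime import datetime
from typing import Optional

def reminder_payload(
    memo_content: str,
    extended_content: Optional[str],
    valid_until: Optional[datetime],
    now: datetime,
) -> Optional[str]:
    """Pick what to remind with once the triggering condition is met.

    The memo content is used while it is still valid; once it has
    expired, the reminder falls back to the extended content.
    """
    if valid_until is None or now <= valid_until:
        return memo_content
    return extended_content

# Hypothetical example: a memo that stops being valid at a deadline.
deadline = datetime(2023, 3, 1, 8, 45)
before = reminder_payload("Flight memo", "Alternative flights", deadline,
                          datetime(2023, 3, 1, 7, 0))
after = reminder_payload("Flight memo", "Alternative flights", deadline,
                         datetime(2023, 3, 1, 9, 0))
```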

In some embodiments, the method further comprises:

When there is a need for memo recording, obtain spatiotemporal information, which is used to represent the temporal and spatial state at the time the memo is recorded;

The memo content and key information are stored in association, including:

The memo content, key information and spatiotemporal information are stored in association;

When it is determined that the memo reminder triggering condition is met based on the key information and the spatiotemporal information, a memo reminder is made based on the memo content.

In some embodiments, after extracting information from the memo content based on the content modality corresponding to the memo content and obtaining key information, the method further includes:

Vectorize and encode the key information to obtain a key information vector;

The memo content and key information vector are stored in association;

When a reminder instruction is received, vector encoding is performed on the reminder instruction to obtain an instruction vector;

When the vector similarity between the instruction vector and the key information vector is greater than a threshold, it is determined that a memo reminder triggering condition is met, and a memo reminder is performed based on the memo content.

In some embodiments, when a reminder instruction is received, vector encoding is performed on the reminder instruction to obtain an instruction vector, including:

When a reminder instruction is received, information of the reminder instruction is extracted based on the content mode corresponding to the reminder instruction to obtain key information of the instruction;

The key information of the instruction is vectorized and encoded to obtain an instruction vector.
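The vector-matching embodiment can be illustrated with a toy encoder and cosine similarity. The bag-of-words encoder below merely stands in for whatever vector encoding the terminal actually uses, and the threshold of 0.5 is an arbitrary value assumed for the example.

```python
import math
from collections import Counter
from typing import Dict

def encode(text: str) -> Dict[str, float]:
    """Toy bag-of-words encoder (a stand-in for a learned vector encoder)."""
    return dict(Counter(text.lower().split()))

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def matches(instruction_key_info: str, key_info_vector: Dict[str, float],
            threshold: float = 0.5) -> bool:
    """Triggering condition: similarity strictly greater than the threshold."""
    return cosine(encode(instruction_key_info), key_info_vector) > threshold

# Key information stored with the memo, encoded once at storage time.
stored_vector = encode("key desk drawer")
```

Here `matches("key drawer", stored_vector)` compares the key information extracted from a reminder instruction against the stored vector; an unrelated instruction such as "flight tomorrow" shares no terms and therefore scores zero.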

In some embodiments, when there is a need to record a memo, obtaining the memo content includes:

When receiving a memo recording voice instruction, obtaining the memo content indicated by the memo recording voice instruction; or,

When the user behavior information is obtained and the user behavior information meets the memo condition, the memo content is determined based on the user behavior information.

In some embodiments, when it is determined based on the key information that the memo reminder triggering condition is met, after the memo reminder is made based on the memo content, the method further includes:

In response to the memo deletion instruction, the memo content is deleted.

In daily life, the information that users need to remember is not limited to text; it also includes a large amount of visual information, such as images and video, and auditory information, such as music and voice. That is, when users rely on the terminal to help them remember, the content to be stored is often multimodal. Compared with the related art, which supports only single-modality input during the memo process, in the embodiment of the present application the terminal supports the acquisition of multimodal memo content. Further, for memo content of different content modalities, the terminal adopts a corresponding key information extraction method to determine the key information in the memo content for storage, so as to improve the human-computer interaction efficiency in the memo reminder scenario. As shown in FIG. 3, the method for extracting information from the memo content can be any one or a combination of the following:

1. When the content modality corresponding to the memo content includes text modality, natural language processing (NLP) is performed on the memo content to obtain key information of the text.

In a possible implementation, as shown in FIG. 3, the terminal needs to understand the meaning of the memo content through NLU (Natural Language Understanding). The terminal first performs Named Entity Recognition (NER) on the memo content to obtain entity information. The predefined entity types may include time, location, name, and object, and may also include currency, organization, etc., which is not limited in this application. When memo content in the text modality is obtained, the terminal recognizes and annotates entities in the memo text based on the entity types. For example, based on the memo content "Meet Li Ming at 3 o'clock tomorrow in Zhongshan Park", the terminal obtains the following recognition result through NER: Meet [Name] Li Ming at [Time] 3 o'clock tomorrow in [Location] Zhongshan Park. It should be noted that the method for implementing NER is not limited in the embodiment of this application.
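Since the NER method is left open, the following is only a toy rule-based tagger that shows the shape of the output for the example sentence. The regular expression and the name and location lists are hypothetical and cover just this example; a practical system would use a trained NER model.

```python
import re
from typing import List, Tuple

# Hypothetical rules that cover only the example sentence.
TIME_PATTERN = re.compile(r"\b\d{1,2} o'clock(?: tomorrow| today)?\b")
NAMES = ("Li Ming", "Zhang San")
LOCATIONS = ("Zhongshan Park",)

def tag_entities(text: str) -> List[Tuple[str, str]]:
    """Return (entity_type, surface_text) pairs in order of appearance."""
    found = []
    for match in TIME_PATTERN.finditer(text):
        found.append((match.start(), "Time", match.group(0)))
    for label, gazetteer in (("Name", NAMES), ("Location", LOCATIONS)):
        for entry in gazetteer:
            index = text.find(entry)
            if index >= 0:
                found.append((index, label, entry))
    found.sort(key=lambda item: item[0])
    return [(label, surface) for _, label, surface in found]

entities = tag_entities("Meet Li Ming at 3 o'clock tomorrow in Zhongshan Park")
```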

Furthermore, the terminal performs entity and relation extraction (ERE) on the memo content to obtain entity relationship information. Based on the named entity recognition results obtained by NER, the terminal performs entity extraction and relationship extraction, and reduces the memo content to core entity relationships for text analysis. For example, based on the memo content "I put the key in the desk drawer", the terminal obtains the recognition result through NER: I put [item] key in [location] desk drawer. Based on this recognition result, the terminal performs entity relationship extraction and obtains the entity relationship information: key [location] desk drawer.
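The reduction of a sentence to a core entity relationship can be illustrated with a single hand-written pattern. This is an assumption-laden toy: the pattern handles only sentences of the form "I put X in Y", and the triple layout (item, relation, location) is chosen for the example rather than taken from the source.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for "I put <item> in <place>" sentences only.
PUT_PATTERN = re.compile(
    r"^I put (?:the |my )?(?P<item>.+?) in (?:the |my )?(?P<place>.+?)\.?$"
)

def extract_location_relation(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (item, "location", place), or None if the pattern does not apply."""
    match = PUT_PATTERN.match(text.strip())
    if match is None:
        return None
    return (match["item"], "location", match["place"])

relation = extract_location_relation("I put the key in the desk drawer")
```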

Correspondingly, the terminal can also perform subject summary extraction (Text Summarization) on the memo content to obtain subject information. The summary may be an extractive summary (Extractive Summarization) or a generative summary (Abstractive Summarization), which is not limited in this application. Schematically, based on the memo content "having dinner with Zhang San tomorrow night", the terminal can determine through subject summary extraction that the subject of the text is "eating".

On the basis of determining the text content of the memo, the terminal further performs text intent recognition on the memo content to obtain the intent type, which characterizes the intention behind recording the memo. Users have different storage intentions for different types of memo content. For example, when the memo content is flight information, the user expects to receive a reminder from the terminal at the corresponding time, and when the memo content is product shopping information, the user expects to query it in the future. The terminal can determine the intent type of the memo content based on text classification (TC) technology. The intent types can include schedule, reminder, and memo. Classifying the memo content into intent types gives the terminal a basis for subsequently determining the memo reminder method.

Illustratively, when the memo content is "Remember to call mom to take medicine at 9 o'clock in the evening", based on the conditional prompt content contained in the memo content, the terminal can determine that the user expects to be prompted when the corresponding conditions are met based on the memo content, and then the terminal can determine that its intent type is a reminder; based on the memo record voice command of "Please remember this flight for me", the terminal obtains the memo content of "The XXX flight from Chengdu to Beijing you purchased on March 1st 8:45-11:20 has been issued", and then the terminal determines its intent type as a schedule through text intent recognition; based on the memo content of "My power-on password is XXX", the terminal determines its intent type as a memo.
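The three-way intent classification can be sketched with keyword cues. This stands in for the text classification model mentioned above; the cue lists are hypothetical and tuned only to the examples in this passage.

```python
# Hypothetical keyword cues; a real system would use a trained text classifier.
REMINDER_CUES = ("remember to", "remind me")
SCHEDULE_CUES = ("flight", "meeting", "appointment")

def classify_intent(text: str) -> str:
    """Classify memo text as "reminder", "schedule", or "memo"."""
    lowered = text.lower()
    if any(cue in lowered for cue in REMINDER_CUES):
        return "reminder"
    if any(cue in lowered for cue in SCHEDULE_CUES):
        return "schedule"
    return "memo"  # default: content the user expects to query later

examples = [
    "Remember to call mom to take medicine at 9 o'clock in the evening",
    "The XXX flight from Chengdu to Beijing you purchased on March 1st "
    "8:45-11:20 has been issued",
    "My power-on password is XXX",
]
intents = [classify_intent(text) for text in examples]
```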

The terminal obtains the trigger mode information by performing causal inference (CI) analysis on the memo content; that is, the key information includes the text key information, and the text key information includes the trigger mode information. The trigger mode is either active triggering or passive triggering. The trigger mode information includes at least the trigger mode, which corresponds to the intent type of the memo content, and when the indicated trigger mode is active triggering, the trigger mode information also includes the active triggering condition. Schematically, as shown in FIG. 5, based on the memo content "If Changjiang Road is congested, remind me to ride an electric bike to work", the terminal determines that the intent type of the memo content is reminder, determines that its trigger mode is active triggering, and extracts the traffic condition entity "congestion" in the memo content as the trigger mode information. As shown in FIG. 6, based on the memo recording voice command "Help me collect this skirt", the terminal obtains the product link corresponding to the product as the memo content. Because the intent type of the memo content is memo, the terminal determines that its trigger mode is passive triggering, and performs a memo reminder based on the memo content only when the user enters a reminder instruction.
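The relationship between intent type and trigger mode information described above can be sketched as follows. This is a minimal illustration in Python, not the method of the application: the regular expression is only a toy stand-in for causal inference analysis, the default trigger mode per intent type is an assumption (the application allows the schedule type to be triggered either way), and the names `TriggerInfo` and `derive_trigger_info` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed default trigger mode per intent type; "schedule" may also be passive.
DEFAULT_TRIGGER_MODE = {"reminder": "active", "schedule": "active", "memo": "passive"}

# Toy stand-in for causal inference: pull the condition out of
# "If <condition>, remind me ...".
CONDITION_PATTERN = re.compile(r"^if\s+(?P<condition>.+?),\s*remind me", re.IGNORECASE)


@dataclass
class TriggerInfo:
    mode: str                        # "active" or "passive"
    condition: Optional[str] = None  # only set for active triggering


def derive_trigger_info(memo_text: str, intent_type: str) -> TriggerInfo:
    """Derive trigger mode information from the memo text and its intent type."""
    mode = DEFAULT_TRIGGER_MODE[intent_type]
    if mode == "passive":
        return TriggerInfo(mode="passive")
    match = CONDITION_PATTERN.match(memo_text.strip())
    condition = match.group("condition") if match else None
    return TriggerInfo(mode="active", condition=condition)
```

In a real system the condition would come from a trained model rather than a pattern; the sketch only shows how the trigger mode information carries a condition for active triggering and none for passive triggering.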

It should be noted that the above NLP processing steps, such as NER, ERE, and CI, are carried out simultaneously while the terminal extracts key information from the memo content, and the processing results together constitute the key information. In some cases the result of a particular NLP processing step may be empty, which has no effect on the memo reminder process.
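How several processing results are combined into the key information, with empty results tolerated, can be sketched as follows. This is a hedged illustration only: the extractors are hypothetical toy functions standing in for NER and ERE, and they are run one after another for simplicity rather than simultaneously.

```python
from typing import Callable, Dict, List

Extractor = Callable[[str], List[str]]


def extract_key_info(memo_text: str, extractors: Dict[str, Extractor]) -> Dict[str, List[str]]:
    """Run every extractor on the memo text and keep only non-empty results."""
    key_info = {}
    for name, extractor in extractors.items():
        result = extractor(memo_text)
        if result:  # an empty result is simply skipped
            key_info[name] = result
    return key_info


# Hypothetical toy extractors standing in for NER and entity relation extraction.
def toy_entities(text: str) -> List[str]:
    return [phrase for phrase in ("Zhang San", "tomorrow night") if phrase in text]


def toy_relations(text: str) -> List[str]:
    return []  # no relation is found in this example


key_info = extract_key_info(
    "having dinner with Zhang San tomorrow night",
    {"entities": toy_entities, "relations": toy_relations},
)
```

Here the relation extractor returns nothing, so the resulting dictionary contains only the entity information and the rest of the flow is unaffected.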

2. When the content modality corresponding to the memo content includes a visual modality, image recognition processing is performed on the memo content to obtain image key information. Further, the terminal performs natural language processing on at least one of the picture text, the picture description text, and the video description text to obtain text key information. The natural language processing applied to this text can be any one or a combination of the sub-methods in method 1.

In the case where the memo content is a picture, in a possible implementation, the terminal performs optical character recognition (OCR) on the picture to obtain the picture text. For picture information with text content as the main content, the terminal can convert the text symbols therein into text information through OCR technology, and can further extract text key information based on the obtained text information to obtain the key information contained in the image.

In another possible implementation, the terminal performs image natural language description processing (Image Caption, IC) on the picture to obtain the picture description text. For picture information with picture content as the main content, the terminal converts the image into natural language describing the image content so as to determine the information contained in the image. Schematically, in the case where the memo content is a picture as shown in FIG. 7, the terminal obtains the following text description through IC technology: a boy with a travel bag is traveling. Further, the terminal extracts the text key information as described in method 1 and determines that the theme of the picture is "travel".

It should be noted that for the same picture, the terminal can perform optical character recognition to obtain the text in the picture, and also perform natural language description processing to further enrich the key information corresponding to the picture. It should also be noted that for the picture, the terminal can use visual grounding (VG) technology to locate the description subject in the picture based on the picture and the picture description text, and obtain the location area information of the picture subject. For example, in the case where the memo content is the picture shown in Figure 7, based on the picture and the text description, the terminal determines the regional location information of the subject "boy" in the picture as part of the target result through VG technology.

In the case where the memo content is a video, the terminal performs video understanding on the video to obtain a video description text. Among them, the technologies adopted for video understanding may include but are not limited to video scene recognition, video action understanding, and video event understanding. Through video understanding, the terminal expresses the video content in the form of natural language text, that is, the information contained in the memo content is reflected through the video description text. Furthermore, the terminal can process the video description text by extracting key information from the text to further clarify the video information, so as to facilitate the subsequent memo reminder to the user based on the memo content.

3. When the content modality corresponding to the memo content includes an auditory modality, audio recognition processing is performed on the memo content to obtain audio key information.

In a possible implementation, the terminal performs automatic speech recognition (ASR) on the memo content to obtain an audio text. That is, when the user inputs the memo content in the form of a memo recording voice command, the terminal converts the voice information into natural language text, namely the audio text, through ASR. The terminal then performs natural language processing on the audio text to obtain text key information. The natural language processing can be any one or a combination of the processing methods in method 1.

In another possible implementation, the terminal extracts audio features from the memo content to obtain an audio fingerprint. Through audio fingerprinting technology, the terminal can extract digital features from a segment of audio and represent them as identifiers, thereby obtaining the information contained in the audio memo content. Optionally, the terminal can also calculate the spectrogram of the audio file, that is, the frequency information of the audio over time. Based on the spectrogram or audio fingerprint, the terminal allows users to use audio (such as a hummed tune) as a reminder instruction to query the memo content.

Based on the triggering mode information in the key text information, the terminal can be triggered actively or passively for reminder.

In a possible implementation, the intent type corresponding to the memo content belongs to the schedule type or reminder type, and the trigger mode information of the memo content indicates that the trigger mode is active triggering. When the active triggering conditions are met, the terminal performs an active memo reminder based on the memo content.

In the related art, the terminal actively reminds the user based only on the time information and location information in the memo content. In the embodiment of the present application, by contrast, the memo reminder trigger condition may include any one or a combination of the entity information in the memo content. That is, the terminal can use the time, location, climate, etc. in the entity information as active trigger conditions, and can also use events, traffic conditions, etc. in the entity information as active trigger conditions, which increases the richness of the reminder scenarios and improves the human-computer interaction experience.

Illustratively, when the memo content is "If Changjiang Road is congested, remind me to ride an electric bike to work", the terminal determines through intent recognition and causal inference that the intent type of the memo content is a reminder and that the memo reminder is to be made by active triggering. The trigger mode information includes the active triggering condition [traffic conditions], that is, "Changjiang Road is congested". When the terminal detects that the traffic conditions meet the active triggering condition, it actively makes a memo reminder.
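Checking an active triggering condition against the terminal's current context can be sketched as follows. This is a minimal sketch under stated assumptions: the condition is stored as a mapping from a hypothetical entity key (such as `"traffic:Changjiang Road"`) to the required value, and the observed context is supplied by some other component that the application does not detail.

```python
from typing import Dict


def active_trigger_met(conditions: Dict[str, str], observed: Dict[str, str]) -> bool:
    """Return True when every stored condition equals the currently observed value."""
    return all(observed.get(kind) == expected for kind, expected in conditions.items())


# Trigger condition extracted from the memo (entity key -> required value).
conditions = {"traffic:Changjiang Road": "congested"}

# Hypothetical snapshots of the terminal's context at two moments.
morning = {"traffic:Changjiang Road": "congested", "weather": "rain"}
noon = {"traffic:Changjiang Road": "clear", "weather": "rain"}
```

With these snapshots the reminder would fire for `morning` and not for `noon`. Because a condition is just a set of entity keys, time, location, climate, events, and traffic conditions can all be combined in the same structure.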

Optionally, the terminal can make a memo reminder to an object other than the user who performs the memo operation. That is, whereas in the prior art the terminal only makes a memo reminder to the user who uses the terminal, in the embodiment of the present application the user can add memo reminder object information to the memo content. For example, as shown in FIG. 8, the user inputs the memo content "Remind mom to take medicine at 9 o'clock in the evening" through the memo recording voice command. Through information extraction, the terminal can determine that the memo reminder object indicated by the key information is the user's mother, and actively reminds the mother when the active trigger condition "9 o'clock in the evening" is met.

In another possible implementation, the intent type corresponding to the memo content belongs to the schedule category or the memo category, and the trigger mode information of the memo content indicates that the trigger mode is passive triggering. When the key information matches the reminder instruction, the terminal performs a passive memo reminder based on the memo content.

The memo reminder triggering condition corresponding to passive triggering can also be multimodal. That is, the user can query based on the text corresponding to a voice command, or based on a reminder instruction in another modality or a combination of modalities, such as picture information, audio information, or web page links. Because the stored content is semi-structured data composed of multimodal information, supporting multimodal queries gives the user more freedom in query operations, so that the user can complete a query in whatever way is easiest to express, which matches the user's intuition. When the modality of the memo content is the same as that of the reminder instruction, the memo reminder efficiency and the human-computer interaction experience are also improved. Schematically, the user can retrieve the corresponding song audio in the memo content by humming (audio information), or query the corresponding product information in the memo content based on a clothing picture (image information).

As shown in FIG. 9, for passively triggered memo content, the terminal starts in response to the user's wake-up command, and retrieves and returns results based on the reminder instruction input by the user. Because the reminder instruction can be multimodal, the terminal processes it in a way similar to the memo content: it extracts information from the reminder instruction, obtains the instruction key information, and then constructs semi-structured data such as a dictionary to store the instruction key information. The information extraction and associated storage methods are the same as those in the above embodiment, and will not be repeated here.

Once the instruction key information is determined, the terminal compares and matches the instruction key information with the key information corresponding to the memo content, and feeds back the memo content whose key information has the highest relevance to the user as the query result to complete the memo reminder. The basis for comparing relevance differs with the modality of the key information: the relevance can be text similarity, image similarity, spectrogram similarity, etc. The matching is carried out between two key information dictionaries, each of which contains multiple key-value pairs, so the number of similar values in the two dictionaries also needs to be taken into account when determining the similarity.

Illustratively, based on the reminder instruction "Where did I put my keys?", the terminal determines that the instruction key information is "key". The terminal then matches it against the key information dictionaries corresponding to the memo content to find a dictionary whose attributes include "key", and feeds back the original data of that dictionary, "I put my keys in the desk drawer", to the user.
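Matching between two key information dictionaries by counting shared values can be sketched as follows. This is a simplified illustration, assuming exact string equality as the similarity test (the application also mentions text, image, and spectrogram similarity) and assuming the key names `object` and `location`; the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def overlap_score(instruction_info: Dict[str, Set[str]], memo_info: Dict[str, Set[str]]) -> int:
    """Count the values shared by the two key information dictionaries, key by key."""
    return sum(len(values & memo_info.get(key, set())) for key, values in instruction_info.items())


def best_match(instruction_info: Dict[str, Set[str]], memos: List[dict]) -> Optional[dict]:
    """Return the stored memo with the highest overlap, or None when nothing overlaps."""
    scored = [(overlap_score(instruction_info, memo["key_info"]), memo) for memo in memos]
    score, memo = max(scored, key=lambda pair: pair[0], default=(0, None))
    return memo if score > 0 else None


memos = [
    {"content": "I put my keys in the desk drawer",
     "key_info": {"object": {"keys"}, "location": {"desk drawer"}}},
    {"content": "My power-on password is XXX",
     "key_info": {"object": {"password"}}},
]
# Key information extracted from "Where did I put my keys?" (assumed already normalized).
instruction_info = {"object": {"keys"}}
```

Calling `best_match(instruction_info, memos)` selects the first record, whose original content is then fed back to the user.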

In a possible implementation, when there is a need for memo recording, the terminal obtains spatiotemporal information, which is used to characterize the time and space state when the memo is recorded. The spatiotemporal information may include the timestamp, current location, altitude, temperature and humidity, and climate information corresponding to the memo recording need. Spatiotemporal information is important information for enriching the memo content, which can improve query support, provide query tags for subsequent users to query based on the memo content, and improve the efficiency of memo reminders.

Further, when spatiotemporal information is present, the terminal stores the memo content and key information in association with the spatiotemporal information. Accordingly, when it is determined based on the key information and the spatiotemporal information that the memo reminder triggering condition is met, the terminal performs a memo reminder based on the memo content. For example, as shown in FIG. 10, based on the memo content "The Summer Resort is really magnificent", the terminal obtains spatiotemporal information such as the timestamp, temperature and humidity, and altitude at the time of recording while obtaining the scenic spot photos, location coordinates, videos, and text descriptions, and constructs a key information dictionary for associated storage through information extraction. When the reminder instruction "Where did I go to play when it was the hottest last year" is obtained, the terminal extracts the instruction key information from the instruction: [time] last year, [temperature] hottest, [theme] travel. Because the spatiotemporal information stored with the memo content meets the memo reminder triggering condition, the terminal feeds back the text "The Summer Resort is really magnificent" and memo content such as pictures and videos to the user. Acquiring spatiotemporal information along with the memo content not only makes it easier for the user to query the memo content with a reminder instruction, but also improves the accuracy of the feedback result.
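The "hottest last year" query above can be sketched as a filter over stored spatiotemporal information. The record layout, the field names `timestamp`, `temperature_c`, and `themes`, and all sample values are assumptions made for illustration; they are not taken from the application.

```python
from datetime import datetime
from typing import List, Optional


def hottest_memo_in_year(memos: List[dict], year: int, theme: str) -> Optional[dict]:
    """Pick the memo with the given theme recorded in `year` at the highest temperature."""
    candidates = [
        memo for memo in memos
        if memo["timestamp"].year == year and theme in memo["themes"]
    ]
    return max(candidates, key=lambda memo: memo["temperature_c"], default=None)


# Hypothetical stored records: memo content plus spatiotemporal information.
memos = [
    {"content": "The Summer Resort is really magnificent", "themes": {"travel"},
     "timestamp": datetime(2022, 7, 20, 14, 0), "temperature_c": 34.0},
    {"content": "Skiing trip photos", "themes": {"travel"},
     "timestamp": datetime(2022, 1, 5, 10, 0), "temperature_c": -8.0},
    {"content": "Team lunch", "themes": {"eating"},
     "timestamp": datetime(2022, 8, 1, 12, 0), "temperature_c": 36.0},
]
```

The instruction key information [time] last year, [temperature] hottest, [theme] travel maps onto the three arguments of the filter: the year, the maximum over temperature, and the theme.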

When using a memo recording voice command to store memo content, the user usually inputs the memo content in the everyday conversational form that the user is accustomed to. Such input is often sparse and incomplete, so the memo content may fail to meet the user's memo needs, which affects the subsequent memo reminder experience. In the embodiment of the present application, because the terminal supports multimodal memo content, in addition to extracting information from the memo content input by the user, the terminal obtains extended content corresponding to the memo content and enriches the memo content with it, so as to improve the user's human-computer interaction experience in the memo reminder scenario.

When the content mode of the memo content is text mode and the information volume of the key information is less than the information volume threshold, the terminal obtains at least one of the auditory extended content or the visual extended content.

In one possible implementation, the memo content input by the user includes only the text mode, and the terminal extracts information from the text memo content to obtain key information. When the information volume of the key information is less than the information volume threshold, that is, when the number of values in the key-value pairs is small, the reminder instructions that the user can use to query the corresponding memo content in a passive triggering scenario are limited, so the user may be unable to obtain the required memo content based on the key information dictionary. In an embodiment of the present application, the terminal obtains the extended content corresponding to the memo content based on the recording scenario. The extended content can be visual extended content such as a web page snapshot, a screenshot, or the pixels of a target area, or auditory extended content such as ambient sound and music, providing richer information for memo reminders.
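The information volume check can be sketched as follows. The threshold value and the choice to measure information volume as the total number of values in the key information dictionary are assumptions for illustration; the application does not specify either.

```python
from typing import Dict, List, Set

INFO_VOLUME_THRESHOLD = 3  # assumed value; the application does not specify one


def info_volume(key_info: Dict[str, List[str]]) -> int:
    """Measure information volume as the total number of extracted values."""
    return sum(len(values) for values in key_info.values())


def needs_extended_content(modalities: Set[str], key_info: Dict[str, List[str]]) -> bool:
    """Text-only memo content with too few extracted values should be enriched."""
    return modalities == {"text"} and info_volume(key_info) < INFO_VOLUME_THRESHOLD
```

When the function returns True, the terminal would go on to obtain visual or auditory extended content for the memo.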

It should be noted that obtaining multimodal extended content requires the terminal to use sensors that involve user privacy, such as cameras and recorders. Therefore, in the scenario of obtaining extended content, the terminal can be set to actively ask the user and to collect the corresponding information only after obtaining positive feedback from the user. For example, the user indicates through a memo recording voice command "please help me write down the delicious stir-fried dishes of Xiao Ming's home cooking." Because the memo content carries little information, the terminal can prompt the user, by voice or visual display, to take a photo of the store sign, thereby obtaining visual extended content and enriching the memo information. Optionally, the terminal can also be set to automatically turn on the sensor in response to user instructions.

Optionally, the extended content may also include location information, etc. For example, the user indicates in the memo recording voice command "Please write down for me how delicious the stir-fried dishes of Xiao Ming's home cooking are", and the terminal obtains the text modal memo content. To enrich the memo information, while reminding the user to take a photo of the store to obtain visual extended content, the terminal can correspondingly obtain the location coordinates of "Xiao Ming's Home Cooking" and the user's travel routes as extended content.

Furthermore, the terminal stores the memo content, key information and extended content in an associated manner. Since the extended content can be multimodal, the terminal can still construct semi-structured data in the form of key-value pairs to store the extended content in an associated manner. Table 2 lists the methods of storing the extended content in an associated manner.

Table 2

Correspondingly, when it is determined based on the key information that the memo reminder triggering condition is met, a memo reminder is performed based on the memo content and the extended content. The memo reminder method is the same as the above embodiment and will not be described in detail here.

In applications, web links such as product links in shopping applications or public account content links have a certain timeliness. For example, for product links, after the merchant removes the product from the shelves, the network link stored by the user in the form of memo content will become invalid immediately, and then in the subsequent query process, the user cannot obtain relevant product information based on the memo content, affecting the user experience. In the embodiment of the present application, the terminal can enrich the information related to the memo content by obtaining extended content to avoid the situation where the query is fruitless.

In a possible implementation, when the memo content is time-sensitive, the terminal obtains the extended content corresponding to the memo content. The terminal first extracts information from the obtained memo content. When it determines that the memo content is time-sensitive, for example, when the memo content is a URL or an online product, the terminal obtains the corresponding extended content based on the key information.

Schematically, as shown in FIG. 11, in the case of a memo reminder based on a memo recording voice command, the terminal obtains the user's memo recording voice command "help me write down this skirt" in response to the user's wake-up voice, and processes the command through NLP and other methods. The terminal determines that the memo content indicated by the memo recording voice command is the product information that the user is browsing. The terminal then obtains the current product link, captures the current product image with a screenshot tool (or obtains the product image from the page if the image can be saved), and obtains the current product introduction text "Spring and Autumn New Dress". The above product link, product image, and description text are all memo contents obtained by the terminal.

Further, if the memo reminder triggering condition is met based on the key information and the memo content is valid, the terminal performs a memo reminder based on the memo content. The memo reminder method is the same as the above embodiment and will not be described in detail here.

Accordingly, when it is determined based on the key information that the memo reminder triggering condition is met and the memo content is invalid, the terminal performs a memo reminder based on the extended content. When the memo reminder can be implemented based on the extended content, the terminal uses the extended content, or the processed extended content, as feedback to perform the memo reminder. For example, when the user enters the reminder instruction "view the collected dresses" and the online product link stored as the memo content is invalid, the terminal, which has stored the picture corresponding to the link as extended content, can feed back the picture to the user to complete the memo reminder.

Optionally, when the memo reminder cannot be realized based on the extended content, the terminal can perform an online search based on the web page screenshots, snapshots, etc. in the extended content, and feed back similar results obtained from the online search to the user to complete the memo reminder. This avoids the situation where the user searches for the memo content but obtains no results, thereby improving the human-computer interaction experience.
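The three-step fallback for time-sensitive memo content (valid memo content, then extended content, then online search) can be sketched as follows. How validity and sufficiency are determined is not specified in the application, so both are passed in as plain booleans here; `search_similar` is a hypothetical callable standing in for the online search, and the URL in the usage note is a made-up placeholder.

```python
from typing import Callable, List


def choose_reminder_payload(
    memo_content: str,
    content_is_valid: bool,
    extended_content: List[str],
    extended_is_sufficient: bool,
    search_similar: Callable[[List[str]], List[str]],
) -> List[str]:
    """Pick what to show the user once the reminder triggering condition is met."""
    if content_is_valid:
        return [memo_content]        # e.g. the product link still works
    if extended_is_sufficient:
        return extended_content      # e.g. the stored product picture
    # Fall back to an online search based on the screenshots or snapshots.
    return search_similar(extended_content)
```

For example, with an expired link such as `"https://shop.example/dress"` and a stored picture `"dress.png"`, the function returns the picture; only when the stored extended content cannot serve as a reminder is the search callable invoked.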

When memo reminders involve multimodal memo content, the data of different modalities is heterogeneous. Therefore, when semi-structured data is constructed from multimodal memo content for associated storage, the data format is complex and variable, which affects the input and output efficiency of the memo reminder process. In an embodiment of the present application, the terminal can achieve a unified representation of memo content of different content modes through multimodal fusion, thereby improving the efficiency of memo reminders.

During the input stage of the memo content, the terminal performs vectorization encoding on the key information to obtain the key information vector. The vectorization encoding can use multimodal fusion technology (MFT) in deep learning. The terminal uses technologies such as deep neural networks (DNN) and multimodal pre-trained models to convert the multimodal key information in the semi-structured data into vectors in a high-dimensional space, realizing a unified representation of the key information and making it convenient for users to obtain the memo content through passive triggering, that is, through reminder instructions.

Furthermore, when the key information vector is obtained, the terminal stores the memo content and the key information vector in association. The terminal can store the key information vector in the form of a key-value pair to obtain a key information dictionary, where the keyword can be a feature vector and the value is the key information vector obtained by vectorization encoding.

Correspondingly, in the memo reminder stage, when the terminal receives the reminder instruction, it vectorizes and encodes the reminder instruction to obtain an instruction vector.

In a possible implementation, when receiving a reminder instruction, the terminal extracts information from the reminder instruction based on the content mode corresponding to the reminder instruction to obtain key instruction information. The way in which the terminal extracts information from the reminder instruction is the same as the above-mentioned way of extracting information from the memo content, which will not be repeated here. When obtaining the key information of the instruction, the terminal vectorizes the key information of the instruction to obtain an instruction vector. The way in which the terminal vectorizes the key information of the instruction is the same as the above-mentioned embodiment, which will not be repeated here.

Furthermore, when the vector similarity between the instruction vector and the key information vector is greater than a threshold, it is determined that the memo reminder trigger condition is met, and a memo reminder is performed based on the memo content. Once the instruction vector is determined, the terminal can compare it with the key information vectors in the key information dictionary, that is, calculate the cosine distance between the instruction vector and each key information vector in the high-dimensional space. A smaller cosine distance corresponds to a higher vector similarity. When the vector similarity represented by the cosine distance is greater than the vector similarity threshold, the terminal determines that the memo reminder trigger condition is met, and then feeds back the corresponding memo content to the user.
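The vector comparison can be sketched as follows. The threshold value, the three-dimensional toy vectors, and the memo identifiers are assumptions for illustration; in practice the vectors would come from the multimodal encoder and have far more dimensions. The sketch computes cosine similarity directly, which equals one minus the cosine distance.

```python
import math
from typing import Dict, Optional, Sequence

SIMILARITY_THRESHOLD = 0.8  # assumed value; the application does not specify one


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_memo(instruction_vector: Sequence[float],
              memo_vectors: Dict[str, Sequence[float]]) -> Optional[str]:
    """Return the id of the most similar memo whose similarity exceeds the threshold."""
    best_id, best_score = None, SIMILARITY_THRESHOLD
    for memo_id, vector in memo_vectors.items():
        score = cosine_similarity(instruction_vector, vector)
        if score > best_score:
            best_id, best_score = memo_id, score
    return best_id


# Hypothetical key information vectors keyed by memo id.
memo_vectors = {"keys": [1.0, 0.0, 0.0], "flight": [0.0, 1.0, 0.0]}
```

An instruction vector close to the `"keys"` vector retrieves that memo, while a vector orthogonal to every stored vector retrieves nothing, so no reminder is triggered.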

It should be noted that after the passive trigger is completed based on the memo content, if the user believes that the memo content does not need to be retained, the terminal deletes the memo content in response to the memo deletion instruction. For example, the user performs a passive trigger through the reminder instruction "Where did I put my keys?", and the terminal gives the user feedback "I put them in the desk drawer" based on the key information. If the user believes that the memo content can be deleted, the user can voice input "Delete this memo" to notify the terminal to delete it.

Optionally, the terminal can also remind the user to delete expired information based on the timeliness information in the key information, so as to save storage space. For example, for flight information stored as memo content, if the flight time has passed, the terminal reminds the user that the flight has expired and actively prompts the user to delete the memo content.
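Finding expired memo content from timeliness information can be sketched as follows. The field name `expires_at` and the sample dates (including the year) are assumptions for illustration; the sketch only lists candidates, leaving the actual deletion to the user's confirmation.

```python
from datetime import datetime
from typing import List


def expired_memos(memos: List[dict], now: datetime) -> List[dict]:
    """Return memos whose 'expires_at' time (if any) is earlier than `now`."""
    return [
        memo for memo in memos
        if memo.get("expires_at") is not None and memo["expires_at"] < now
    ]


# Hypothetical records; a memo without timeliness information never expires.
memos = [
    {"content": "XXX flight from Chengdu to Beijing", "expires_at": datetime(2023, 3, 1, 11, 20)},
    {"content": "My power-on password is XXX"},
]
```

Each returned memo would then be presented to the user with a prompt asking whether it should be deleted.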

Optionally, in the embodiment of the present application, different trigger mode information is assigned to the memo content by determining its intent type. For memo content whose trigger mode information indicates active triggering, the memo content often loses its storage value after the terminal actively starts up and completes the memo reminder. Accordingly, the terminal can remind the user to delete the memo after completing the memo reminder, and delete the memo content in response to the user's positive feedback.

It should be noted that the information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.) and signals involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions. For example, the voice, itinerary information, geographic location, etc. involved in this application are all obtained with full authorization.

Please refer to FIG. 12 , which shows a structural block diagram of a memo reminder device provided by an exemplary embodiment of the present application, the device comprising:

The acquisition module 1201 is used to acquire the memo content when there is a memo recording requirement;

An information extraction module 1202 is used to extract information from the memo content based on the content mode corresponding to the memo content to obtain key information, wherein different information extraction methods are used in different content modes;

The storage module 1203 is used to store the memo content and the key information in association;

The memo reminder module 1204 is configured to make a memo reminder based on the memo content when it is determined based on the key information that a memo reminder triggering condition is met.

Optionally, the information extraction module 1202 is further used to:

In a case where the content modality corresponding to the memo content includes a text modality, performing natural language processing on the memo content to obtain text key information;

In a case where the content modality corresponding to the memo content includes a visual modality, performing image recognition processing on the memo content to obtain image key information;

In the case that the content modality corresponding to the memo content includes an auditory modality, audio recognition processing is performed on the memo content to obtain audio key information.

Optionally, when natural language processing is performed on the memo content to obtain key text information, the information extraction module 1202 is further used to:

Performing named entity recognition on the memo content to obtain entity information;

Extracting entity relationships from the memo content to obtain entity relationship information;

Extracting a subject summary from the memo content to obtain subject information;

Performing text intent recognition on the memo content to obtain an intent type, where the intent type is used to characterize the intent of recording the memo;

Perform causal inference analysis on the memo content to obtain trigger mode information, wherein the trigger mode includes active triggering and passive triggering, and when the trigger mode indicated by the trigger mode information is active triggering, the trigger mode information contains active triggering conditions.

Optionally, when the key information includes the text key information, and the text key information includes the trigger mode information, the memo reminder module 1204 is further used to:

When the trigger mode indicated by the trigger mode information is active triggering and the active triggering condition is met, an active memo reminder is performed based on the memo content;

Optionally, when image recognition processing is performed on the memo content to obtain image key information, the information extraction module 1202 is further used to:

In the case where the memo content is a picture, performing optical character recognition on the picture to obtain picture text; and/or performing image natural language description processing on the picture to obtain picture description text;

When the memo content is a video, performing video understanding on the video to obtain a video description text;

Natural language processing is performed on at least one of the picture text, the picture description text, and the video description text to obtain the text key information.

Optionally, when audio recognition processing is performed on the memo content to obtain audio key information, the information extraction module 1202 is further used to:

Performing automatic speech recognition on the memo content to obtain an audio text; and/or performing audio feature extraction on the memo content to obtain an audio fingerprint;

Performing natural language processing on the audio text to obtain the text key information.

Optionally, the acquisition module 1201 is further used to:

Obtaining the extended content corresponding to the memo content;

The storage module 1203 is further used for:

The memo content, the key information and the extended content are stored in association.

Optionally, the acquisition module 1201 is further used to:

When the content mode of the memo content is a text mode and the information amount of the key information is less than an information amount threshold, obtaining at least one of auditory extended content or visual extended content;

The memo reminder module 1204 is also used for:

In the case where it is determined based on the key information that the memo reminder triggering condition is met, a memo reminder is performed based on the memo content and the extended content.
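The threshold check for sparse text memos might look like the following Python sketch. How the information amount is measured is not fixed here; the sketch assumes it is the number of non-empty key information fields, and the capture callables are hypothetical stand-ins for the terminal's microphone and camera.

```python
from typing import Callable, Dict, List, Optional


def maybe_collect_extended_content(
    modality: str,
    key_information: Dict[str, object],
    threshold: int,
    capture_audio: Optional[Callable[[], bytes]] = None,
    capture_image: Optional[Callable[[], bytes]] = None,
) -> List[bytes]:
    """Return auditory and/or visual extended content for sparse text memos."""
    # Assumption: information amount = number of non-empty key information fields.
    amount = sum(1 for value in key_information.values() if value)
    if modality != "text" or amount >= threshold:
        return []
    extended = []
    for capture in (capture_audio, capture_image):
        if capture is not None:
            extended.append(capture())
    return extended
```

A memo whose key information already reaches the threshold, or whose content modality is not text, yields no extended content in this sketch.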

Optionally, the acquisition module 1201 is further used to:

In the case where the memo content is time-limited, obtaining the extended content corresponding to the memo content;

The memo reminder module 1204 is also used for:

When it is determined based on the key information that a memo reminder triggering condition is met and the memo content is valid, a memo reminder is performed based on the memo content;

When it is determined based on the key information that a memo reminder triggering condition is met and the memo content is invalid, a memo reminder is performed based on the extended content.
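The fallback from time-limited memo content to extended content can be sketched as follows. Modelling validity as an expiry timestamp is an assumption made for illustration, and `TimedMemo` and `reminder_payload` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TimedMemo:
    content: str
    extended_content: str
    # Assumption: validity is modelled as an expiry time; None means not time-limited.
    expires_at: Optional[datetime] = None


def reminder_payload(memo: TimedMemo, now: datetime) -> str:
    """Choose what to present once the trigger condition is already met."""
    still_valid = memo.expires_at is None or now < memo.expires_at
    return memo.content if still_valid else memo.extended_content
```

The function is only called after the memo reminder triggering condition has been met, so it decides what to present rather than whether to remind.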

Optionally, the acquisition module 1201 is further used to:

When there is a need for memo recording, obtaining spatiotemporal information, wherein the spatiotemporal information is used to represent the spatiotemporal state when the memo is recorded;

The storage module 1203 is further used for:

storing the memo content, the key information and the spatiotemporal information in association with each other;

The memo reminder module 1204 is also used for:

When it is determined based on the key information and the spatiotemporal information that a memo reminder triggering condition is met, a memo reminder is performed based on the memo content.
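One possible way to combine the key information with the spatiotemporal information is sketched below in Python. The specific rule (the user is back near the recorded location after a minimum time gap) and the parameters `radius_m` and `min_gap_s` are assumptions chosen for illustration; the actual triggering condition is left open here.

```python
import math
from dataclasses import dataclass


@dataclass
class SpatioTemporal:
    timestamp: float  # seconds since the epoch
    lat: float        # latitude in degrees
    lon: float        # longitude in degrees


def distance_m(a: SpatioTemporal, b: SpatioTemporal) -> float:
    """Great-circle distance in metres (haversine formula)."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def spatiotemporal_trigger(recorded: SpatioTemporal, current: SpatioTemporal,
                           key_info_matches: bool,
                           radius_m: float = 200.0,
                           min_gap_s: float = 3600.0) -> bool:
    """Hypothetical rule: key information matches, and the user is back
    near the recorded place after at least min_gap_s seconds."""
    near = distance_m(recorded, current) <= radius_m
    later = current.timestamp - recorded.timestamp >= min_gap_s
    return key_info_matches and near and later
```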

Optionally, the device further includes an encoding module, configured to perform vector encoding on the key information to obtain a key information vector;

The storage module 1203 is further used for:

storing the memo content and the key information vector in association with each other;

The encoding module is further used for:

When receiving a reminder instruction, vectorize and encode the reminder instruction to obtain an instruction vector;

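The memo reminder module 1204 is also used for:

When the vector similarity between the instruction vector and the key information vector is greater than a threshold, determining that a memo reminder triggering condition is met, and performing a memo reminder based on the memo content.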

Optionally, when the reminder instruction is vectorized and encoded to obtain an instruction vector, the information extraction module 1202 is further used to:

When the reminder instruction is received, extracting information from the reminder instruction based on the content mode corresponding to the reminder instruction to obtain key instruction information;

The encoding module is further used for:

The key instruction information is vectorized and encoded to obtain the instruction vector.
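The vector matching between a reminder instruction and the stored memos can be sketched in Python as follows. The bag-of-words `encode` function is a toy stand-in for whatever encoder is actually used, cosine similarity is one assumed choice of vector similarity, and the threshold value is arbitrary.

```python
import math
from collections import Counter
from typing import List, Tuple


def encode(text: str) -> Counter:
    """Toy stand-in encoder: bag-of-words counts.
    A real system would more likely use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def match_memos(instruction_key_info: str,
                store: List[Tuple[str, Counter]],
                threshold: float = 0.5) -> List[str]:
    """Return the memo contents whose key information vector is similar
    enough to the instruction vector."""
    query = encode(instruction_key_info)
    return [content for content, vec in store if cosine(query, vec) > threshold]


# Usage: each entry pairs memo content with its key information vector.
store = [
    ("Passport is in the blue drawer", encode("passport blue drawer")),
    ("Buy milk on Friday", encode("buy milk friday")),
]
hits = match_memos("passport drawer", store)
```

Because the key information vectors are computed once at storage time, only the reminder instruction has to be encoded when a reminder is requested.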

Optionally, the acquisition module 1201 is further used to:

In case of receiving a memo recording voice instruction, obtaining the memo content indicated by the memo recording voice instruction; or,

When user behavior information is acquired and the user behavior information satisfies a memo condition, the memo content is determined based on the user behavior information.

Optionally, the device further comprises a deletion module, configured to delete the memo content in response to a memo deletion instruction.

In summary, in the embodiments of the present application, the terminal uses the acquisition module to collect multimodal, multidimensional information (text, visual, auditory, situational and spatiotemporal information) through a variety of sensors to form the memo content, which keeps the device compatible with multimodal data while enriching the memo content. To process the memo content, the information extraction module performs comprehensive information extraction, and the storage module stores the extracted key information in association with the memo content. The terminal automatically determines the trigger mode of the memo reminder from the intent type of the memo content. For passively triggered memo content, the reminder instruction with which the user triggers the memo reminder can itself be a multimodal input. By extending support for multimodal input and output, the embodiments of the present application improve the way users are helped to remember, thereby improving the efficiency and quality of human-computer interaction.

The embodiment of the present application further provides a computer-readable storage medium, which stores at least one program, and the at least one program is used to be executed by a processor to implement the memo reminder method as described in the above embodiment.

The embodiment of the present application provides a computer program product or a computer program, which includes a computer instruction stored in a computer-readable storage medium. The processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the memo reminder method provided in the above embodiment.

Those skilled in the art should be aware that in one or more of the above examples, the functions described in the embodiments of the present application can be implemented with hardware, software, firmware, or any combination thereof. When implemented using software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or codes on a computer-readable medium. Computer-readable media include computer storage media and communication media, wherein the communication media include any media that facilitates the transmission of a computer program from one place to another. The storage medium can be any available medium that a general or special-purpose computer can access.

The above description is only an optional embodiment of the present application and is not intended to limit the present application. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims

A memo reminder method, the method being executed by a terminal, the method comprising:

When there is a need for memo recording, obtaining memo content;

Based on the content mode corresponding to the memo content, extract information from the memo content to obtain key information, wherein the information extraction method is different under different content modes;

storing the memo content and the key information in association with each other;

When it is determined based on the key information that the memo reminder triggering condition is met, a memo reminder is performed based on the memo content.
The method according to claim 1, wherein the extracting information from the memo content based on the content modality corresponding to the memo content to obtain key information comprises:

In a case where the content modality corresponding to the memo content includes a text modality, performing natural language processing on the memo content to obtain text key information;

In a case where the content modality corresponding to the memo content includes a visual modality, performing image recognition processing on the memo content to obtain image key information;

In the case that the content modality corresponding to the memo content includes an auditory modality, audio recognition processing is performed on the memo content to obtain audio key information.
The method according to claim 2, wherein the performing natural language processing on the memo content to obtain text key information comprises at least one of the following methods:

Performing named entity recognition on the memo content to obtain entity information;

Extracting entity relationships from the memo content to obtain entity relationship information;

Extracting a subject summary from the memo content to obtain subject information;

Performing text intent recognition on the memo content to obtain an intent type, where the intent type is used to characterize the intent of recording the memo;

Performing causal inference analysis on the memo content to obtain trigger mode information, wherein the trigger mode includes active triggering and passive triggering, and when the trigger mode indicated by the trigger mode information is active triggering, the trigger mode information contains the active triggering condition.
The method according to claim 3, wherein the key information includes the text key information, and the text key information includes the trigger mode information;

In the case where it is determined based on the key information that a memo reminder triggering condition is satisfied, performing a memo reminder based on the memo content includes:

When the trigger mode indicated by the trigger mode information is active triggering and the active triggering condition is met, an active memo reminder is performed based on the memo content;

When the trigger mode indicated by the trigger mode information is a passive trigger and the key information matches the reminder instruction, a passive memo reminder is performed based on the memo content.
The method according to claim 2, wherein the step of performing image recognition processing on the memo content to obtain image key information comprises:

In the case where the memo content is a picture, performing optical character recognition on the picture to obtain picture text; and/or performing image natural language description processing on the picture to obtain picture description text;

When the memo content is a video, performing video understanding on the video to obtain a video description text;

The method further comprises:

Natural language processing is performed on at least one of the picture text, the picture description text, and the video description text to obtain the text key information.
The method according to claim 2, wherein the step of performing audio recognition processing on the memo content to obtain audio key information comprises:

Performing automatic speech recognition on the memo content to obtain an audio text; and/or performing audio feature extraction on the memo content to obtain an audio fingerprint;

The method further comprises:

Performing natural language processing on the audio text to obtain the text key information.
The method according to claim 1, wherein after obtaining the memo content, the method further comprises:

Obtaining the extended content corresponding to the memo content;

The storing of the memo content and the key information in association with each other comprises:

The memo content, the key information and the extended content are stored in association.
The method according to claim 7, wherein the step of obtaining the extended content corresponding to the memo content comprises:

When the content mode of the memo content is a text mode and the information amount of the key information is less than an information amount threshold, obtaining at least one of auditory extended content or visual extended content;

In the case where it is determined based on the key information that a memo reminder triggering condition is satisfied, performing a memo reminder based on the memo content includes:

In the case where it is determined based on the key information that the memo reminder triggering condition is met, a memo reminder is performed based on the memo content and the extended content.
The method according to claim 7, wherein the step of obtaining the extended content corresponding to the memo content comprises:

In the case where the memo content is time-limited, obtaining the extended content corresponding to the memo content;

In the case where it is determined based on the key information that a memo reminder triggering condition is satisfied, performing a memo reminder based on the memo content includes:

If it is determined based on the key information that a memo reminder triggering condition is met and the memo content is valid, a memo reminder is made based on the memo content;

The method further comprises:

When it is determined based on the key information that a memo reminder triggering condition is met and the memo content is invalid, a memo reminder is performed based on the extended content.
The method according to claim 1, wherein the method further comprises:

When there is a need for memo recording, obtaining spatiotemporal information, wherein the spatiotemporal information is used to represent the spatiotemporal state when the memo is recorded;

The storing of the memo content and the key information in association with each other comprises:

storing the memo content, the key information and the spatiotemporal information in association with each other;

In the case where it is determined based on the key information that a memo reminder triggering condition is satisfied, performing a memo reminder based on the memo content includes:

When it is determined based on the key information and the spatiotemporal information that a memo reminder triggering condition is met, a memo reminder is performed based on the memo content.
The method according to claim 1, wherein after extracting information from the memo content based on the content modality corresponding to the memo content to obtain key information, the method further comprises:

Performing vector encoding on the key information to obtain a key information vector;

storing the memo content and the key information vector in association with each other;

In the case where it is determined based on the key information that a memo reminder triggering condition is satisfied, performing a memo reminder based on the memo content includes:

When receiving a reminder instruction, vectorize and encode the reminder instruction to obtain an instruction vector;

When the vector similarity between the instruction vector and the key information vector is greater than a threshold, it is determined that a memo reminder triggering condition is met, and a memo reminder is performed based on the memo content.
The method according to claim 11, wherein, when receiving a reminder instruction, vectorizing and encoding the reminder instruction to obtain an instruction vector comprises:

When the reminder instruction is received, extracting information from the reminder instruction based on the content mode corresponding to the reminder instruction to obtain key instruction information;

The key instruction information is vectorized and encoded to obtain the instruction vector.
The method according to claim 1, wherein, when there is a need for memo recording, obtaining the memo content comprises:

In case of receiving a memo recording voice instruction, obtaining the memo content indicated by the memo recording voice instruction; or,

When the user behavior information is acquired and the user behavior information satisfies the memo condition, the memo content is determined based on the user behavior information.
The method according to claim 1, wherein, after performing a memo reminder based on the memo content when it is determined based on the key information that a memo reminder triggering condition is met, the method further comprises:

In response to the memo deletion instruction, the memo content is deleted.
A memo reminder device, comprising:

An acquisition module, configured to obtain memo content when there is a need for memo recording;

An information extraction module, configured to extract information from the memo content based on the content mode corresponding to the memo content to obtain key information, wherein different information extraction methods are used in different content modes;

A storage module, used for associating and storing the memo content and the key information;

A memo reminder module, configured to perform a memo reminder based on the memo content when it is determined based on the key information that a memo reminder triggering condition is met.
A terminal, comprising a processor and a memory, wherein the memory stores at least one program, and the at least one program is used to be executed by the processor to implement the memo reminder method according to any one of claims 1 to 14.
A computer-readable storage medium, storing at least one program, wherein the at least one program is used to be executed by a processor to implement the memo reminder method according to any one of claims 1 to 14.
A computer program product, the computer program product comprising computer instructions, the computer instructions being stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device implements the memo reminder method as described in any one of claims 1 to 14.