CN115526602A

CN115526602A - Memo reminding method, device, terminal and storage medium

Info

Publication number: CN115526602A
Application number: CN202211249686.9A
Authority: CN
Inventors: 曾理; 王立中; 米岚
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2022-10-12
Filing date: 2022-10-12
Publication date: 2022-12-27
Also published as: WO2024078210A1

Abstract

The embodiments of the application disclose a memo reminding method, a memo reminding device, a terminal and a storage medium, belonging to the technical field of human-computer interaction. The method comprises: acquiring memo content when a memo recording requirement exists; extracting information from the memo content based on the content modality corresponding to the memo content to obtain key information; storing the memo content and the key information in association; and, when it is determined based on the key information that a memo reminding trigger condition is met, performing memo reminding based on the memo content. The information that a user needs to memorize is usually multi-modal. By extracting information and storing key information according to the content modality, the scheme enriches the memo content, improves the efficiency and quality of memo reminding, and improves the user's memo reminding interaction experience.

Description

Memo reminding method, device, terminal and storage medium

Technical Field

The embodiment of the application relates to the technical field of human-computer interaction, in particular to a memo reminding method, a memo reminding device, a terminal and a storage medium.

Background

With social development, people's work and life have become increasingly rich, and the amount of information that needs to be processed keeps growing, which brings heavy memory burdens. With the rapid development of digital technology, people have gradually become accustomed to using the powerful storage function of intelligent terminals to assist memory and improve efficiency in work and life.

In the related art, a terminal records memos of text, images and the like input by a user, and the memo content is limited to a single modality. This limits the flexibility with which the user can record memos and reduces human-computer interaction efficiency.

Disclosure of Invention

The embodiment of the application provides a memo reminding method, a memo reminding device, a terminal and a storage medium. The technical scheme is as follows:

In one aspect, an embodiment of the present application provides a memo reminding method, where the method includes:

acquiring memo contents under the condition that a memo recording requirement exists;

extracting information of the memo content based on the content modality corresponding to the memo content to obtain key information, wherein the information extraction modes under different content modalities are different;

performing associated storage on the memo content and the key information;

and under the condition that a memo reminding triggering condition is determined to be met based on the key information, performing memo reminding based on the memo content.

On the other hand, the embodiment of the present application provides a memo reminding device, and the device includes:

the acquisition module is used for acquiring memo contents under the condition that a memo recording requirement exists;

the information extraction module is used for extracting information of the memo content based on the content modality corresponding to the memo content to obtain key information, wherein the information extraction modes in different content modalities are different;

the storage module is used for performing associated storage on the memo content and the key information;

and the memo reminding module is used for performing memo reminding based on the memo content under the condition that a memo reminding triggering condition is determined to be met based on the key information.

In another aspect, an embodiment of the present application provides a computer device, which includes a processor and a memory; the memory stores at least one program for execution by the processor to implement the memo reminding method as described in the above aspect.

In another aspect, an embodiment of the present application provides a computer-readable storage medium, where at least one program is stored, and the at least one program is used for being executed by a processor to implement the memo reminding method according to the above aspect.

In another aspect, embodiments of the present application provide a computer program product including computer instructions, which are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to enable the computer device to execute the memo reminding method provided by the above aspect.

In the embodiment of the application, based on the memo voice instruction input by the user, the terminal acquires multi-mode multi-dimensional information such as text, vision, hearing, scene information, space-time information and the like through various sensors to form the memo content, and the richness of the memo content is improved while the multi-mode data is compatible. On the basis of obtaining the memo content, the embodiment of the application determines the intention type of the memo content through information extraction, further automatically judges the trigger mode of the memo reminding, and carries out the memo reminding under the condition of meeting the trigger condition of the memo reminding; the embodiment of the application expands and supports the input and output of multi-mode information, improves the mode of helping a user to remember, and further improves the efficiency and quality of man-machine interaction.

Drawings

FIG. 1 is a block diagram illustrating a structure of a terminal according to an exemplary embodiment of the present application;

FIG. 2 is a flow chart illustrating a memo alert method provided by an exemplary embodiment of the present application;

FIG. 3 is a diagram illustrating a memo alert method according to an exemplary embodiment of the present application;

FIG. 4 illustrates a schematic diagram of actively acquiring memo content as provided by an exemplary embodiment of the present application;

FIG. 5 illustrates a schematic diagram of acquiring an actively triggered memo as provided by an exemplary embodiment of the present application;

FIG. 6 is a diagram illustrating the acquisition of a passively triggered memo as provided by an exemplary embodiment of the present application;

FIG. 7 illustrates exemplary memo content of the visual modality provided by an exemplary embodiment of the present application;

FIG. 8 illustrates a schematic diagram of an actively triggered reminder provided by an exemplary embodiment of the present application;

FIG. 9 illustrates a flow chart of passively triggering a reminder provided by an exemplary embodiment of the present application;

FIG. 10 illustrates a diagram of a combined spatiotemporal information memo provided by an exemplary embodiment of the present application;

FIG. 11 is a diagram illustrating the acquisition of extended content provided by an exemplary embodiment of the present application;

FIG. 12 shows a block diagram of a memo reminding device according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Reference herein to "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.

For convenience of understanding, terms referred to in the embodiments of the present application will be explained first.

Natural Language Processing (NLP): a field at the intersection of computer science, artificial intelligence and linguistics that focuses on the interaction between computers and human language. It studies theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is generally concerned not with studying natural language itself but with developing computer systems, particularly software systems, that can effectively carry out natural language communication. Through natural language processing, people can use computers in the language they are most accustomed to, without spending a great deal of time and energy learning computer languages that are unnatural and unfamiliar; it also helps people further understand the mechanisms of human language ability and intelligence.

Referring to fig. 1, a block diagram of a terminal according to an exemplary embodiment of the present application is shown. Terminal 100 may include one or more of the following components: processor 110, memory 120, display screen 130, microphone 140.

The processor 110 may include one or more processing cores. The processor 110 connects the various parts of the terminal 100 using various interfaces and lines, and performs the functions of the terminal 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Neural-Network Processing Unit (NPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is used for rendering and drawing the content to be displayed on the touch display screen; the NPU is used for implementing Artificial Intelligence (AI) functions; and the modem is used for handling wireless communications. It is understood that the modem may also not be integrated into the processor 110 and may instead be implemented as a separate chip.

The memory 120 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable medium. The memory 120 may be used to store instructions, programs, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described below, and the like; the data storage area may store data created through use of the terminal 100 (such as audio data and a phonebook), and the like.

The display screen 130 is a component for performing image display. The display screen 130 may be a built-in screen of the terminal, such as a screen of a smart phone, or an external screen of the terminal, such as an external display of a personal computer.

In some embodiments, the display screen 130 has a touch function in addition to the image display function, that is, the control of the display content can be realized by touching and clicking the display screen 130.

The microphone 140 is a component for collecting external sounds. In the embodiment of the present application, the terminal 100 supports the user to perform human-computer interaction in a memo reminding scene through a voice instruction, and the microphone 140 may be used to collect voice and audio information of the user to perform memo reminding.

In addition, those skilled in the art will appreciate that the configuration of the terminal 100 illustrated in the above figure does not constitute a limitation of the terminal 100, which may include more or fewer components than those illustrated, combine some of the components, or use a different arrangement of components. For example, the terminal 100 may further include a camera module, a speaker, a radio frequency circuit, an input unit, sensors (such as an acceleration sensor, an angular velocity sensor, a light sensor, and the like), an audio circuit, a Wi-Fi module, a power supply, a Bluetooth module, and the like, which are not described herein again.

Please refer to fig. 2, which shows a flowchart of a memo reminding method according to an exemplary embodiment of the present application. The method may include the following steps.

Step 201, obtaining the memo content under the condition that the memo recording requirement exists.

The terminal obtains corresponding memo content in response to a memo operation of the user. The memo content may be one or a combination of text, images, video and audio; that is, the memo content may be multi-modal. The memo content is described below with reference to some application scenarios. As shown in fig. 3, in a web browsing scenario, the memo content may be digital content, such as an online product, a news report, an electronic book, or a URL (Uniform Resource Locator); in a travel scenario, the memo content may be map navigation data and screenshots indicating a route, and may also include the real-world physical coordinates of places such as scenic spots and restaurants, as well as their official websites, introductory pictures and audio, travel guide text and the like; in a daily-life scenario, the memo content may be schedule information, itinerary information, note text and the like.

In a possible implementation manner, the existence of a memo recording requirement may mean that the terminal receives a memo voice instruction; accordingly, the terminal acquires the memo content indicated by the memo voice instruction. The terminal performs ASR (Automatic Speech Recognition) processing on the voice audio contained in the received memo voice instruction, determines the memo content indicated by the instruction, and then performs the corresponding acquisition operation based on the modality of the memo content. For example, when the memo content is text input by voice, the terminal extracts the corresponding text from the ASR result as the memo content; when the memo voice instruction indicates that the memo content is a picture, the terminal obtains the memo content by taking a screenshot, saving the picture or the like; and when the user instructs the terminal to record a memo of the current webpage, the terminal obtains the link of the webpage as the memo content.

Illustratively, when memo reminding is based on a memo voice instruction, the terminal, in response to the user invoking the voice function, acquires the user's memo voice instruction "remind me to catch the plane". Through processing such as ASR, the terminal determines that the memo content indicated by the instruction is flight information and itinerary information. The terminal then captures the currently displayed flight information picture with a screenshot tool and obtains the flight information description text as itinerary text; both the flight information picture and the itinerary text are memo content.

Optionally, the existence of a memo recording requirement may mean that the terminal acquires user behavior information and the user behavior information meets a memo condition; accordingly, the terminal determines the memo content based on the user behavior information. In work and daily life, the terminal actively understands user behavior based on the perceived user voice, operation behavior, motion behavior, contextual information and the like, and actively acquires memo content based on its judgment of the user's needs. The memo can thus be completed without the user having to invoke the device, which avoids information being lost because the user forgets to record a memo and gives the user a natural, seamless interaction experience.

Schematically, as shown in fig. 4, in a scenario where the terminal senses that the user is jogging outdoors based on the motion sensor and the position sensor, the terminal actively understands the action behavior and the context information of the user, judges that the user has a need to store the current jogging route, and further automatically acquires route information and coordinate information as memo content.

It should be noted that, in order to protect the privacy of the user, if the terminal needs to use a permission-restricted sensor such as a camera or an audio recorder when obtaining the memo content, the terminal may turn on the sensor and obtain the memo content based on the user's memo voice instruction, or may actively ask for the user's permission by displaying a reminder pop-up window and turn on the sensor and obtain the memo content only after receiving positive feedback from the user.

Step 202, extracting information of the memo content based on the content modality corresponding to the memo content to obtain key information, wherein the information extraction modes under different content modalities are different.

The key information may include core elements of the memo content such as its attributes and theme, memo spatio-temporal information such as a timestamp and climate, and information such as the intention type and trigger mode determined by analyzing the memo content.

When the memo content input by the user combines multiple modalities, such as website text + screenshot, user instruction text + photo, or song audio + title text, the terminal performs single-modality information extraction on the memo content of each modality separately to obtain sub-key information for each modality, and then combines the sub-key information to obtain the key information.
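The per-modality extraction and merging described above can be pictured with a minimal Python sketch. It is an illustration only, not the implementation of the application: the extractor functions, the `EXTRACTORS` registry and the item layout (`modality`, `data`) are hypothetical names introduced here, and each extractor is a stub standing in for the real NLP, image or audio processing.

```python
from typing import Callable, Dict, List

# Hypothetical stub extractors; each returns "sub-key information" for one modality.
def extract_text(content: str) -> Dict[str, object]:
    return {"modality": "text", "length": len(content)}

def extract_image(content: str) -> Dict[str, object]:
    return {"modality": "image", "source": content}

def extract_audio(content: str) -> Dict[str, object]:
    return {"modality": "audio", "source": content}

# Registry mapping a content modality to its extraction routine (assumed layout).
EXTRACTORS: Dict[str, Callable[[str], Dict[str, object]]] = {
    "text": extract_text,
    "image": extract_image,
    "audio": extract_audio,
}

def extract_key_information(items: List[dict]) -> List[Dict[str, object]]:
    """Run the extractor matching each item's modality and collect the results."""
    key_information = []
    for item in items:
        extractor = EXTRACTORS.get(item["modality"])
        if extractor is None:
            raise ValueError(f"unsupported modality: {item['modality']}")
        key_information.append(extractor(item["data"]))
    return key_information

# Example: website text + screenshot, one of the combinations named above.
memo = [
    {"modality": "text", "data": "https://example.com/item"},
    {"modality": "image", "data": "screenshot.png"},
]
key_info = extract_key_information(memo)
```

A registry keyed by modality keeps the extraction modes independent, so supporting a further modality only requires registering one more function.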

Step 203, performing associated storage on the memo content and the key information.

The key information obtained in step 202 takes different data forms, such as text, entity relationship pairs, image area coordinates, pixel values, spectrograms, timestamps, temperature, humidity, altitude, latitude and longitude, and the like. Key information in different data forms corresponds to different dimensions of the memo content.

In one possible implementation, as shown in fig. 3, based on the correspondence between the key information and the memo content, the terminal organizes the obtained key information into a semi-structured key-value data structure, such as a dictionary or a HashMap (hash table). The table below shows, by way of example, the basic form in which key information is stored as semi-structured data.

Table: Basic form of semi-structured storage of key information

Step 204, performing memo reminding based on the memo content under the condition that the memo reminding triggering condition is determined to be met based on the key information.

The terminal may perform memo reminding in an active reminding mode, for example by displaying a message notification box, playing a prompt audio or vibrating. The terminal may also perform memo reminding in a passive reminding mode; that is, the terminal takes the user's query input as the memo reminding trigger condition and performs memo reminding based on the corresponding memo content only after obtaining the query input.

The feedback form of the memo reminder may be the original file stored in association, such as the user's original voice, or memo information obtained by the terminal processing the original file, such as speech synthesized from text or a visual notification displayed on the terminal.
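As a rough illustration of such a key-value structure, the following Python sketch stores one memo together with its key information in a dictionary. The field names (`entities`, `intention_type`, `trigger_mode`, `timestamp`), the memo identifier and the timestamp value are assumptions made for this example and are not taken from the table of the application.

```python
import json
from typing import Any, Dict

# In-memory store: memo id -> memo content plus its associated key information.
memo_store: Dict[str, Dict[str, Any]] = {}

def store_memo(memo_id: str, content: Dict[str, Any],
               key_information: Dict[str, Any]) -> None:
    """Store memo content and key information under the same key."""
    memo_store[memo_id] = {"content": content, "key_information": key_information}

store_memo(
    "memo-001",  # hypothetical identifier
    content={"text": "I put the key in the desk drawer"},
    key_information={
        "entities": {"article": "key", "position": "desk drawer"},
        "intention_type": "memo",
        "trigger_mode": "passive",
        "timestamp": "2022-10-12T09:00:00",  # illustrative value only
    },
)

# A semi-structured record of this kind serializes directly to JSON.
serialized = json.dumps(memo_store, ensure_ascii=False)
```

Because the record is semi-structured, key information of other data forms (image area coordinates, audio fingerprints and so on) could be added as further keys without changing the storage scheme.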
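The difference between active and passive reminding in step 204 can be sketched as follows. This is a simplified sketch under stated assumptions: the `Memo` class, the `remind_at` field (a time-based active trigger condition) and the word-overlap test used to match a query are hypothetical simplifications, and the dates are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Memo:
    content: str
    trigger_mode: str                     # "active" or "passive"
    remind_at: Optional[datetime] = None  # assumed time-based active trigger condition

def should_remind(memo: Memo, now: datetime,
                  user_query: Optional[str] = None) -> bool:
    """Decide whether the memo reminding trigger condition is met."""
    if memo.trigger_mode == "active":
        # Active: remind as soon as the stored condition holds, without a query.
        return memo.remind_at is not None and now >= memo.remind_at
    if user_query is None:
        # Passive: never remind without query input from the user.
        return False
    # Crude matching by shared words; a real system would use semantic retrieval.
    query_words = set(user_query.lower().split())
    memo_words = set(memo.content.lower().split())
    return bool(query_words & memo_words)

flight = Memo("Flight from Chengdu to Beijing", "active",
              remind_at=datetime(2023, 3, 1, 8, 20))
key_memo = Memo("I put the key in the desk drawer", "passive")
```

For example, `should_remind(key_memo, datetime.now())` stays false until a query such as "where is my key" is supplied.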

In summary, in the embodiment of the application, based on the memo voice instruction input by the user, the terminal acquires multi-mode multi-dimensional information such as text, vision, hearing, contextual information, temporal-spatial information and the like through various sensors to form the memo content, and the richness of the memo content is improved while the multi-mode data is compatible. On the basis of acquiring the memo content, the embodiment of the application determines the intention type of the memo content through information extraction, further automatically judges the trigger mode of the memo reminding, and carries out the memo reminding under the condition of meeting the trigger condition of the memo reminding; the embodiment of the application expands and supports the input and output of multi-mode information, improves the mode of helping the user to remember, and further improves the efficiency and quality of human-computer interaction.

In daily life, the information that a user needs to memorize is not limited to text; it also includes a large amount of visual information, such as images and video, and auditory information, such as music and speech. That is, when the user uses a terminal to assist memory, the information to be memorized is often multi-modal. Compared with the related art, which supports only single-modality input when recording memos, in the embodiment of the application the terminal supports acquiring multi-modal memo content and, for memo content of different content modalities, adopts a corresponding key information extraction mode to determine and store the key information in the memo content, thereby improving human-computer interaction efficiency in the memo reminding scenario. As shown in fig. 3, information may be extracted from the memo content using any one or more of the following methods:

1. When the content modality corresponding to the memo content includes a text modality, Natural Language Processing (NLP) is performed on the memo content to obtain text key information.

In one possible implementation, as shown in fig. 3, the terminal needs to understand the meaning of the memo content through NLU (Natural Language Understanding). The terminal first performs Named Entity Recognition (NER) on the memo content to obtain entity information. The predefined entity types may include time, location, person name and article, and may also include currency, organization and the like, which is not limited in this application. When memo content of the text modality is acquired, the terminal identifies and labels entities in the memo text based on the entity types. For example, when the memo content is "meet Li Ming at Zhongshan Park at three o'clock tomorrow", the terminal obtains the following recognition result through NER: [time] three o'clock tomorrow, [location] Zhongshan Park, [person name] Li Ming. It should be noted that the embodiments of the present application do not limit the method used to implement NER.

Further, the terminal performs Entity Relationship Extraction (ERE) on the memo content to obtain entity relationship information. Based on the named entity recognition result obtained through NER, the terminal performs entity extraction and relationship extraction, reducing the memo content to its core entity relationship to facilitate text analysis. For example, when the memo content is "I put the key in the desk drawer", the terminal obtains the following recognition result through NER: I put [article] key in [position] desk drawer. Based on this recognition result, the terminal performs entity relationship extraction and obtains the entity relationship information: key [position] desk drawer.

Correspondingly, the terminal can also perform Text Summarization on the memo content to obtain subject information. The present application does not limit the manner of obtaining the text summary of the memo content; the subject information may be an extractive summary or an abstractive summary. Illustratively, when the memo content is "have a meal with Zhang San tonight", the terminal can determine through summarization that the subject of the text is "having a meal".

On the basis of determining the content of the memo text, the terminal further performs text intention recognition on the memo content to obtain an intention type, which represents the user's intention in recording the memo. Users have different storage intentions for different kinds of memo content. For example, when the memo content is flight information, the user expects to receive a reminder from the terminal at the corresponding time; when the memo content is product shopping information, the user expects to be able to query it later. The terminal may determine the intention type of the memo content based on a Text Classification (TC) technique. Classifying the memo content into different intention types provides a basis for the terminal to subsequently determine the memo reminding mode.

Schematically, when the memo content is "call Mom to take her medicine at nine tonight", the memo content contains a conditional prompt, so the terminal can judge that the user expects to be prompted when the corresponding condition is met, and determines that the intention type is reminder. Based on the memo voice instruction "help me note down the flight", the terminal acquires the memo content "The XXX flight from Chengdu to Beijing at 8:20 on March 1 that you purchased has been ticketed", and then determines through text intention recognition that the intention type is schedule. When the memo content is "my power-on password is XXX", the terminal determines that the intention type is memo.

The terminal obtains trigger mode information by performing Causal Inference (CI) on the memo content; that is, the key information includes text key information, and the text key information includes the trigger mode information. The trigger mode information includes at least a trigger mode, which corresponds to the intention type of the memo content. The trigger mode is either active triggering or passive triggering; when the trigger mode indicated by the trigger mode information is active triggering, the trigger mode information also includes an active trigger condition. Schematically, as shown in fig. 5, when the memo content is "if the Yangtze River is congested, remind me to ride the electric vehicle to work", the terminal determines that the intention type of the memo content is reminder, further determines that the trigger mode is active triggering, and extracts the traffic condition entity "congestion" from the memo content as trigger mode information. As shown in fig. 6, based on the memo voice instruction "help me save this skirt", the terminal acquires the product link and the like corresponding to the product as memo content, and then determines that the trigger mode is passive triggering because the intention type of the memo content is memo; the terminal reminds the user based on the memo content only when the user inputs a reminding instruction.

It should be noted that the above types of NLP processing, such as NER, ERE and CI, are performed synchronously in the process of extracting key information from the memo content, and their processing results are combined to form the key information.
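The NER and ERE steps can be illustrated with a toy Python sketch. Since the application does not limit how NER is implemented, this sketch substitutes a simple dictionary (gazetteer) lookup for a trained model; the `GAZETTEER` contents and both function names are hypothetical and chosen only to reproduce the key and desk drawer example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical gazetteer mapping surface forms to predefined entity types.
GAZETTEER: Dict[str, str] = {
    "key": "article",
    "desk drawer": "position",
    "zhongshan park": "location",
    "li ming": "person name",
}

def recognize_entities(text: str) -> List[Tuple[str, str]]:
    """Return (entity_type, surface_form) pairs in order of appearance."""
    lowered = text.lower()
    found = []
    for surface, entity_type in GAZETTEER.items():
        position = lowered.find(surface)  # plain substring match, toy only
        if position != -1:
            found.append((position, entity_type, surface))
    return [(entity_type, surface) for _, entity_type, surface in sorted(found)]

def extract_relation(entities: List[Tuple[str, str]]) -> Optional[Tuple[str, str, str]]:
    """Reduce an article/position pair to a (head, relation, tail) triple."""
    by_type = dict(entities)
    if "article" in by_type and "position" in by_type:
        return (by_type["article"], "position", by_type["position"])
    return None

entities = recognize_entities("I put the key in the desk drawer")
relation = extract_relation(entities)
```

Here `relation` corresponds to the entity relationship information "key [position] desk drawer" in the example above.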
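To make the three intention types concrete, the following Python sketch classifies memo text with hand-written keyword rules. This is a deliberately simple stand-in for the text classification technique mentioned above, not the classifier of the application; the keyword lists and the function name `classify_intention` are assumptions.

```python
# Hypothetical keyword lists; a real system would use a trained text classifier.
REMINDER_KEYWORDS = ("remind",)
SCHEDULE_KEYWORDS = ("flight", "meeting", "train")

def classify_intention(text: str) -> str:
    """Return one of the intention types: 'reminder', 'schedule' or 'memo'."""
    lowered = text.lower()
    if any(word in lowered for word in REMINDER_KEYWORDS):
        return "reminder"
    if any(word in lowered for word in SCHEDULE_KEYWORDS):
        return "schedule"
    # Anything without a reminder or schedule cue is kept as a plain memo.
    return "memo"

examples = [
    "Remind Mom to take her medicine at nine tonight",
    "Your XXX flight from Chengdu to Beijing has been ticketed",
    "My power-on password is XXX",
]
labels = [classify_intention(text) for text in examples]
```

The rule order matters in this sketch: a text containing both a reminder cue and a schedule cue is treated as a reminder.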
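The correspondence between intention type and trigger mode can be sketched in Python as follows. The mapping (reminder and schedule to active triggering, memo to passive triggering) follows the examples in this description; the function name, the dictionary keys and the way the condition is passed in are assumptions, and the causal inference that would actually produce the condition is not modeled.

```python
from typing import Dict, Optional

# Intention types whose memos are reminded actively, per the examples above.
ACTIVE_INTENTIONS = {"reminder", "schedule"}

def build_trigger_info(intention_type: str,
                       condition: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Build trigger mode information from an intention type.

    `condition` stands for an active trigger condition that is assumed to
    have been extracted from the memo content already.
    """
    if intention_type in ACTIVE_INTENTIONS:
        return {"trigger_mode": "active", "active_trigger_condition": condition}
    # Passive memos carry no active trigger condition.
    return {"trigger_mode": "passive", "active_trigger_condition": None}

congestion_info = build_trigger_info("reminder", "congestion")
skirt_info = build_trigger_info("memo")
```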

2. When the content modality corresponding to the memo content includes a visual modality, image recognition processing is performed on the memo content to obtain image key information. Further, the terminal performs natural language processing on at least one of the picture text, the picture description text and the video description text to obtain text key information. The terminal may perform this natural language processing using any one or a combination of the sub-methods in method 1.

In a possible implementation, in the case that the memo content is a picture, the terminal performs Optical Character Recognition (OCR) on the picture to obtain a picture text. For the picture information mainly comprising text contents, the terminal can convert the character symbols in the picture information into text information through an OCR technology, and can further extract key text information based on the obtained text information to obtain the key information contained in the image.

In another possible implementation, the terminal performs Image natural language description (IC) on the picture to obtain a picture description text. For picture information with picture content as the main, the terminal converts the image into natural language describing the image content in order to determine the information contained in the image. Illustratively, in the case that the memo content is a picture as shown in fig. 7, the terminal obtains a text description through the IC technology as follows: a boy carrying a traveling bag travels, and further, the terminal performs text key information extraction on the text according to the method 1 to determine that the chart is subject to the theme of 'travel'.

It should be noted that, for the same picture, the terminal may perform optical character recognition on the same picture to obtain a text in the picture, and perform natural language description processing on the same picture to further enrich the key information corresponding to the picture. It should also be noted that, for a picture, the terminal may locate the description subject in the picture based on the picture and the picture description text by using a Visual positioning technology (VG), and obtain the location area information of the picture subject, for example, in the case that the memo content is the picture shown in fig. 7, the terminal determines the area location information of the subject "boy" in the picture as a part of the target result based on the picture and the text description by using a VG technology.

And under the condition that the memo content is the video, the terminal performs video understanding on the video to obtain a video description text. The technologies adopted for video understanding can include, but are not limited to, video scene recognition, video action understanding, and video event understanding. Through video understanding, the terminal expresses the video content in a natural language text mode, namely, the information contained in the memo content is embodied through the video description text, further, the terminal can extract the video description text through the text key information to process, further, the video information is clarified, and the follow-up memo reminding for the user based on the memo content is facilitated.

3. When the content modality corresponding to the memo content includes an auditory modality, audio recognition processing is performed on the memo content to obtain audio key information.

In a possible implementation, the terminal performs automatic speech recognition (ASR) on the memo content to obtain an audio text. That is, when the user inputs the memo content by means of a memo recording voice instruction, the terminal converts the voice information into natural language text, namely the audio text, through ASR, and then performs natural language processing on the audio text to obtain text key information. The natural language processing may be any one of the processing methods in method 1, or a combination of two or more of them.

In another possible implementation, the terminal performs audio feature extraction on the memo content to obtain an audio fingerprint. Through audio fingerprinting technology, the terminal extracts the digital features of a segment of audio and represents them as identifiers, thereby acquiring the information contained in the audio memo content. Optionally, the terminal may further compute a spectrogram of the audio file, that is, the frequency information of the audio over time. Based on the spectrogram or the audio fingerprint, the terminal allows the user to query the memo content by using audio (e.g. humming) as a reminding instruction.

Based on the trigger mode information in the text key information, the memo reminder can be triggered either actively or passively.

In a possible implementation, the intention type corresponding to the memo content belongs to the schedule class or the reminder class, the trigger mode indicated by the trigger mode information of the memo content is active trigger, and the terminal performs active memo reminding based on the memo content when the active trigger condition is met.

In the prior art, the terminal actively reminds the user based only on the time information and the position information in the memo content. In the embodiment of the application, by contrast, the memo reminding triggering condition can include any one or a combination of the entity information in the memo content. That is, the terminal can take the time, position, climate and the like in the entity information as the active trigger condition, and can also take events, traffic conditions and the like as the active trigger condition, which enriches the reminding scenarios and improves the human-computer interaction experience.

Illustratively, when the memo content is "remind me to ride my electric vehicle to work if the Yangtze river is congested", the terminal determines through intention recognition and causal inference that the intention type of the memo content is the reminder class and that the memo reminder is to be actively triggered. The trigger mode information contains an active trigger condition of type [traffic condition], namely "the Yangtze river is congested". The terminal then actively performs the memo reminder when it detects that the traffic conditions meet the active trigger condition.
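The active-trigger check described above can be sketched as follows. This is a minimal illustration only: the `Memo` class, its field names, and the `observed` state dictionary are hypothetical and do not come from the application, which does not specify a data layout.

```python
from dataclasses import dataclass, field


@dataclass
class Memo:
    content: str
    intent_type: str   # e.g. "reminder", "schedule", "memo" (assumed labels)
    trigger_mode: str  # "active" or "passive"
    # Active trigger conditions keyed by entity type (assumed layout),
    # e.g. {"traffic": "congested"}.
    active_conditions: dict = field(default_factory=dict)


def should_remind_actively(memo: Memo, observed: dict) -> bool:
    """Return True when the memo is actively triggered and every one of its
    trigger conditions matches the currently observed state."""
    if memo.trigger_mode != "active" or not memo.active_conditions:
        return False
    return all(observed.get(k) == v for k, v in memo.active_conditions.items())


memo = Memo(
    content="Remind me to ride my electric vehicle to work if the route is congested",
    intent_type="reminder",
    trigger_mode="active",
    active_conditions={"traffic": "congested"},
)
```

Because the conditions are stored per entity type, the same check covers time, position, climate, events, or traffic without special cases.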

Optionally, the terminal may perform memo reminding for objects other than the user who performed the memo operation. That is, whereas in the prior art the terminal performs memo reminding only for the user of the terminal, in the embodiment of the application the user may add memo reminding object information to the memo content. For example, as shown in fig. 8, the user inputs the memo content "remind mom to take her medicine at nine at night" through a memo recording voice instruction. The terminal then determines through information extraction that the memo reminding object indicated by the key information is the user's "mom", and performs active memo reminding for her when the time "nine at night", which serves as the active trigger condition, is reached.

In another possible implementation, the intention type corresponding to the memo content belongs to the schedule class or the memo class, the trigger mode indicated by the trigger mode information of the memo content is passive trigger, and the terminal performs passive memo reminding based on the memo content when the key information matches the reminding instruction.

Correspondingly, the memo reminding triggering condition for passive triggering can also be multi-modal. That is, the user can query based on the text corresponding to a voice instruction, or based on reminding instructions in other modalities, or combinations of modalities, such as picture information, audio information, and web page links. When the stored content is semi-structured data formed from multi-modal information, supporting multi-modal queries gives the user more freedom in query operations, so that the user can complete a query in whatever form of expression is most convenient and intuitive. When the modality of the memo content is the same as that of the reminding instruction, the memo reminding efficiency and the human-computer interaction experience can be further improved. Illustratively, the user may retrieve the corresponding song audio in the memo content by humming (audio information), or query the corresponding product information in the memo content with a picture of a garment (image information).

As shown in fig. 9, for passively triggered memo content, the terminal starts in response to the user's wake-up instruction, and performs retrieval and feedback based on the reminding instruction input by the user. Because the reminding instruction can be multi-modal, the terminal extracts information from the reminding instruction to obtain instruction key information, in a manner similar to the way it processes the memo content, and then constructs semi-structured data such as a dictionary to store the instruction key information. The information extraction and associated storage methods are the same as in the above embodiment and are not described here again.

Once the instruction key information is determined, the terminal compares and matches it against the key information corresponding to the memo content, and feeds back the entry with the highest correlation to the user as the query result, completing the memo reminding. The matching is carried out between two key information dictionaries, each of which contains a number of key-value pairs, and the similarity is determined from the number of similar values in the two dictionaries.

Illustratively, based on the reminding instruction "where did I put the key?", the terminal determines that the instruction key information is "key". The terminal then performs matching among the key information dictionaries corresponding to the memo content, obtains the dictionary whose attributes include "key", and feeds back the original data of that dictionary to the user: "I put the key in the desk drawer".
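The dictionary matching described above might look like the following sketch. It assumes that "similar values" means exactly equal values, which is the simplest possible reading; the application does not say how similarity between values is judged, and the record layout (`content`, `key_info`) is hypothetical.

```python
def _flatten(info: dict) -> set:
    """Collect all values of a key information dictionary into one set."""
    values = set()
    for value in info.values():
        if isinstance(value, (list, tuple, set)):
            values.update(value)
        else:
            values.add(value)
    return values


def similarity(instruction_info: dict, memo_info: dict) -> int:
    """Count the values shared by the two key information dictionaries."""
    return len(_flatten(instruction_info) & _flatten(memo_info))


def best_match(instruction_info: dict, memos: list):
    """Return the original content of the best-matching memo, or None."""
    scored = [(similarity(instruction_info, m["key_info"]), m) for m in memos]
    score, memo = max(scored, key=lambda pair: pair[0], default=(0, None))
    return memo["content"] if memo is not None and score > 0 else None


memos = [
    {"content": "I put the key in the desk drawer",
     "key_info": {"object": "key", "location": "desk drawer"}},
    {"content": "Remind mom to take her medicine at nine at night",
     "key_info": {"person": "mom", "event": "take medicine", "time": "21:00"}},
]
```

Here `best_match({"object": "key"}, memos)` selects the first memo, mirroring the key example in the text.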

In one possible implementation, when a memo recording requirement exists, the terminal obtains spatio-temporal information, which represents the time and space states when the memo recording is carried out. The spatio-temporal information may include the timestamp corresponding to the memo requirement, the current location, the altitude, the temperature and humidity, climate information, and the like. Spatio-temporal information is important for enriching the memo content: it improves query support by providing query tags for the user's subsequent queries based on the memo content, and thus improves the memo reminding efficiency.

Further, when spatio-temporal information exists, the terminal stores the memo content and the key information in association with the spatio-temporal information. Correspondingly, when the terminal determines, based on the key information and the spatio-temporal information, that the memo reminding triggering condition is met, it performs memo reminding for the user based on the memo content. For example, as shown in fig. 10, when the memo content is "the summer mountain resort is really beautiful", the terminal acquires photos, position coordinates, videos and text descriptions of the scenic spot, and at the same time acquires spatio-temporal information such as the timestamp, temperature and humidity, and altitude at the moment of the memo, and constructs a key information dictionary through information extraction for associated storage. When the reminding instruction "where did I go when it was hottest last year" is received, the terminal extracts the instruction key information from it: the time is last year, the temperature is the hottest, and the subject is travel. Because the spatio-temporal information stored with the memo content meets the memo reminding triggering condition, the terminal feeds back the text "the summer mountain resort is really beautiful" together with memo content such as the pictures and videos to the user. Acquiring spatio-temporal information as part of the memo content makes it easier for the user to query the memo content through a reminding instruction and improves the accuracy of the feedback result.
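As a rough illustration of how stored spatio-temporal information can answer a query such as "where did I go when it was hottest last year", the sketch below filters records by year and subject and then picks the one with the highest recorded temperature. The record fields (`timestamp`, `temperature_c`, `topic`) and the sample values are invented for the example.

```python
from datetime import datetime

# Hypothetical memo records: content stored together with spatio-temporal info.
records = [
    {"content": "The summer mountain resort is really beautiful",
     "timestamp": datetime(2021, 7, 20, 14, 0), "temperature_c": 36.5, "topic": "travel"},
    {"content": "The lakeside walk was lovely",
     "timestamp": datetime(2021, 4, 3, 10, 0), "temperature_c": 18.0, "topic": "travel"},
    {"content": "Quarterly report due",
     "timestamp": datetime(2021, 8, 1, 9, 0), "temperature_c": 38.0, "topic": "work"},
]


def hottest_in_year(records: list, year: int, topic: str):
    """Return the record of the given topic and year with the highest temperature."""
    candidates = [
        r for r in records
        if r["timestamp"].year == year and r["topic"] == topic
    ]
    return max(candidates, key=lambda r: r["temperature_c"], default=None)
```

The work memo is hotter but is excluded by the subject filter, so the query for travel in 2021 returns the mountain resort record.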

When memo content is stored through a memo voice instruction, the user usually inputs it in an everyday conversational manner, so the input is often terse and incomplete. The memo content may then fail to meet the memo requirement, which affects the subsequent memo reminding experience. In the embodiment of the application, because the terminal supports multi-modal memo content, in addition to extracting information from the memo content input by the user, the terminal obtains extended content corresponding to the memo content and uses it to enrich the memo content, which improves the human-computer interaction experience in memo reminding scenarios.

When the content modality of the memo content is a text modality and the information amount of the key information is less than an information amount threshold, the terminal acquires at least one of auditory extended content or visual extended content.

In a possible implementation, the memo content input by the user comprises only a text modality, and the terminal extracts information from the text memo content to obtain key information. When the information amount of the key information is smaller than the information amount threshold, that is, when there are few key-value pairs, the reminding instructions with which the user can query the corresponding memo content in a passive triggering scenario are limited, so the user may be unable to obtain the required memo content based on the key information dictionary. In the embodiment of the application, the terminal obtains extended content corresponding to the memo content based on the recording scene. The extended content can be visual extended content such as a web page snapshot, a screenshot, or the pixels of a target area, or auditory extended content such as ambient sound and music, and provides richer information for the memo reminding.

It should be noted that, to obtain multi-modal extended content, the terminal needs to use the camera, the recorder, and other sensors that involve the user's privacy. Therefore, in scenarios where extended content is obtained, the terminal can be set to ask the user first and to collect the corresponding information only after receiving positive feedback from the user. For example, the user says "help me note that the stir-fry at Xiaoming Home-style Dishes is delicious" through a memo recording voice instruction. Because the information amount of this memo content is small, the terminal can prompt the user, through voice or a visual display, to take a picture of the shop signboard, thereby obtaining visual extended content and enriching the memo information. Optionally, the terminal may also be configured to turn on the sensor automatically in response to a user instruction.
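The threshold check and the consent step can be combined as in the following sketch. The threshold value, the sensor names, and the `ask_user` callback are all assumptions made for illustration; the application only states that the information amount corresponds to the number of key-value pairs and that collection follows positive feedback from the user.

```python
from typing import Callable, Dict, List

INFO_AMOUNT_THRESHOLD = 3  # hypothetical value; the application gives no number


def extended_content_requests(
    modality: str,
    key_info: Dict[str, str],
    ask_user: Callable[[str], bool],
    threshold: int = INFO_AMOUNT_THRESHOLD,
) -> List[str]:
    """Return the sensors the user agreed to, or an empty list when no
    extended content is needed (non-text memo or enough key-value pairs)."""
    if modality != "text" or len(key_info) >= threshold:
        return []
    approved = []
    for sensor in ("camera", "microphone"):
        # Collect only after the user gives positive feedback.
        if ask_user(f"May I use the {sensor} to enrich this memo?"):
            approved.append(sensor)
    return approved


key_info = {"restaurant": "Xiaoming Home-style Dishes", "opinion": "delicious"}
```

Passing the consent prompt in as a callback keeps the privacy decision with the user interface rather than with the extraction logic.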

Optionally, the extended content may further include location information. For example, the user says "help me note that the stir-fry at Xiaoming Home-style Dishes is delicious" through a memo recording voice instruction, and the terminal obtains text-modality memo content. To enrich the memo information, while prompting the user to photograph the shop to obtain visual extended content, the terminal may also obtain the location coordinates of "Xiaoming Home-style Dishes", the route the user took to get there, and the like as extended content.

Further, the terminal stores the memo content, the key information and the extended content in an associated manner. Although the extended content can be multi-modal, the terminal can still construct semi-structured data in the form of key-value pairs to store it in association. Table 2 illustrates, by way of example, how the terminal performs this associated storage.

Table 2: Associated storage including extended content

Correspondingly, when it is determined based on the key information that the memo reminding triggering condition is met, memo reminding is performed based on the memo content and the extended content. The manner of reminding is the same as in the above embodiment and is not described here again.

In practice, web page links such as product links in shopping applications or public content links are time-sensitive. For example, once a merchant takes a product off the shelves, the network link that the user stored as memo content becomes invalid, so in a subsequent query the user cannot obtain the related product information from the memo content, which harms the user experience. In the embodiment of the application, the terminal can enrich the information related to the memo content by acquiring extended content, so as to avoid queries that return no result.

In one possible implementation, when the memo content is time-sensitive, the terminal acquires extended content corresponding to the memo content. The terminal first performs information extraction on the acquired memo content. When it determines that the memo content is of a time-sensitive kind, for example a URL (uniform resource locator) or an online product, the terminal acquires the corresponding extended content based on the key information.

Illustratively, as shown in fig. 11, when a memo is recorded based on a memo voice instruction, the terminal responds to the user's wake-up voice and obtains the user's memo voice instruction "help me note down this skirt". Through NLP processing and the like, the terminal determines that the memo content indicated by the instruction is the product the user is currently browsing. The terminal then obtains the current product link and captures a picture of the product with a screenshot tool, or, if the picture can be saved, obtains the product picture from the page, and also obtains the product's introduction text "new spring and autumn dress". The product link, the product picture, and the introduction text are all memo content obtained by the terminal.

Further, when it is determined based on the key information that the memo reminding triggering condition is met and the memo content is still valid, the terminal performs memo reminding based on the memo content. The manner of memo reminding is the same as in the above embodiment and is not described here again.

Correspondingly, when it is determined based on the key information that the memo reminding triggering condition is met and the memo content is invalid, the terminal performs memo reminding based on the extended content. When memo reminding can be realized from the extended content, the terminal uses the extended content, or a processed version of it, as feedback. For example, when the user inputs the reminding instruction "view the dress I saved", the online product link stored as memo content has become invalid, but the terminal has stored the picture corresponding to the link as extended content, so the terminal can feed the picture back to the user to complete the memo reminding.

Optionally, when memo reminding cannot be realized from the extended content, the terminal can perform an online search based on the web page screenshot, snapshot and the like in the extended content, and feed back similar results found online to the user to complete the memo reminding. This avoids the situation where the user retrieves memo content and obtains no result, and improves the human-computer interaction experience.
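The fallback order described above (valid memo content first, then extended content, then an online search) can be sketched as follows. The record layout and the `is_valid` and `search_online` callbacks are hypothetical placeholders; how link validity is checked and how the search is performed are not specified in the application.

```python
from typing import Callable, Optional, Tuple


def choose_feedback(
    memo: dict,
    is_valid: Callable[[str], bool],
    search_online: Callable[[str], str],
) -> Tuple[str, Optional[str]]:
    """Pick what to feed back to the user, returning (source, payload)."""
    # 1. The memo content (e.g. a product link) is still valid: use it.
    if is_valid(memo["content"]):
        return "content", memo["content"]
    extended = memo.get("extended_content", {})
    # 2. The content is invalid but a stored picture can serve as the reminder.
    if "picture" in extended:
        return "extended", extended["picture"]
    # 3. Otherwise search online using a stored snapshot, if any.
    if "snapshot" in extended:
        return "search", search_online(extended["snapshot"])
    return "none", None


dress_memo = {
    "content": "https://shop.example/item/123",  # invented example link
    "extended_content": {"picture": "dress.png", "snapshot": "page.html"},
}
```

With an expired link, `choose_feedback` returns the stored picture, which corresponds to the dress example in the text.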

When memo reminding involves multi-modal memo content, the data of different modalities is heterogeneous. As a result, when semi-structured data is constructed from multi-modal memo content for associated storage, the data forms are complex and variable, which affects input and output efficiency in the memo reminding process. In the embodiment of the application, the terminal can use multi-modal fusion to give memo contents of different content modalities a unified representation, improving the memo reminding efficiency.

In the input stage of the memo content, the terminal performs vectorization coding on the key information to obtain a key information vector. The vectorization coding may use multimodal fusion technology (MFT) from deep learning: the terminal converts the multi-modal key information in the semi-structured data into vectors in a high-dimensional space by using technologies such as deep neural networks (DNNs) and multimodal pre-trained models. This gives the key information a unified representation and makes it convenient for the user to obtain memo content through passive triggering, that is, through a reminding instruction.

Further, once the key information vector is obtained, the terminal stores the memo content and the key information vector in an associated manner. The terminal may store the key information vectors as key-value pairs to obtain a key information dictionary, in which the keys may be feature vectors and the values are the key information vectors obtained through vectorization coding.

Correspondingly, in the memo reminding stage, when a reminding instruction is received, the terminal performs vectorization coding on the reminding instruction to obtain an instruction vector.

In a possible implementation, when the reminding instruction is received, the terminal extracts information from the reminding instruction based on the content modality corresponding to the reminding instruction to obtain instruction key information. The way the terminal extracts information from the reminding instruction is the same as the way it extracts information from the memo content and is not described here again. Once the instruction key information is obtained, the terminal performs vectorization coding on it to obtain the instruction vector. The way the terminal vectorizes the instruction key information is the same as in the above embodiment and is not described here again.

Further, when the vector similarity between the instruction vector and the key information vector is greater than a threshold, it is determined that the memo reminding triggering condition is met, and memo reminding is performed based on the memo content. Once the instruction vector is determined, the terminal compares it with the key information vectors in the key information dictionary, that is, it calculates the cosine distance between the instruction vector and each key information vector in the high-dimensional space. When the vector similarity represented by the cosine distance is greater than the vector similarity threshold, the terminal determines that the memo reminding triggering condition is met and feeds back the corresponding memo content to the user.
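The vector comparison can be illustrated with a small sketch. It computes cosine similarity directly (cosine distance is one minus this value) and returns the best memo whose similarity exceeds a threshold. The toy three-dimensional vectors and the threshold of 0.8 are assumptions; in the described scheme the vectors would come from a multimodal encoder.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    instruction_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    threshold: float = 0.8,  # hypothetical similarity threshold
) -> Optional[str]:
    """Return the memo content most similar to the instruction vector,
    provided its similarity is greater than the threshold."""
    best_content, best_score = None, threshold
    for content, key_vec in store:
        score = cosine_similarity(instruction_vec, key_vec)
        if score > best_score:
            best_content, best_score = content, score
    return best_content


store = [
    ("I put the key in the desk drawer", [1.0, 0.0, 0.2]),
    ("Flight at 9 am on Friday", [0.0, 1.0, 0.0]),
]
```

An instruction vector close to the first entry, such as `[0.9, 0.1, 0.2]`, retrieves the key memo, while an unrelated vector returns `None` and no reminder is triggered.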

It should be noted that, after a passively triggered reminder based on the memo content is completed, if the user considers that the memo content no longer needs to be retained, the terminal deletes the memo content in response to a memo deletion instruction. For example, the user passively triggers a reminder with the reminding instruction "where did I put the key", and the terminal feeds back "in the desk drawer" based on the key information. If the user decides the memo content can be deleted, the user says "delete this memo" to instruct the terminal to delete the memo content.

Optionally, the terminal may also remind the user to delete memo content based on the timeliness information in the key information, so that expired information is removed and storage space is saved. For example, for flight information stored as memo content, once the flight time has passed, the terminal notifies the user that the flight memo has expired and actively prompts the user to delete the memo content.
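A deletion reminder of this kind reduces to comparing a stored expiry time with the current time. In the sketch below, the `expires_at` field is a hypothetical stand-in for the timeliness information in the key information; memos without it are never proposed for deletion.

```python
from datetime import datetime
from typing import List


def expired_memos(memos: List[dict], now: datetime) -> List[str]:
    """Return the contents of memos whose expiry time has passed, so that
    the terminal can prompt the user to delete them."""
    return [
        m["content"]
        for m in memos
        if m.get("expires_at") is not None and m["expires_at"] <= now
    ]


memo_list = [
    {"content": "Flight to Beijing", "expires_at": datetime(2022, 10, 1, 9, 0)},
    {"content": "Concert tickets", "expires_at": datetime(2023, 1, 15, 19, 30)},
    {"content": "I put the key in the desk drawer"},  # no timeliness information
]
```

The function only lists candidates; actual deletion still follows the user's memo deletion instruction.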

Optionally, in the embodiment of the application, different trigger mode information is assigned to memo content by determining its intention type. For memo content whose trigger mode information indicates active triggering, the memo content often also loses its storage value after the terminal has been actively started and has completed the memo reminding.

Referring to fig. 12, a block diagram of a memo reminding device according to an exemplary embodiment of the present application is shown. The memo reminding device includes:

an obtaining module 1201, configured to obtain memo content in the case that a memo recording requirement exists;

an information extraction module 1202, configured to extract information from the memo content based on a content modality corresponding to the memo content to obtain key information, where different content modalities use different information extraction manners;

a storage module 1203, configured to perform associated storage on the memo content and the key information;

and a memo reminding module 1204, configured to perform memo reminding based on the memo content when it is determined that a memo reminding trigger condition is met based on the key information.

Optionally, the information extraction module 1202 is further configured to:

under the condition that the content modality corresponding to the memo content comprises a text modality, performing natural language processing on the memo content to obtain text key information;

under the condition that the content modality corresponding to the memo content comprises a visual modality, carrying out image recognition processing on the memo content to obtain image key information;

and under the condition that the content modality corresponding to the memo content comprises an auditory modality, performing audio identification processing on the memo content to obtain audio key information.

Optionally, in a case that the memo content is processed in natural language to obtain text key information, the information extraction module 1202 is further configured to:

conducting named entity recognition on the memo content to obtain entity information;

performing entity relation extraction on the memo content to obtain entity relation information;

extracting the subject abstract of the memo content to obtain subject information;

performing text intention recognition on the memo content to obtain an intention type, wherein the intention type is used for representing the intention of performing memo recording;

and carrying out causal inference analysis on the memo content to obtain trigger mode information, wherein the trigger mode comprises active trigger and passive trigger, and the trigger mode information comprises an active trigger condition under the condition that the trigger mode indicated by the trigger mode information is the active trigger.

Optionally, under the condition that the key information includes the text key information and the text key information includes the trigger mode information, the memo reminding module 1204 is further configured to:

when the trigger mode indicated by the trigger mode information is active trigger and the active trigger condition is met, performing active memo reminding based on the memo content;

and when the trigger mode indicated by the trigger mode information is passive trigger and the key information matches a reminding instruction, performing passive memo reminding based on the memo content.

Optionally, when the memo content is subjected to image recognition processing to obtain image key information, the information extraction module 1202 is further configured to:

under the condition that the memo content is a picture, carrying out optical character recognition on the picture to obtain a picture text; and/or, carrying out image natural language description processing on the picture to obtain a picture description text;

under the condition that the memo content is a video, video understanding is conducted on the video to obtain a video description text;

and performing natural language processing on at least one of the picture text, the picture description text and the video description text to obtain the text key information.

Optionally, when audio recognition processing is performed on the memo content to obtain audio key information, the information extraction module 1202 is further configured to:

carrying out automatic voice recognition on the memo content to obtain an audio text; and/or, audio feature extraction is carried out on the memo content to obtain an audio fingerprint;

and carrying out natural language processing on the audio text to obtain the text key information.

Optionally, the obtaining module 1201 is further configured to:

acquiring extended content corresponding to the memo content;

the storage module 1203 is further configured to:

and performing associated storage on the memo content, the key information and the extended content.

Optionally, the obtaining module 1201 is further configured to:

under the condition that the content modality of the memo content is a text modality and the information amount of the key information is smaller than an information amount threshold value, acquiring at least one of auditory expansion content or visual expansion content;

the memo reminding module 1204 is further configured to:

and under the condition that a memo reminding triggering condition is determined to be met based on the key information, performing memo reminding based on the memo content and the extended content.

Optionally, the obtaining module 1201 is further configured to:

when the memo content is time-sensitive, acquiring the extended content corresponding to the memo content;

the memo reminding module 1204 is further configured to:

under the condition that a memo reminding triggering condition is determined to be met based on the key information and the memo content is effective, performing memo reminding based on the memo content;

and under the condition that a memo reminding triggering condition is determined to be met based on the key information and the memo content is invalid, performing memo reminding based on the extended content.

Optionally, the obtaining module 1201 is further configured to:

under the condition that a memo recording requirement exists, acquiring space-time information, wherein the space-time information is used for representing time and space states when memo recording is carried out;

the storage module 1203 is further configured to:

performing associated storage on the memo content, the key information and the spatiotemporal information;

the memo reminding module 1204 is further configured to:

and under the condition that a memo reminding triggering condition is determined to be met based on the key information and the time-space information, performing memo reminding based on the memo content.

Optionally, the apparatus further includes an encoding module, configured to perform vectorization encoding on the key information to obtain a key information vector;

the storage module 1203 is further configured to:

performing associated storage on the memo content and the key information vector;

the encoding module is further configured to:

under the condition that a reminding instruction is received, vectorizing coding is carried out on the reminding instruction to obtain an instruction vector;

the memo reminding module 1204 is further configured to:

and under the condition that the vector similarity of the instruction vector and the key information vector is greater than a threshold value, determining that a memo reminding triggering condition is met, and carrying out memo reminding based on the memo content.

Optionally, when the reminding instruction is vectorized and encoded to obtain an instruction vector, the information extraction module 1202 is further configured to:

under the condition that the reminding instruction is received, extracting information of the reminding instruction based on a content modality corresponding to the reminding instruction to obtain instruction key information;

the encoding module is further configured to:

vectorizing and coding the instruction key information to obtain the instruction vector.

Optionally, the obtaining module 1201 is further configured to:

under the condition that a memo voice instruction is received, acquiring the memo content indicated by the memo voice instruction; or,

and under the condition that user behavior information is acquired and meets the memo condition, determining the memo content based on the user behavior information.

Optionally, the apparatus further includes a deleting module, configured to delete the memo content in response to a memo deleting instruction.

In summary, in the embodiment of the application, the terminal uses the obtaining module to acquire multi-modal, multi-dimensional information such as text, visual, auditory, contextual, and spatio-temporal information through various sensors to form the memo content, which makes the device compatible with multi-modal data and enriches the memo content. For processing the memo content, the information extraction module extracts information in all dimensions and the storage module stores it in association. The terminal automatically determines the trigger mode of the memo reminder after determining the intention type of the memo content, and for passively triggered memo content, the user can also trigger the memo reminder through multi-modal reminding instructions. By supporting multi-modal input and output, the embodiment of the application improves the way the terminal helps the user remember, and thereby improves the efficiency and quality of human-computer interaction.

The embodiment of the application also provides a computer-readable storage medium, which stores at least one program, and the at least one program is used for being executed by a processor to realize the memo reminding method according to the embodiment.

Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the memo reminding method provided by the embodiment.

Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on a computer-readable medium, or transmitted over it, as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

The above description covers only exemplary embodiments of the present application and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims

1. A memo reminding method, characterized by comprising:

performing associated storage on the memo content and the key information;

2. The method of claim 1, wherein the extracting information of the memo content based on the content modality corresponding to the memo content to obtain key information comprises:

3. The method of claim 2, wherein the natural language processing of the memo content to obtain the text key information comprises at least one of:

extracting entity relations from the memo content to obtain entity relation information;

4. The method according to claim 3, wherein the key information includes the text key information, and the text key information includes the trigger mode information;

the performing memo reminding based on the memo content, in the case that it is determined based on the key information that a memo reminding trigger condition is met, comprises:

5. The method of claim 2, wherein the image recognition processing of the memo content to obtain image key information comprises:

the method further comprises the following steps:

6. The method of claim 2, wherein the audio recognition extraction of the memo content to obtain audio key information comprises:

the method further comprises the following steps:

7. The method of claim 1, wherein after the obtaining the memo content, the method further comprises:

acquiring extended content corresponding to the memo content;

the associating and storing the memo content and the key information comprises:

8. The method of claim 7, wherein the obtaining of the extended content corresponding to the memo content comprises:

9. The method of claim 7, wherein the obtaining of the extended content corresponding to the memo content comprises:

the method further comprises the following steps:

10. The method of claim 1, further comprising:

the associating and storing the memo content and the key information comprises:

the performing memo reminding based on the memo content, in the case that it is determined based on the key information that a memo reminding trigger condition is met, comprises:

11. The method of claim 1, wherein after extracting information of the memo content based on the content modality corresponding to the memo content and obtaining key information, the method further comprises:

performing vectorized encoding on the key information to obtain a key information vector;

in the case that a reminding instruction is received, performing vectorized encoding on the reminding instruction to obtain an instruction vector;

12. The method of claim 11, wherein the performing vectorized encoding on the reminding instruction to obtain an instruction vector, in the case that the reminding instruction is received, comprises:

13. The method of claim 1, wherein the obtaining the memo content in case of a memo requirement comprises:

and under the condition that user behavior information is acquired and meets a memo condition, determining the memo content based on the user behavior information.

14. The method of claim 1, wherein after performing a memo reminder based on the memo content if it is determined that a memo reminder trigger condition is satisfied based on the key information, the method further comprises:

and in response to a memo deleting instruction, deleting the memo content.

15. A memo reminding device, characterized in that the device comprises:

an acquisition module, configured to acquire memo content in the case that a memo recording requirement exists;

and a memo reminding module, configured to perform memo reminding based on the memo content in the case that it is determined, based on the key information, that a memo reminding trigger condition is met.

16. A terminal, characterized in that the terminal comprises a processor and a memory; the memory stores at least one program for execution by the processor to implement the memo reminding method as claimed in any one of claims 1 to 14.

17. A computer-readable storage medium, wherein the storage medium stores at least one program for execution by a processor to implement the memo reminding method as claimed in any one of claims 1 to 14.

18. A computer program product, characterized in that the computer program product comprises computer instructions, the computer instructions being stored in a computer readable storage medium; a processor of a computer device reads the computer instructions from the computer readable storage medium, the processor executing the computer instructions to cause the computer device to implement the memo reminding method as recited in any one of claims 1 to 14.