CN112150103A - Schedule setting method and device and storage medium - Google Patents

Schedule setting method and device and storage medium

Info

Publication number
CN112150103A
CN112150103A
Authority
CN
China
Prior art keywords
information
schedule
time
word
text
Prior art date
Legal status
Granted
Application number
CN202010936030.9A
Other languages
Chinese (zh)
Other versions
CN112150103B (en)
Inventor
练志峰
冯牮
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010936030.9A priority Critical patent/CN112150103B/en
Publication of CN112150103A publication Critical patent/CN112150103A/en
Application granted granted Critical
Publication of CN112150103B publication Critical patent/CN112150103B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/109 Time management, e.g. calendars, reminders, meetings or time accounting
    • G06Q10/1093 Calendar-based scheduling for persons or groups
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Human Resources & Organizations (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)

Abstract

The embodiment of the application discloses a schedule setting method, device, and storage medium. The method acquires voice information input by a user; recognizes the voice information to obtain corresponding text information; separates time information from the text information and determines a schedule time according to the time information and the current time; generates schedule information according to the schedule time and the text information; and performs schedule setting on a terminal according to the schedule information. By automatically identifying the time information in the voice information, a schedule can be set conveniently and quickly, improving the user experience.

Description

Schedule setting method and device and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a schedule setting method, a schedule setting device and a storage medium.
Background
In recent years, many calendar applications have provided a function for manually entering schedule items, in which the user typically selects the time and types the schedule text by hand; some also accept voice and text input, but offer no function for intelligently analyzing the time and similar details. Such schedule setting methods are cumbersome to operate, resulting in a poor user experience.
Disclosure of Invention
In view of this, embodiments of the present application provide a schedule setting method, an apparatus, and a storage medium, which can simplify schedule setting and improve the experience of the user.
In a first aspect, an embodiment of the present application provides a schedule setting method, including:
acquiring voice information input by a user;
recognizing the voice information to obtain text information corresponding to the voice information;
separating the text information to obtain time information, and determining schedule time according to the time information and the current time;
generating schedule information according to the schedule time and the text information;
and carrying out schedule setting on the terminal according to the schedule information.
In an embodiment, the performing schedule setting on the terminal according to the schedule information includes:
displaying the schedule information;
and when the determination operation of the user for the schedule information is detected, carrying out schedule setting on the terminal according to the schedule information.
In an embodiment, the performing schedule setting on the terminal according to the schedule information further includes:
displaying the schedule information;
when the modification operation of the user for the schedule information is detected, obtaining the modified schedule information;
and carrying out schedule setting on the terminal according to the modified schedule information.
In an embodiment, the performing schedule setting on the terminal includes:
and setting the terminal to prompt schedule information based on the schedule time.
In an embodiment, the separating the text information to obtain time information, and determining schedule time according to the time information and the current time, involves a preset word segmentation model and includes:
recognizing and dividing the text information by adopting a preset word segmentation model to obtain a plurality of word units corresponding to the text information, and determining word attributes corresponding to the word units;
and extracting time information in the text information according to the word units and the word attributes.
In an embodiment, extracting time information in the text information according to the word unit and the word attribute, and determining a schedule time according to the time information and a current time includes:
acquiring semantic relations among the word units according to the word units and the word attributes;
and determining time word units from the word units, and determining the schedule time according to the time word units, the current time and the semantic relationship among the time word units.
In an embodiment, the recognizing and dividing the text information by using a preset word segmentation model to obtain a plurality of word units corresponding to the text information, and determining word attributes corresponding to the word units includes:
identifying word units in the text information;
performing coding operation on the text information according to the word unit to obtain a text characteristic vector;
generating a part-of-speech feature vector according to the text feature vector and the hidden layer state of the preset word segmentation model at the feature extraction time;
and performing decoding operation on the part-of-speech characteristic vector, and determining word units corresponding to the text information and word attributes corresponding to the word units.
In an embodiment, the generating a part-of-speech feature vector according to the text feature vector and the hidden layer state of the preset word segmentation model at the feature extraction time includes:
determining the previous feature extraction time of the current feature extraction time, acquiring the left hidden layer state of the previous feature extraction time, and calculating the left hidden layer state of the current feature extraction time according to the feature extraction time and the left hidden layer state;
determining the next feature extraction time of the current feature extraction time, acquiring the right hidden layer state of the next feature extraction time, and calculating the right hidden layer state of the current feature extraction time according to the feature extraction time and the right hidden layer state;
and generating a part-of-speech feature vector according to the left hidden layer state and the right hidden layer state of the current feature extraction moment.
In an embodiment, the recognizing the voice information to obtain text information corresponding to the voice information includes:
extracting audio characteristic information of the voice information;
and acquiring text information corresponding to the voice information according to the audio characteristic information of the voice information.
In an embodiment, the obtaining text information corresponding to the voice information according to the audio feature information of the voice information includes:
acquiring phonemes corresponding to the audio characteristic information according to a preset acoustic model;
comparing and matching the phonemes with a preset dictionary according to a preset language model to obtain text words corresponding to the phonemes;
and extracting semantic association information among the text words, and combining the text words to obtain text information according to the association information.
In a second aspect, an embodiment of the present application further provides a schedule setting apparatus, including:
the acquiring unit is used for acquiring voice information input by a user;
the recognition unit is used for recognizing the voice information to obtain text information corresponding to the voice information;
the separation unit is used for separating the text information to obtain time information and determining schedule time according to the time information and the current time;
the generating unit is used for generating schedule information according to the schedule time and the text information;
and the setting unit is used for carrying out schedule setting on the terminal according to the schedule information.
In a third aspect, embodiments of the present application provide a storage medium having a computer program stored thereon, which, when run on a computer, causes the computer to execute a schedule setting method as provided in any of the embodiments of the present application.
The embodiment of the application can acquire the voice information input by the user; recognizing the voice information to obtain text information corresponding to the voice information; separating the text information to obtain time information, and determining schedule time according to the time information and the current time; generating schedule information according to the schedule time and the text information; and carrying out schedule setting on the terminal according to the schedule information. By automatically identifying the time information in the voice information, the schedule can be conveniently and quickly set, and the user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic scene diagram of a schedule setting system according to an embodiment of the present invention;
fig. 2a is a first flowchart of a schedule setting method according to an embodiment of the present invention;
FIG. 2b is a schematic diagram of a terminal interface when implementing schedule setting according to an embodiment of the present invention;
fig. 2c is a schematic flow chart of a schedule setting method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a schedule setting apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 5a is a schematic structural diagram of a segmentation model according to an embodiment of the present invention;
FIG. 5b is a schematic diagram of a dependency relationship analysis process provided by an embodiment of the present invention;
FIG. 5c is a diagram illustrating the dependency results provided by an embodiment of the present invention;
fig. 5d is a schematic diagram of a rule extraction result according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the invention provides a schedule setting method, a schedule setting device and a storage medium.
The present invention relates to Artificial Intelligence (AI) and machine learning techniques. AI is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision making. Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how computers simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.
The schedule setting apparatus may be specifically integrated in a computer device, such as a terminal or a server. The terminal can be a mobile phone, a tablet computer, a notebook computer and other devices, and also can be an intelligent terminal comprising a wearable device, an intelligent sound box, an intelligent television and the like. The server may be a single server or a server cluster composed of a plurality of servers.
Referring to fig. 1, an embodiment of the present invention provides a schedule setting system, which at least includes a terminal and a server, where the terminal and the server are linked through a network.
The embodiment of the application can acquire the voice information input by the user; recognizing the voice information to obtain text information corresponding to the voice information; separating the text information to obtain time information, and determining schedule time according to the time information and the current time; generating schedule information according to the schedule time and the text information; and carrying out schedule setting on the terminal according to the schedule information. By automatically identifying the time information in the voice information, the schedule can be conveniently and quickly set, and the user experience is improved.
The above example of fig. 1 is only an example of a system architecture for implementing the embodiment of the present invention, and the embodiment of the present invention is not limited to the system architecture shown in fig. 1, and various embodiments of the present invention are proposed based on the system architecture.
The following are detailed below. The numbers in the following examples are not intended to limit the order of preference of the examples.
As shown in fig. 2a, a schedule setting method is provided, which can be executed by a terminal or a server, and the embodiment is explained by taking the method executed by the server as an example. The specific flow of the schedule setting method is as follows:
101. and acquiring voice information input by a user.
In one embodiment, the terminal may collect voice information input by a user through a signal collecting device (such as a microphone), and forward the collected voice information to the server through a network for voice recognition. Wherein voice information is stored and transmitted between computer devices (including terminals and servers) in the form of audio files.
In an embodiment, in order to improve the universality and reliability of the schedule setting method, the terminal may perform encoding operation and encapsulation on the acquired voice information to obtain an audio file, and transmit the audio file to the server. The server may decode the audio file, for example, the step "acquiring the voice information input by the user" may include:
carrying out decapsulation processing on the audio file to obtain an audio data stream;
and respectively decoding the audio data streams to obtain an audio frame sequence.
The server actually acquires the voice information input by the user in the form of the audio frame sequence.
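A minimal sketch of this step, assuming the audio arrives as an uncompressed 16-bit mono WAV file (the patent fixes no container or codec, so the format and file name here are assumptions); Python's standard wave module unpacks the container and exposes the decoded stream, which is then split into the audio frame sequence:

```python
import wave
import numpy as np

def read_audio_frames(path: str, frame_ms: int = 25, hop_ms: int = 10):
    """Decapsulate a WAV container, decode the audio data stream, and split
    it into overlapping frames (25 ms window / 10 ms hop are common ASR
    defaults, not values taken from the patent)."""
    with wave.open(path, "rb") as wf:
        sr = wf.getframerate()
        pcm = wf.readframes(wf.getnframes())      # decoded audio data stream
    # Assumes 16-bit mono PCM; normalize to [-1, 1].
    samples = np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0
    frame_len = sr * frame_ms // 1000
    hop_len = sr * hop_ms // 1000
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, hop_len)]
    return sr, frames                             # the audio frame sequence

sr, frames = read_audio_frames("voice_input.wav")  # hypothetical file name
```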
In an embodiment, referring to fig. 2b, a voice input control is disposed in the terminal interface, and a user may click the voice input control to trigger a voice recording function of the signal acquisition device. The voice input control is an instruction interface for inputting voice, and may be represented in the form of an icon, an input box, a selection box, a button, and the like, and referring to fig. 2b, the voice input control may be represented as a microphone icon.
In an embodiment, a user can wake up the voice assistant through a preset specific instruction, and when the voice assistant is woken up, the voice recording function of the signal acquisition device is triggered, and the terminal recognizes voice information through the voice assistant. The voice assistant comprises a voice recognition engine, and the voice recognition engine can use an ASR technology to recognize voice information in the audio content and acquire text information corresponding to the voice information.
102. And identifying the voice information to obtain text information corresponding to the voice information.
In an embodiment, the recognizing the voice information to obtain the text information corresponding to the voice information may specifically include the following steps:
extracting audio characteristic information of the voice information;
and acquiring text information corresponding to the voice information according to the audio characteristic information of the voice information.
Wherein the audio feature information is information for representing characteristics of sound waves. The sound wave is a sound signal corresponding to voice information, and the sound signal is propagated as a wave, and therefore, may be referred to as a sound wave.
In one embodiment, extracting the audio feature information of the speech information may include the steps of:
dividing the voice information to obtain audio frames;
and extracting the audio frame to perform feature extraction to obtain audio feature information of the voice information.
In an embodiment, the audio feature information may be represented as an MFCC (Mel-Frequency Cepstral Coefficients) vector. Before the voice information is divided into audio frames, the signal is pre-emphasized; to avoid excessive change between two adjacent frames, the frames are taken with overlap. Each audio frame is then multiplied by a Hamming window function to obtain a short-time analysis window, and an FFT (fast Fourier transform) is applied to each short-time analysis window to obtain its frequency spectrum. A Mel filter bank then filters out frequencies that humans cannot hear, converting the linear natural frequency spectrum into a Mel frequency spectrum that reflects the characteristics of human hearing. Cepstral analysis is performed on the Mel frequency spectrum (taking the logarithm and applying an inverse transform; in practice the inverse transform is generally realized by a discrete cosine transform, and the 2nd to 13th coefficients after the discrete cosine transform are taken as the Mel frequency cepstral coefficients). The 12 Mel frequency cepstral coefficients of each frame are combined to form the cepstral vector of that audio frame. In an embodiment, inter-frame dynamic change features may also be calculated from the Mel frequency cepstral coefficients and combined into the cepstral vector of each frame. The cepstral vectors of all audio frames constitute the audio feature information of the voice information.
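A compact sketch of this feature extraction, assuming the librosa library is acceptable (the patent names no toolkit); librosa bundles the windowing, FFT, Mel filter bank, logarithm, and DCT steps, so only the pre-emphasis is written out:

```python
import librosa
import numpy as np

y, sr = librosa.load("voice_input.wav", sr=16000)   # assumed 16 kHz mono input
y = np.append(y[0], y[1:] - 0.97 * y[:-1])          # simple pre-emphasis filter

mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=12,        # 12 Mel frequency cepstral coefficients/frame
    n_fft=400, hop_length=160,    # 25 ms Hamming window, 10 ms hop at 16 kHz
    window="hamming",
)
delta = librosa.feature.delta(mfcc)    # inter-frame dynamic change features
features = np.vstack([mfcc, delta]).T  # one cepstral vector per audio frame
```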
The process of acquiring the corresponding text information according to the audio feature information relates to ASR (Automatic Speech Recognition) technology in the field of artificial intelligence, which converts the vocabulary content in the voice information into computer-readable input such as key presses, binary codes, or character sequences. ASR is one of the key technologies of Speech Technology. Enabling computers to listen, see, speak, and feel is a development direction of future human-computer interaction, in which voice is regarded as one of the most promising interaction modes.
In an embodiment, the obtaining of the text information corresponding to the voice information according to the audio feature information of the voice information may specifically include the following steps:
acquiring phonemes corresponding to the audio characteristic information according to a preset acoustic model;
comparing and matching the phonemes with a preset dictionary according to a preset language model to obtain text words corresponding to the phonemes;
and extracting semantic association information among the text words, and combining the text words to obtain text information according to the association information.
Where a phoneme is the smallest unit of speech divided from a timbre perspective.
In an embodiment, a Hidden Markov Model (HMM) may be employed as the acoustic model. Hidden Markov Models (HMMs) need to be trained to be used.
In another embodiment, the language model can be trained based on the deep neural network, the characteristic input language model is extracted, and text information corresponding to the voice information is obtained.
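As an illustrative sketch of the dictionary matching step only (the phoneme inventory and dictionary below are hypothetical toys, not the patent's preset dictionary): the phonemes produced by the acoustic model are matched against dictionary pronunciations to recover text words.

```python
# Hypothetical toy dictionary mapping phoneme tuples to Chinese words.
LEXICON = {
    ("m", "ing", "t", "ian"): "明天",   # "tomorrow"
    ("k", "ai"): "开",                  # "hold"
    ("h", "ui"): "会",                  # "meeting"
}

def phonemes_to_words(phonemes):
    """Greedy longest-match of phoneme runs against the dictionary. A real
    system would instead search a lattice scored by the language model."""
    words, i = [], 0
    while i < len(phonemes):
        for j in range(len(phonemes), i, -1):
            key = tuple(phonemes[i:j])
            if key in LEXICON:
                words.append(LEXICON[key])
                i = j
                break
        else:
            i += 1                     # skip an unmatchable phoneme
    return words

print(phonemes_to_words(["m", "ing", "t", "ian", "k", "ai", "h", "ui"]))
# -> ['明天', '开', '会']
```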
In an embodiment, the semantic feature information includes a semantic feature vector, and extracting the semantic feature information of the text information may include the following steps:
dividing the text information to obtain at least one word unit;
mapping the word units into text feature vectors according to a preset dictionary in a word segmentation model, wherein the word segmentation model is a model based on a recurrent neural network;
and generating a semantic feature vector according to the text feature vector and the hidden layer state of the word segmentation model at the feature extraction time.
Wherein a word unit may be a character set having a meaning of a word, and a character set may contain one or more characters. A text feature vector is a vector used to represent semantic features of a word unit, and each element of the text feature vector represents a feature having certain semantic and grammatical interpretations. Therefore, each element of the text feature vector may be referred to as a word feature. Wherein, the elements of the text feature vector refer to the numerical values of each dimension of the text feature vector.
The text information can be converted into text feature vectors through a preset word embedding algorithm (such as Word2Vec). Word2Vec (word to vector) can quickly and effectively express a word unit in vector form by training an optimized word segmentation model on a given corpus. When the word segmentation model receives the word units obtained by division, it can convert the word units into text feature vectors according to a preset dictionary in the word segmentation model, where each word in the preset dictionary corresponds to a vector.
The dictionary in the word segmentation model can be stored in a local memory of the schedule setting device as a part of the word segmentation model, and can also be obtained by communicating with a network server through a network.
The semantic feature vector is a vector used for representing the complete semantic features of the text information, and not only contains the semantic feature information of each word unit in the text information, but also contains the associated information among the word units.
For example, the text information is a sentence, word units obtained by dividing the text information can be represented as words, and the semantic feature vector generated by extracting the hidden state at the moment according to the text feature vector and the word segmentation model feature can be understood as: and generating the feature vector of the sentence according to the word feature vector and the associated information among the words.
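A minimal sketch of building such word vectors, assuming the gensim library (the patent does not name one); the two-sentence corpus is hypothetical:

```python
from gensim.models import Word2Vec

# Hypothetical corpus of already-segmented sentences (word units).
corpus = [
    ["明天", "下午", "三点", "开会"],   # "meeting at three pm tomorrow"
    ["周一", "上午", "十点", "上课"],   # "class at ten am on Monday"
]

# Every word in the resulting dictionary maps to a 100-dimensional vector.
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, epochs=50)

vec = model.wv["明天"]   # text feature vector for the word unit "明天"
print(vec.shape)         # (100,)
```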
103. And separating the text information to obtain time information, and determining schedule time according to the time information and the current time.
In an embodiment, the time information is obtained by separating from the text information, and the schedule time is determined according to the time information and the current time; the method specifically comprises the following steps:
recognizing and dividing the text information by adopting a preset word segmentation model to obtain a plurality of word units corresponding to the text information, and determining word attributes corresponding to the word units;
and extracting time information in the text information according to the word units and the word attributes.
The preset word segmentation model is a sequence tagging model that can be used for entity extraction, i.e., extraction of the atomic information elements in text information; it usually covers labels such as person names, organization names, geographic locations, times/dates, and numeric values, and the specific label definitions can be adjusted for different tasks.
In one embodiment, models such as HMM, MEMM, CRF, neural networks, etc. in machine learning may be used as the segmentation models.
The preset word segmentation model firstly carries out vectorization on text information through character segmentation (word segmentation) and character embedding (word embedding), and then obtains high-level features, namely part-of-speech features through part-of-speech tagging.
In an embodiment, the recognizing and dividing the text information by using a preset word segmentation model to obtain a plurality of word units corresponding to the text information, and determining word attributes corresponding to the word units includes:
identifying word units in the text information;
performing coding operation on the text information according to the word unit to obtain a text characteristic vector;
generating a part-of-speech feature vector according to the text feature vector and the hidden layer state of the preset word segmentation model at the feature extraction time;
and performing decoding operation on the part-of-speech characteristic vector, and determining word units corresponding to the text information and word attributes corresponding to the word units.
The preset word segmentation model can convert a plurality of text feature vectors into a semantic feature vector c with a fixed length, and the process can be realized by a Recurrent Neural Network (RNN), such as a Long Short-Term Memory Network (LSTM) and the like.
In an embodiment, the step of generating a part-of-speech feature vector according to the text feature vector and the hidden layer state of the preset word segmentation model at the feature extraction time may specifically include:
determining the previous feature extraction time of the current feature extraction time, acquiring the left hidden layer state of the previous feature extraction time, and calculating the left hidden layer state of the current feature extraction time according to the feature extraction time and the left hidden layer state;
determining the next feature extraction time of the current feature extraction time, acquiring the right hidden layer state of the next feature extraction time, and calculating the right hidden layer state of the current feature extraction time according to the feature extraction time and the right hidden layer state;
and generating a part-of-speech feature vector according to the left hidden layer state and the right hidden layer state of the current feature extraction moment.
Wherein, referring to FIG. 5c, the LSTM model at time t consists of the input x_t, the cell state C_t, the temporary cell state C_t', the hidden layer state h_t, the forget gate f_t, the memory gate i_t, and the output gate o_t. The LSTM computation process can be summarized as follows: information useful for subsequent computation is passed along by forgetting old information and memorizing new information in the cell state, while useless information is discarded, and the hidden layer state is output at each time step; the forgetting, memorizing, and outputting are controlled by the forget gate, memory gate, and output gate, which are computed from the hidden layer state at the previous time and the current input.
This structure allows previously input information to be stored in the network and passed forward; new input can change the stored history state when the input gate is open, the stored history state can be accessed when the output gate is open and then influences the output, and the forget gate is used to clear previously stored history information.
In a unidirectional long short-term memory network, f_t selects the information to be forgotten and is called the forget gate; its value is determined by the hidden layer state at the previous time and the input at the current feature extraction time:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
The memory gate determines what new information is stored in the cell state; given the hidden layer state at the previous time and the input at the current feature extraction time, it outputs the value of the memory gate and the temporary cell state:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C_t' = tanh(W_C · [h_{t-1}, x_t] + b_C)
The cell state at the current feature extraction time is determined by the value of the memory gate, the value of the forget gate, the temporary cell state, and the cell state at the previous time:
C_t = f_t * C_{t-1} + i_t * C_t'
o_t, called the output gate, determines the value of the output and is computed from the hidden layer state at the previous time and the input at the current feature extraction time:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
The hidden layer state at the current feature extraction time is determined by the cell state at the current feature extraction time and the value of the output gate:
h_t = o_t * tanh(C_t)
Here W and b are parameters learned by the model during the training phase and used in the prediction phase.
Finally, a hidden layer state sequence {h_0, h_1, ..., h_{n-1}} is obtained.
Referring to fig. 5a, a leftward LSTM and a rightward LSTM may be combined into a Bi-LSTM, the bidirectional long short-term memory network provided in the embodiments of the present application. The Bi-LSTM includes two LSTMs that do not affect each other: the rightward LSTM is located below and the leftward LSTM above.
For example, the leftward LSTM inputs the text feature vector group in forward order to obtain the vectors {h_L0, h_L1, h_L2}, and the rightward LSTM inputs the text feature vector group in reverse order to obtain the vectors {h_R0, h_R1, h_R2}; splicing the two yields {[h_L0, h_R2], [h_L1, h_R1], [h_L2, h_R0]}, i.e., {h_0, h_1, h_2}.
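A condensed sketch of such a bidirectional tagger, assuming PyTorch (the patent prescribes no framework) and placeholder vocabulary and tag sizes; a CRF layer would normally replace the per-position output for decoding, as discussed below:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Embed word units, run leftward and rightward LSTMs, splice their
    hidden states, and score a word attribute (tag) at each position."""
    def __init__(self, vocab_size=5000, tag_size=20, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)    # the two LSTMs
        self.out = nn.Linear(2 * hidden, tag_size)   # spliced [h_L, h_R]

    def forward(self, token_ids):                    # (batch, seq_len)
        h, _ = self.bilstm(self.embed(token_ids))    # (batch, seq_len, 2*hidden)
        return self.out(h)                           # per-position tag scores

tagger = BiLSTMTagger()
scores = tagger(torch.randint(0, 5000, (1, 6)))      # toy six-word sentence
print(scores.shape)                                  # torch.Size([1, 6, 20])
```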
(2.1) determining the previous feature extraction time of the current feature extraction time, acquiring the left hidden layer state of the previous feature extraction time, and calculating the left hidden layer state of the current feature extraction time according to the feature extraction time and the left hidden layer state;
and (2.2) determining the next feature extraction time of the current feature extraction time, acquiring the right hidden layer state of the next feature extraction time, and calculating the right hidden layer state of the current feature extraction time according to the feature extraction time and the right hidden layer state.
In step 2.1, the left hidden layer state at the previous moment is the left hidden layer state at the moment of inputting the current text feature vector at the previous time; in step 2.2, the right hidden state at the next time is the right hidden state at the next time when the current text feature vector is input.
Referring to fig. 5a, text feature information is input into a preset word segmentation model, the word segmentation model calculates a left hidden state at a current feature extraction time according to a left hidden state at a previous time, and then calculates a right hidden state at the current feature extraction time according to a right hidden state at a next time.
If the text feature information is input into the preset word segmentation model for the first time, that is, the current word segmentation model has no left hidden layer state from a previous time and no right hidden layer state from a next time, then the left hidden layer state at the previous time takes a preset left hidden layer initial value and the right hidden layer state at the next time takes a preset right hidden layer initial value; both can be preset by a technician, and both are typically 0.
And (2.3) generating a part-of-speech feature vector according to the left hidden layer state and the right hidden layer state at the current feature extraction time.
The semantic feature vector C may be a combination of the hidden layer states h of the word segmentation model, may be the hidden layer state output at the current time, or may be a state obtained by applying some transformation to all the hidden layer states; this is not limited herein.
In an embodiment, generating the part-of-speech feature vector from the text feature vector and the hidden layer state of the word segmentation model at the feature extraction time may also be implemented by convolution operations, specifically by a neural network model containing convolution layers, such as CNN (Convolutional Neural Network), ResNet (deep residual network), or VGGNet (Visual Geometry Group network); the principle of the convolution layer is described in the above embodiments and is not repeated here.
Referring to fig. 5a, the outputs of the Bi-LSTM are concatenated (or otherwise combined) to create a single output layer at each position. In the simplest approach, this layer can be passed directly to a softmax, which creates a probability distribution over all tags, and the most likely tag is selected as the part-of-speech tag. For named-entity tagging, however, such greedy decoding is not sufficient, because it does not allow strong constraints between neighboring tags to be imposed: for example, the tag I-PER must follow another I-PER or B-PER. Instead, a CRF layer is typically used on top of the Bi-LSTM output, decoded with a Viterbi decoding algorithm, and the decoding result is expressed as word units and their corresponding word attributes.
In an embodiment, specifically, an HMM (Hidden Markov Model) may be used to perform sequence labeling on the text information, so as to obtain the word unit sequence corresponding to the text information. Assume that x_1 to x_3 represent hidden states and that (x_1, x_2, x_3) correspond to the three tags (B, I, E). Each state can produce an emission signal (denoted y_i), and each emission signal corresponds to a certain Chinese character; b_ij denotes the probability that the i-th state emits Chinese character j, i.e., the emission probability. States may also transition into one another; for example, depending on the context of the text information, a character may change from being a single word to being the end of a word. The probability of transitioning from the i-th state to the j-th state is denoted a_ij. The process of recognizing and dividing the text information into a word unit sequence with the HMM model is, in effect, the process of predicting the hidden states corresponding to the observation sequence (y_1 to y_4, i.e., the Chinese character sequence in the text information) when all a_ij and b_ij are known. In practice, the emission probabilities and transition probabilities can be obtained through corpus statistics or training; once these are known, the Viterbi decoding algorithm can be invoked to obtain the final word segmentation result (i.e., the hidden state sequence). The Viterbi decoding algorithm uses recursion to reduce the amount of computation and uses the context of the entire sequence to make decisions, so a sequence containing "noise" can be analyzed well. In use, the Viterbi decoding algorithm computes a local probability for each cell in the trellis and stores a back pointer indicating the most likely path to reach that cell. After the whole computation is complete, the most probable state is found at the termination time, and backtracking through the back pointers to time t = 1 yields the state sequence on the backtracking path, which is the most probable hidden state sequence.
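A self-contained sketch of Viterbi decoding over such a trellis; the (B, I, E) states follow the passage above, while the probability tables are hypothetical stand-ins for the corpus statistics:

```python
import numpy as np

STATES = ["B", "I", "E"]   # begin / inside / end of a word

def viterbi(obs, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence for a character sequence.
    start_p: (3,) initial probabilities; trans_p: (3, 3) matrix of a_ij;
    emit_p: dict mapping a character to its (3,) emission probabilities."""
    n, k = len(obs), len(STATES)
    prob = np.zeros((n, k))              # local probability of each cell
    back = np.zeros((n, k), dtype=int)   # back pointer to best predecessor
    prob[0] = start_p * emit_p[obs[0]]
    for t in range(1, n):
        for j in range(k):
            cand = prob[t - 1] * trans_p[:, j]
            back[t, j] = cand.argmax()
            prob[t, j] = cand.max() * emit_p[obs[t]][j]
    path = [int(prob[-1].argmax())]      # most probable terminal state
    for t in range(n - 1, 0, -1):        # backtrack through the pointers
        path.append(int(back[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]

chars = ["明", "天", "开", "会"]
start = np.array([0.6, 0.2, 0.2])
trans = np.array([[0.1, 0.5, 0.4], [0.1, 0.4, 0.5], [0.6, 0.2, 0.2]])
emit = {c: np.full(3, 1 / 3) for c in chars}   # uniform toy emissions
print(viterbi(chars, start, trans, emit))
```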
In an embodiment, the extracting time information in the text information according to the word unit and the word attribute, and determining a schedule time according to the time information and a current time may specifically include the following steps:
acquiring semantic relations among the word units according to the word units and the word attributes;
and determining time word units from the word units, and determining the schedule time according to the time word units, the current time and the semantic relationship among the time word units.
Referring to FIG. 5b, this embodiment employs a dependency tree to obtain the semantic relationships between word units. Syntax analysis aimed at obtaining the dependency structure among sentence components is called dependency parsing. Semantic dependency analysis analyzes the associations between the language units of a sentence and presents the semantic associations as a dependency structure; its goal is to obtain deep semantic information directly, crossing the constraints of the sentence's surface syntactic structure.
Generally, a phrase structure tree is composed of three parts: terminal nodes, non-terminal nodes, and phrase tags. According to the branching grammar rules, several terminal nodes form a phrase, which participates as a non-terminal node in the next reduction until the process ends. A dependency grammar structure has no non-terminal nodes; semantic relations occur directly between words and form dependency pairs, in which one word is the core word, also called the head, and the other is the modifying word, also called the dependent. A semantic relation is represented by a directed arc, called a dependency arc. Here the direction of a dependency arc runs from the dependent to the head; the opposite convention is of course also possible and may be applied consistently according to personal habit. Dependency syntax explains the syntactic structure by analyzing the semantic relationships between the components of a language unit, holding that the core verb of a sentence is the central component governing the other components, while itself being governed by no other component; all governed components are subordinate to their governor in some relationship.
In an embodiment, the obtaining the semantic relationship between the word units according to the word units and the word attributes may specifically include the following steps:
generating a sentence dependency relationship tree according to the word units and the word attributes corresponding to the text information;
performing rule matching on nodes in the sentence dependency relationship tree based on a preset dependency rule;
when the rule matching is successful, generating a semantic association by taking the node as a central structure;
determining a dependency between the word units based on the sentence dependency tree and the semantic association set.
The semantic association group can be expressed as triples, where a triple generally comprises the governing (head) word, the dependent word, and the dependency relationship between them.
In one embodiment, generating a sentence dependency tree requires following conditions:
generating a son node of the dependency syntax of each word unit, and mainly storing the relation and the position of the corresponding son word unit;
generating a dependency structure of a parent-child array of each word unit, wherein the dependency structure is mainly used for recording the part of speech of the word, the part of speech of a parent node and the relationship among the parts of speech of the word and the parent node;
and looping over each word unit, finding the subject-predicate, verb-object, verb-complement, post-attributive verb-object, and preposition-object relations and extracting them; for the words extracted in subject and object positions, searching the extracted words for those with related dependency structures and removing unnecessary words.
In one embodiment, the dependency analysis results are referenced to FIG. 5c and the rule extraction results are referenced to FIG. 5 d.
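An illustrative sketch of producing such head/dependent/relation triples, using spaCy purely as a stand-in dependency parser (the patent specifies no parser, and zh_core_web_sm is an assumed pretrained model name):

```python
import spacy

nlp = spacy.load("zh_core_web_sm")   # assumed pretrained Chinese pipeline
doc = nlp("明天下午三点开会")          # "meeting at three pm tomorrow"

# One (head word, dependent word, dependency relation) triple per arc.
triples = [(tok.head.text, tok.text, tok.dep_) for tok in doc]
for head, dep, rel in triples:
    print(f"{head} -> {dep} [{rel}]")

# Time word units can then be picked out, e.g. via entity labels.
time_words = [tok.text for tok in doc if tok.ent_type_ in ("DATE", "TIME")]
```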
After obtaining the word units and word attributes, labeling training is performed on existing colloquial time expressions (such as "today", "tomorrow", "next Monday", etc.). Specifically, a CNN/RNN + attention model can be adopted, with a ranking loss as the loss function, which outperforms cross entropy here. During training each sample has two labels, a correct label y+ and an incorrect label c-, with m+ and m- being the two corresponding margins.
The input layer of the trained relation classification model comprises word encoding and position encoding; a vector representation of the text information is generated using 6 convolution kernels and max pooling, and the dot product between this vector representation and the semantic relation (category) vectors is computed as a similarity score, which serves as the time information relation classification result.
At this point, the time information needed to establish the schedule, i.e., the schedule time, can be obtained by operating on the recognized time words together with the current time.
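A minimal sketch of that final arithmetic, resolving a colloquial day word against the current time; the rule table is a hypothetical fragment, not the patent's grammar, and it reproduces the worked example used later (today is September 7, "three pm tomorrow" yields September 8, 15:00):

```python
from datetime import datetime, timedelta

# Hypothetical day-offset rules for a few colloquial time words.
DAY_OFFSETS = {"今天": 0, "明天": 1, "后天": 2}   # today / tomorrow / day after

def resolve_schedule_time(day_word, hour, now=None):
    """Combine a relative day word and a clock hour into the schedule time."""
    now = now or datetime.now()
    day = now + timedelta(days=DAY_OFFSETS.get(day_word, 0))
    return day.replace(hour=hour, minute=0, second=0, microsecond=0)

now = datetime(2020, 9, 7, 10, 0)             # current time: Sept 7, 10:00
print(resolve_schedule_time("明天", 15, now))  # 2020-09-08 15:00:00
```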
104. And generating schedule information according to the schedule time and the text information.
The terminal can save the schedule time and the schedule content as schedule information.
In an embodiment, in order to prompt the user in advance, the terminal may set the prompt time according to the schedule time, for example, the prompt time may be set to be 5 minutes in advance, and in addition, the user may also set the prompt time by self-selection.
The terminal can save the schedule time, the schedule contents and the prompt time as schedule information.
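The stored record might then look like the following sketch; the field names are illustrative, since the patent only requires that the schedule time, schedule content, and prompt time be saved together:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ScheduleInfo:
    schedule_time: datetime   # when the planned event happens
    content: str              # the event the user plans to do
    prompt_time: datetime     # when the terminal should remind the user

event_time = datetime(2020, 9, 8, 15, 0)
info = ScheduleInfo(
    schedule_time=event_time,
    content="开会",                                   # "meeting"
    prompt_time=event_time - timedelta(minutes=5),   # default: 5 minutes early
)
```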
105. And carrying out schedule setting on the terminal according to the schedule information.
In an embodiment, referring to fig. 2b, the performing schedule setting on the terminal according to the schedule information may specifically include the following steps:
displaying the schedule information;
and when the determination operation of the user for the schedule information is detected, carrying out schedule setting on the terminal according to the schedule information.
Alternatively,
displaying the schedule information;
when the modification operation of the user for the schedule information is detected, obtaining the modified schedule information;
and carrying out schedule setting on the terminal according to the modified schedule information.
The user can click the terminal interface to realize the confirmation operation, and can trigger the function of modifying the schedule information by pressing the terminal interface for a long time, for example, when the user presses the terminal interface for a long time, a selection frame of the time is displayed, the user can modify the time in the selection frame, and the user can edit the schedule content.
The schedule information comprises schedule time and schedule content, the schedule content can be the events planned to be done by the user, and the schedule time is the time for planning to implement the schedule content.
In one embodiment, the step of "performing schedule setting on the terminal" includes: and setting the terminal to prompt schedule information based on the schedule time.
For example, the user may set in advance a contact manner such as a mailbox or a telephone number, an instant messaging application account and the like for receiving a prompt, and the terminal may send schedule information to the user in a manner of mail, short message, telephone, instant messaging information to prompt the user.
For example, the terminal may prompt the user in the form of an alarm clock and simultaneously display schedule information in a terminal interface.
As can be seen from the above, the embodiment of the present application can acquire the voice information input by the user; recognizing the voice information to obtain text information corresponding to the voice information; separating the text information to obtain time information, and determining schedule time according to the time information and the current time; generating schedule information according to the schedule time and the text information; and carrying out schedule setting on the terminal according to the schedule information. By automatically identifying the time information in the voice information, the schedule can be conveniently and quickly set, and the user experience is improved.
According to the method described in the foregoing embodiment, the calendar setting device is specifically integrated in the terminal device, which will be described in further detail below.
Referring to fig. 2c, a specific process of the schedule setting method according to the embodiment of the present invention is as follows:
201. the terminal acquires voice information input by a user.
In an embodiment, a user may click a voice input control set in the terminal interface to trigger a voice recording function of the signal acquisition device. Referring to fig. 2b, the voice input control may be represented as a microphone icon.
In an embodiment, a user can wake up the voice assistant through a preset specific instruction, and when the voice assistant is woken up, the voice recording function of the signal acquisition device is triggered, and the terminal recognizes voice information through the voice assistant.
202. And the terminal sends the voice information to the server and receives the schedule time returned by the server based on the voice information.
The server is provided with a trained voice recognition model, a word segmentation model and a relation classification model, can recognize voice information to obtain corresponding text information, can also recognize and divide the text information to obtain word units and word attributes corresponding to the text information, determines time information in the text information according to the word units and the word attributes, and then determines schedule time according to current time and the time information.
For example, when the text message is "three-point fine arts class in the afternoon tomorrow" and is No. 9 month 7 today, the schedule time is No. 9 month 8 three-point afternoon.
In an embodiment, in order to continuously perfect the model in the server, the text information may be further processed to serve as a training sample, and the model is continuously trained and optimized. For example, private information may be removed and text information may be encrypted using an encryption algorithm into non-human interpretable symbols to avoid violating user privacy. Meanwhile, in order to reduce the training difficulty, a large amount of redundant information irrelevant to time can be removed, and the user identity and other information are blurred.
203. And the terminal generates schedule information according to the schedule time and the text information.
In an embodiment, in order to prompt the user in advance, the terminal may set the prompt time according to the schedule time, for example, the prompt time may be set to be 5 minutes in advance, and in addition, the user may also set the prompt time by self-selection.
The terminal can save the schedule time, the schedule contents and the prompt time as schedule information.
204. And prompting schedule information by the terminal based on the schedule time.
According to the setting of the user, when the prompting time is reached, the terminal automatically triggers the functions of the mail, the short message, the dialing, the alarm clock and the like, and respectively prompts the user and displays schedule information to the user in the forms of mail content, short message content, telephone voice and messages on a terminal interface.
As can be seen from the above, the embodiment of the present application can acquire the voice information input by the user; recognize the voice information to obtain text information corresponding to the voice information; separate time information from the text information and determine a schedule time according to the time information and the current time; generate schedule information according to the schedule time and the text information; and perform schedule setting on the terminal according to the schedule information. By automatically identifying the time information in the voice information, the schedule can be conveniently and quickly set, and the user experience is improved.
For example, as shown in fig. 3, the schedule setting apparatus may include an acquisition unit 301, a recognition unit 302, a separation unit 303, a generation unit 304, and a setting unit 305 as follows:
(1) an obtaining unit 301, configured to obtain voice information input by a user.
(2) The identifying unit 302 is configured to identify the voice information to obtain text information corresponding to the voice information.
In an embodiment, the identification unit 302 may specifically include an extraction subunit and an identification subunit, as follows:
the extraction subunit is used for extracting the audio characteristic information of the voice information;
and the identification subunit is used for acquiring the text information corresponding to the voice information according to the audio characteristic information of the voice information.
In an embodiment, the identifier unit may be specifically configured to:
acquiring phonemes corresponding to the audio characteristic information according to a preset acoustic model;
comparing and matching the phonemes with a preset dictionary according to a preset language model to obtain text words corresponding to the phonemes;
and extracting semantic association information among the text words, and combining the text words to obtain text information according to the association information.
(3) A separating unit 303, configured to separate time information from the text information, and determine a schedule time according to the time information and a current time.
In an embodiment, the separation unit 303 may specifically include a dividing subunit and a separation subunit, as follows:
the dividing subunit is used for identifying and dividing the text information by adopting a preset word segmentation model to obtain a plurality of word units corresponding to the text information and determining word attributes corresponding to the word units;
and the separation subunit is used for extracting the time information in the text information according to the word unit and the word attribute.
In an embodiment, the separation subunit may be specifically configured to:
acquiring semantic relations among the word units according to the word units and the word attributes;
and determining time word units from the word units, and determining the schedule time according to the time word units, the current time and the semantic relationship among the time word units.
In an embodiment, the dividing subunit may be specifically configured to:
identifying word units in the text information;
performing an encoding operation on the text information according to the word units to obtain a text feature vector;
generating a part-of-speech feature vector according to the text feature vector and the hidden layer states of the preset word segmentation model at each feature extraction moment;
and performing a decoding operation on the part-of-speech feature vector to determine the word units corresponding to the text information and the word attributes corresponding to the word units.
In an embodiment, generating the part-of-speech feature vector according to the text feature vector and the hidden layer states of the preset word segmentation model may specifically include:
determining the previous feature extraction moment of the current feature extraction moment, acquiring the left hidden layer state at that previous moment, and calculating the left hidden layer state at the current moment according to the text feature at the current moment and the left hidden layer state of the previous moment;
determining the next feature extraction moment of the current feature extraction moment, acquiring the right hidden layer state at that next moment, and calculating the right hidden layer state at the current moment according to the text feature at the current moment and the right hidden layer state of the next moment;
and generating the part-of-speech feature vector according to the left hidden layer state and the right hidden layer state at the current feature extraction moment.
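The left/right hidden-state recurrence described above corresponds to a bidirectional recurrent network. A minimal numpy sketch, with randomly initialized weights standing in for the trained preset word segmentation model:

import numpy as np

rng = np.random.default_rng(0)
DIM_IN, DIM_H = 8, 4  # illustrative feature and hidden sizes
W_l, U_l = rng.normal(size=(DIM_H, DIM_IN)), rng.normal(size=(DIM_H, DIM_H))
W_r, U_r = rng.normal(size=(DIM_H, DIM_IN)), rng.normal(size=(DIM_H, DIM_H))

def part_of_speech_features(text_features: np.ndarray) -> np.ndarray:
    T = len(text_features)
    left = np.zeros((T, DIM_H))   # left hidden states, computed forward
    right = np.zeros((T, DIM_H))  # right hidden states, computed backward
    for t in range(T):
        prev = left[t - 1] if t > 0 else np.zeros(DIM_H)
        left[t] = np.tanh(W_l @ text_features[t] + U_l @ prev)
    for t in reversed(range(T)):
        nxt = right[t + 1] if t + 1 < T else np.zeros(DIM_H)
        right[t] = np.tanh(W_r @ text_features[t] + U_r @ nxt)
    # The part-of-speech feature vector at each moment combines both states.
    return np.concatenate([left, right], axis=1)

print(part_of_speech_features(rng.normal(size=(5, DIM_IN))).shape)  # (5, 8)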
(4) A generating unit 304, configured to generate schedule information according to the schedule time and the text information.
(5) A setting unit 305, configured to perform schedule setting on the terminal according to the schedule information.
In an embodiment, the setting unit 305 may specifically be configured to:
displaying the schedule information;
and when the determination operation of the user for the schedule information is detected, carrying out schedule setting on the terminal according to the schedule information.
Alternatively:
displaying the schedule information;
when the modification operation of the user for the schedule information is detected, obtaining the modified schedule information;
and carrying out schedule setting on the terminal according to the modified schedule information.
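A compact sketch of this confirm-or-modify interaction; the console prompt stands in for the terminal interface, and only the schedule content is made editable here for brevity:

def confirm_or_modify(schedule_info: dict) -> dict:
    print(f"Schedule: {schedule_info['content']} at {schedule_info['time']}")
    answer = input("Press enter to confirm, or type a correction: ").strip()
    if answer:  # a modification operation was detected
        schedule_info = {**schedule_info, "content": answer}
    return schedule_info  # used for schedule setting either way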
In an embodiment, the setting unit 305 may be further configured to set the terminal to prompt the schedule information based on the schedule time.
As can be seen from the above, in the embodiment of the present application, the obtaining unit may obtain the voice information input by the user; the recognition unit recognizes the voice information to obtain text information corresponding to the voice information; the separation unit separates time information from the text information and determines the schedule time according to the time information and the current time; the generating unit generates schedule information according to the schedule time and the text information; and the setting unit performs schedule setting on the terminal according to the schedule information. By automatically identifying the time information in the voice information, the schedule can be conveniently and quickly set, and the user experience is improved.
In addition, the embodiment of the application also provides computer equipment. Fig. 4 is a schematic diagram showing a structure of a computer device according to an embodiment of the present application, specifically:
the computer device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 4 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to use of the computer device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the various components. Preferably, the power supply 403 is logically connected to the processor 401 via a power management system, so that functions such as managing charging, discharging, and power consumption are implemented via the power management system. The power supply 403 may also include one or more DC or AC power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 404, the input unit 404 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application programs stored in the memory 402, thereby implementing various functions as follows:
acquiring voice information input by a user;
recognizing the voice information to obtain text information corresponding to the voice information;
separating the text information to obtain time information, and determining schedule time according to the time information and the current time;
generating schedule information according to the schedule time and the text information;
and carrying out schedule setting on the terminal according to the schedule information.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
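For intuition only, the five steps can be strung together as below, reusing resolve_schedule_time and confirm_or_modify from the earlier sketches; recognize and segment are trivial stand-ins for the trained recognition and word segmentation models:

from datetime import datetime

def recognize(wav_path: str) -> str:
    return "project meeting tomorrow 15:00"  # pretend ASR output

def segment(text: str) -> list:
    return text.split()  # pretend word segmentation into word units

def set_schedule_from_voice(wav_path: str) -> dict:
    text = recognize(wav_path)                           # steps 1-2
    words = segment(text)                                # step 3a: word units
    when = resolve_schedule_time(words, datetime.now())  # step 3b: schedule time
    info = {"content": text, "time": when}               # step 4
    return confirm_or_modify(info)                       # step 5: set after confirmation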
Therefore, by automatically identifying the time information in the voice information input by the user, the schedule can be conveniently and quickly set, and the user experience is improved.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application further provide a computer-readable storage medium, in which a plurality of instructions are stored, where the instructions can be loaded by a processor to execute the steps in any one of the schedule setting methods provided in the embodiments of the present application. For example, the instructions may perform the steps of:
acquiring voice information input by a user;
recognizing the voice information to obtain text information corresponding to the voice information;
separating the text information to obtain time information, and determining schedule time according to the time information and the current time;
generating schedule information according to the schedule time and the text information;
and carrying out schedule setting on the terminal according to the schedule information.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps in any schedule setting method provided in the embodiments of the present application, beneficial effects that can be achieved by any schedule setting method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The schedule setting method, apparatus, and storage medium provided by the embodiments of the present application are described in detail above. The principles and embodiments of the present application are explained herein through specific examples, and the description of the above embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, those skilled in the art may, according to the ideas of the present application, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (13)

1. A schedule setting method, comprising:
acquiring voice information input by a user;
recognizing the voice information to obtain text information corresponding to the voice information;
separating the text information to obtain time information, and determining schedule time according to the time information and the current time;
generating schedule information according to the schedule time and the text information;
and carrying out schedule setting on the terminal according to the schedule information.
2. The schedule setting method of claim 1, wherein the performing schedule setting on the terminal according to the schedule information comprises:
displaying the schedule information;
and when the determination operation of the user for the schedule information is detected, carrying out schedule setting on the terminal according to the schedule information.
3. The schedule setting method of claim 2, wherein the schedule setting for the terminal according to the schedule information further comprises:
displaying the schedule information;
when the modification operation of the user for the schedule information is detected, obtaining the modified schedule information;
and carrying out schedule setting on the terminal according to the modified schedule information.
4. The schedule setting method of claim 1, wherein the schedule setting of the terminal comprises:
and setting the terminal to prompt schedule information based on the schedule time.
5. The schedule setting method of claim 1, wherein said separating time information from said text information and determining a schedule time based on said time information and a current time comprises:
recognizing and dividing the text information by adopting a preset word segmentation model to obtain a plurality of word units corresponding to the text information, and determining word attributes corresponding to the word units;
and extracting time information in the text information according to the word unit and the word attribute, and determining schedule time according to the time information and the current time.
6. The schedule setting method of claim 5, wherein extracting time information in the text information according to the word unit and the word attribute, and determining a schedule time according to the time information and a current time comprises:
acquiring semantic relations among the word units according to the word units and the word attributes;
and determining time word units from the word units, and determining the schedule time according to the time word units, the current time and the semantic relationship among the time word units.
7. The schedule setting method according to claim 6, wherein said obtaining semantic relations between said word units according to said word units and said word attributes comprises:
generating a sentence dependency relationship tree according to the word units and the word attributes corresponding to the text information;
performing rule matching on nodes in the sentence dependency relationship tree based on a preset dependency rule;
when the rule matching is successful, generating a semantic association by taking the node as a central structure;
determining a dependency between the word units based on the sentence dependency tree and the semantic association set.
8. The schedule setting method according to claim 5, wherein the identifying and dividing the text information by using a preset word segmentation model to obtain a plurality of word units corresponding to the text information and determining word attributes corresponding to the word units comprises:
identifying word units in the text information;
performing coding operation on the text information according to the word unit to obtain a text characteristic vector;
generating a part-of-speech feature vector according to the text feature vector and the hidden layer state at the preset word segmentation model feature extraction moment;
and performing decoding operation on the part-of-speech characteristic vector, and determining word units corresponding to the text information and word attributes corresponding to the word units.
9. The schedule setting method according to claim 8, wherein the generating a part-of-speech feature vector according to the text feature vector and the hidden state at the time of extracting the feature of the preset word segmentation model comprises:
determining the previous feature extraction time of the current feature extraction time, acquiring the left hidden layer state of the previous feature extraction time, and calculating the left hidden layer state of the current feature extraction time according to the feature extraction time and the left hidden layer state;
determining the next feature extraction time of the current feature extraction time, acquiring the right hidden layer state of the next feature extraction time, and calculating the right hidden layer state of the current feature extraction time according to the feature extraction time and the right hidden layer state;
and generating a part-of-speech feature vector according to the left hidden layer state and the right hidden layer state of the current feature extraction moment.
10. The schedule setting method according to claim 1, wherein the recognizing the voice message to obtain the text message corresponding to the voice message comprises:
extracting audio characteristic information of the voice information;
and acquiring text information corresponding to the voice information according to the audio characteristic information of the voice information.
11. The schedule setting method according to claim 10, wherein the obtaining of the text information corresponding to the voice information according to the audio feature information of the voice information comprises:
acquiring phonemes corresponding to the audio characteristic information according to a preset acoustic model;
comparing and matching the phonemes with a preset dictionary according to a preset language model to obtain text words corresponding to the phonemes;
and extracting semantic association information among the text words, and combining the text words to obtain text information according to the association information.
12. A schedule setting apparatus, comprising:
the acquiring unit is used for acquiring voice information input by a user;
the recognition unit is used for recognizing the voice information to obtain text information corresponding to the voice information;
the separation unit is used for separating the text information to obtain time information and determining schedule time according to the time information and the current time;
the generating unit is used for generating schedule information according to the schedule time and the text information;
and the setting unit is used for carrying out schedule setting on the terminal according to the schedule information.
13. A storage medium having a computer program stored thereon, characterized in that, when the computer program runs on a computer, the computer is caused to execute the schedule setting method according to any one of claims 1 to 11.
CN202010936030.9A 2020-09-08 2020-09-08 Schedule setting method, schedule setting device and storage medium Active CN112150103B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010936030.9A CN112150103B (en) 2020-09-08 2020-09-08 Schedule setting method, schedule setting device and storage medium


Publications (2)

Publication Number Publication Date
CN112150103A 2020-12-29
CN112150103B (en) 2023-11-28

Family

ID=73889970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010936030.9A Active CN112150103B (en) 2020-09-08 2020-09-08 Schedule setting method, schedule setting device and storage medium

Country Status (1)

Country Link
CN (1) CN112150103B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113836270A (en) * 2021-09-28 2021-12-24 深圳格隆汇信息科技有限公司 Big data processing method and related product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007179239A (en) * 2005-12-27 2007-07-12 Kenwood Corp Schedule management device and program
CN106020953A (en) * 2016-05-12 2016-10-12 青岛海信移动通信技术股份有限公司 Method and device for establishing schedule in electronic calendar
CN108629549A (en) * 2017-03-15 2018-10-09 腾讯科技(深圳)有限公司 A kind of schedule processing method and processing device
CN110798578A (en) * 2019-11-07 2020-02-14 浙江同花顺智能科技有限公司 Incoming call transaction management method and device and related equipment
CN111312245A (en) * 2020-02-18 2020-06-19 腾讯科技(深圳)有限公司 Voice response method, device and storage medium


Also Published As

Publication number Publication date
CN112150103B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
CN111933129B (en) Audio processing method, language model training method and device and computer equipment
CN111312245B (en) Voice response method, device and storage medium
US11475897B2 (en) Method and apparatus for response using voice matching user category
CN108428446A (en) Audio recognition method and device
WO2023222088A1 (en) Voice recognition and classification method and apparatus
CN113837299B (en) Network training method and device based on artificial intelligence and electronic equipment
CN111223476B (en) Method and device for extracting voice feature vector, computer equipment and storage medium
CN112151015A (en) Keyword detection method and device, electronic equipment and storage medium
CN112634858A (en) Speech synthesis method, speech synthesis device, computer equipment and storage medium
CN111653274A (en) Method, device and storage medium for awakening word recognition
CN108053826B (en) Method and device for man-machine interaction, electronic equipment and storage medium
KR101677859B1 (en) Method for generating system response using knowledgy base and apparatus for performing the method
CN110851650B (en) Comment output method and device and computer storage medium
CN112185361A (en) Speech recognition model training method and device, electronic equipment and storage medium
El Janati et al. Adaptive e-learning AI-powered chatbot based on multimedia indexing
CN112150103B (en) Schedule setting method, schedule setting device and storage medium
CN111554270B (en) Training sample screening method and electronic equipment
CN113178200A (en) Voice conversion method, device, server and storage medium
CN116542256A (en) Natural language understanding method and device integrating dialogue context information
CN116978367A (en) Speech recognition method, device, electronic equipment and storage medium
CN114373443A (en) Speech synthesis method and apparatus, computing device, storage medium, and program product
CN115985320A (en) Intelligent device control method and device, electronic device and storage medium
CN112397053B (en) Voice recognition method and device, electronic equipment and readable storage medium
CN115171660A (en) Voiceprint information processing method and device, electronic equipment and storage medium
CN114417827A (en) Text context processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant