CN112599125A - Voice office processing method and device, terminal and storage medium

Info

Publication number: CN112599125A
Authority: CN (China)
Prior art keywords: office, voice, data, word segmentation, preset
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202011401166.6A
Other languages: Chinese (zh)
Inventors: 陈连军, 曾小辉, 莫云娟, 邢向南, 李凯, 董贯慧, 高旭峰
Original assignee: Faw Capital Holdings Ltd
Current assignee: Faw Capital Holdings Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Priority/filing date: 2020-12-02 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Publication date: 2021-04-02
Application filed by Faw Capital Holdings Ltd

Classifications

    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command
    • G10L 15/26 Speech-to-text systems
    • G10L 15/1822 Speech classification or search using natural language modelling; parsing for meaning understanding
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30 Semantic analysis
    • G06Q 10/10 Office automation; time management

Abstract

The invention relates to a voice office processing method, device, terminal and storage medium, and belongs to the technical field of intelligent office. The method comprises the following steps: when a voice office request is received, acquiring the voice data in the voice office request; converting the voice data into text data; performing semantic recognition on the text data and determining the office item corresponding to the text data; and executing the office operation corresponding to the voice office request according to the text data and the office item corresponding to the text data. By adopting the invention, intelligent office efficiency can be improved.

Description

Voice office processing method and device, terminal and storage medium
Technical Field
The invention relates to the technical field of intelligent office, and in particular to a voice office processing method, device, terminal and storage medium.
Background
Intelligent terminals have become an indispensable part of people's study, life and work, and they are now also applied to office processing; for example, a user can book an air ticket, download a file or start a video conference through the terminal.
In the process of implementing the invention, the inventors found that the related art has at least the following problems:
When a user handles office work on an intelligent terminal, most operations must be performed manually, and a great deal of information, data, documents and the like must be filled in by typing. The operation is therefore cumbersome and restrictive, and the user's office processing efficiency is low.
Disclosure of Invention
The invention provides a voice office processing method, device, terminal and storage medium, which can solve problems such as the user's low office processing efficiency.
According to a first aspect of the embodiments of the present invention, there is provided a voice office processing method, including:
when a voice office request is received, acquiring the voice data in the voice office request;
converting the voice data into text data;
performing semantic recognition on the text data and determining the office item corresponding to the text data;
and executing the office operation corresponding to the voice office request according to the text data and the office item corresponding to the text data.
Optionally, the acquiring, when a voice office request is received, the voice data corresponding to the voice office request includes:
when a voice office request is received, starting a voice acquisition operation;
and when a preset voice acquisition ending condition is met, ending the voice acquisition operation to obtain the voice data corresponding to the voice office request.
Optionally, the reaching of the preset voice acquisition ending condition includes:
collecting preset ending voice data; or, alternatively,
the execution duration of the voice acquisition operation reaching a preset duration.
Optionally, the performing semantic recognition on the text data and determining the office item corresponding to the text data includes:
performing word segmentation processing on the text data to obtain at least one word segmentation recognition result;
acquiring the keyword data corresponding to each of a plurality of preset office items;
matching the word segmentation recognition result with the keyword data corresponding to each of the plurality of office items to obtain the correlation degree between the word segmentation recognition result and each office item;
and determining the maximum correlation degree among the plurality of correlation degrees, and determining the office item corresponding to the maximum correlation degree as the office item corresponding to the text data.
Optionally, the performing word segmentation processing on the text data to obtain at least one word segmentation recognition result includes:
performing word segmentation processing on the text data to obtain at least one group of word segmentation data, wherein each group of word segmentation data comprises at least one word;
and the matching the word segmentation recognition result with the keyword data corresponding to each of the plurality of office items to obtain the correlation degree between the word segmentation recognition result and each office item includes:
matching each of the at least one group of word segmentation data corresponding to the text data with the keyword data corresponding to each office item, to obtain the correlation degree between each group of word segmentation data and each office item.
Optionally, the executing the office operation corresponding to the voice office request according to the text data and the office item corresponding to the text data includes:
dividing the word segmentation data corresponding to the maximum correlation degree into corresponding information attributes;
acquiring at least one information element preset for the office item corresponding to the text data;
matching the at least one information element preset for the office item with the information attributes corresponding to the word segmentation data one by one, and if each information element preset for the office item has a matched information attribute, executing the office operation corresponding to the voice office request according to the information elements preset for the office item and the corresponding information attributes;
and if at least one information element preset for the office item has no matched information attribute, sending out prompt information indicating insufficient information.
Optionally, the determining the maximum correlation degree among the plurality of correlation degrees includes:
determining an initial maximum correlation degree among the plurality of correlation degrees, and comparing the initial maximum correlation degree with a preset maximum correlation degree threshold value;
and if the initial maximum correlation degree is greater than the maximum correlation degree threshold value, determining the initial maximum correlation degree as the maximum correlation degree.
According to a second aspect of the embodiments of the present invention, there is provided a voice office processing apparatus comprising:
an acquisition unit, configured to acquire the voice data in a voice office request when the voice office request is received;
a conversion unit, configured to convert the voice data into text data;
a recognition unit, configured to perform semantic recognition on the text data and determine the office item corresponding to the text data;
and an execution unit, configured to execute the office operation corresponding to the voice office request according to the text data and the office item corresponding to the text data.
Optionally, the acquisition unit is configured to:
start a voice acquisition operation when a voice office request is received;
and end the voice acquisition operation when a preset voice acquisition ending condition is met, to obtain the voice data corresponding to the voice office request.
Optionally, the recognition unit is configured to:
perform word segmentation processing on the text data to obtain at least one word segmentation recognition result;
acquire the keyword data corresponding to each of a plurality of preset office items;
match the word segmentation recognition result with the keyword data corresponding to each of the plurality of office items to obtain the correlation degree between the word segmentation recognition result and each office item;
and determine the maximum correlation degree among the plurality of correlation degrees, and determine the office item corresponding to the maximum correlation degree as the office item corresponding to the text data.
Optionally, the recognition unit is configured to:
perform word segmentation processing on the text data to obtain at least one group of word segmentation data, wherein each group of word segmentation data comprises at least one word;
and match each of the at least one group of word segmentation data corresponding to the text data with the keyword data corresponding to each office item, to obtain the correlation degree between each group of word segmentation data and each office item.
Optionally, the recognition unit is configured to:
divide the word segmentation data corresponding to the maximum correlation degree into corresponding information attributes;
acquire at least one information element preset for the office item corresponding to the text data;
match the at least one information element preset for the office item with the information attributes corresponding to the word segmentation data one by one, and if each information element preset for the office item has a matched information attribute, execute the office operation corresponding to the voice office request according to the information elements preset for the office item and the corresponding information attributes;
and if at least one information element preset for the office item has no matched information attribute, send out prompt information indicating insufficient information.
Optionally, the recognition unit is configured to:
determine an initial maximum correlation degree among the plurality of correlation degrees, and compare the initial maximum correlation degree with a preset maximum correlation degree threshold value;
and if the initial maximum correlation degree is greater than the maximum correlation degree threshold value, determine the initial maximum correlation degree as the maximum correlation degree.
According to a third aspect of the embodiments of the present invention, there is provided a terminal, including:
one or more processors;
a memory for storing instructions executable by the one or more processors;
wherein the one or more processors are configured to:
the method of the first aspect of the embodiments of the present invention is performed.
According to a fourth aspect of embodiments of the present invention, there is provided a non-transitory computer-readable storage medium, wherein instructions, when executed by a processor of a terminal, enable the terminal to perform the method of the first aspect of embodiments of the present invention.
According to a fifth aspect of embodiments of the present invention, there is provided an application program product, which, when running on a terminal, causes the terminal to perform the method of the first aspect of embodiments of the present invention.
The technical solutions provided by the embodiments of the invention can have the following beneficial effects:
In the invention, a user can control the terminal by voice to perform the corresponding office operation, which improves the employee experience: office matters can be handled by voice at any time and in any place, without filling in complex document applications. An intelligent voice office assistant is provided for employees, a knowledge system covering all functional fields is established, questions can be asked by voice in real time, and the consultation results are fed back in real time. The communication and consultation costs of daily repetitive cross-department matters are effectively reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a voice office processing method according to an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a voice office processing method according to an exemplary embodiment;
FIG. 3 is a block diagram illustrating a voice office processing apparatus according to an exemplary embodiment;
FIG. 4 is a block diagram illustrating a voice office processing apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The embodiment of the invention provides a voice office processing method implemented by a terminal, where the terminal may be a smart phone, a desktop computer, a notebook computer or the like. The terminal comprises at least a CPU and a voice acquisition device: the CPU processes the corresponding voice office operations, and the voice acquisition device collects voice data. In addition to the basic algorithms required for office processing, a voice conversion algorithm, a word segmentation algorithm, a semantic recognition algorithm, a word correlation matching algorithm and the like also need to be stored on the terminal, so that the corresponding office processing can be carried out according to the collected voice data.
Fig. 1 is a flowchart illustrating a voice office processing method according to an exemplary embodiment, which is used in a terminal. As shown in Fig. 1, the method includes the following steps:
Step 101, when a voice office request is received, acquiring the voice data in the voice office request;
Step 102, converting the voice data into text data;
Step 103, performing semantic recognition on the text data and determining the office item corresponding to the text data;
Step 104, executing the office operation corresponding to the voice office request according to the text data and the office item corresponding to the text data.
Optionally, acquiring the voice data corresponding to the voice office request when a voice office request is received includes:
when a voice office request is received, starting a voice acquisition operation;
and when a preset voice acquisition ending condition is met, ending the voice acquisition operation to obtain the voice data corresponding to the voice office request.
Optionally, reaching the preset voice acquisition ending condition includes:
collecting preset ending voice data; or, alternatively,
the execution duration of the voice acquisition operation reaching a preset duration.
Optionally, performing semantic recognition on the text data and determining the office item corresponding to the text data includes:
performing word segmentation processing on the text data to obtain at least one word segmentation recognition result;
acquiring the keyword data corresponding to each of a plurality of preset office items;
matching the word segmentation recognition result with the keyword data corresponding to each of the plurality of office items to obtain the correlation degree between the word segmentation recognition result and each office item;
and determining the maximum correlation degree among the plurality of correlation degrees, and determining the office item corresponding to the maximum correlation degree as the office item corresponding to the text data.
Optionally, performing word segmentation processing on the text data to obtain at least one word segmentation recognition result includes:
performing word segmentation processing on the text data to obtain at least one group of word segmentation data, wherein each group of word segmentation data comprises at least one word;
and matching the word segmentation recognition result with the keyword data corresponding to each of the plurality of office items to obtain the correlation degree between the word segmentation recognition result and each office item includes:
matching each of the at least one group of word segmentation data corresponding to the text data with the keyword data corresponding to each office item, to obtain the correlation degree between each group of word segmentation data and each office item.
Optionally, executing the office operation corresponding to the voice office request according to the text data and the office item corresponding to the text data includes:
dividing the word segmentation data corresponding to the maximum correlation degree into corresponding information attributes;
acquiring at least one information element preset for the office item corresponding to the text data;
matching the at least one information element preset for the office item with the information attributes corresponding to the word segmentation data one by one, and if each information element preset for the office item has a matched information attribute, executing the office operation corresponding to the voice office request according to the information elements preset for the office item and the corresponding information attributes;
and if at least one information element preset for the office item has no matched information attribute, sending out prompt information indicating insufficient information.
Optionally, determining the maximum correlation degree among the plurality of correlation degrees includes:
determining an initial maximum correlation degree among the plurality of correlation degrees, and comparing the initial maximum correlation degree with a preset maximum correlation degree threshold value;
and if the initial maximum correlation degree is greater than the maximum correlation degree threshold value, determining the initial maximum correlation degree as the maximum correlation degree.
Fig. 2 is a flowchart illustrating a voice office processing method according to an exemplary embodiment, which is used in a terminal. As shown in Fig. 2, the method includes the following steps:
Step 201, when a voice office request is received, starting a voice acquisition operation.
In a possible implementation, when the user wants to work by voice, the user first starts the voice office application on the terminal. When the user opens the voice office application or clicks the start button in the application, the terminal receives a voice office request; according to the voice office request, the terminal turns on the voice acquisition device to perform the voice acquisition operation, that is, audio recording is started. At this moment, the user can speak an office requirement, for example, "help me book an air ticket to Shenzhen tomorrow morning" or "download a report".
Step 202, when a preset voice acquisition ending condition is reached, ending the voice acquisition operation to obtain the voice data corresponding to the voice office request.
In a possible implementation, after the user finishes speaking the office requirement, the terminal ends the audio recording according to the preset voice acquisition ending condition, and the recorded audio data is the voice data to be processed.
Optionally, reaching the preset voice acquisition ending condition may include:
1) Preset ending voice data is collected. The user can preset ending voice data, such as "finished" or "over"; the terminal performs voice recognition in real time while acquiring the voice data, and stops the voice acquisition when it recognizes that the user has spoken the preset ending voice data.
2) The execution duration of the voice acquisition operation reaches a preset duration. In this case the user does not end the voice acquisition himself; the terminal keeps recording, starts timing when the recording starts, and automatically stops the voice acquisition when the preset duration is reached.
It should be noted that, in addition to the two preset voice acquisition ending conditions above, other practicable ending conditions may also be used, for example, the user clicking an end button, which is not limited by the present invention.
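As a minimal sketch of how the acquisition loop of steps 201-202 could be organized: the chunked recorder, the real-time recognizer, the ending phrases and the timeout value below are all assumptions made for illustration and are not specified by this disclosure.

    import time

    END_PHRASES = {"结束", "完毕"}       # assumed preset ending voice data ("finished", "over")
    MAX_DURATION_SECONDS = 15.0          # assumed preset duration

    def capture_voice(record_chunk, recognize_chunk):
        """Record audio until an ending phrase is heard or the preset duration elapses.

        record_chunk()      -> bytes : hypothetical helper that records one short audio chunk
        recognize_chunk(b)  -> str   : hypothetical helper that transcribes a chunk in real time
        """
        chunks, start = [], time.monotonic()
        while True:
            chunk = record_chunk()
            chunks.append(chunk)
            # Ending condition 1: a preset ending phrase was spoken.
            if any(p in recognize_chunk(chunk) for p in END_PHRASES):
                break
            # Ending condition 2: the acquisition has run for the preset duration.
            if time.monotonic() - start >= MAX_DURATION_SECONDS:
                break
        return b"".join(chunks)          # the voice data corresponding to the voice office request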
Step 203, converting the voice data into text data.
In a possible implementation, the terminal calls a pre-stored speech-to-text algorithm to convert the collected voice data into text data for subsequent text recognition.
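The disclosure does not name a particular speech-to-text algorithm; purely as an illustrative stand-in, an off-the-shelf recognizer such as the open-source SpeechRecognition package could perform this conversion, as sketched below (the file name and language code are assumptions, and an on-device recognizer could equally be substituted).

    import speech_recognition as sr  # open-source SpeechRecognition package, used only as a stand-in

    def voice_to_text(wav_path: str) -> str:
        """Convert recorded voice data (a WAV file) into text data."""
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)   # read the whole recording
        # Any recognizer backend could be used here; Google's web API is one option.
        return recognizer.recognize_google(audio, language="zh-CN")

    # text = voice_to_text("office_request.wav")  # hypothetical file name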
Step 204, performing word segmentation processing on the text data to obtain at least one word segmentation recognition result.
In a possible implementation, the terminal calls a pre-stored word segmentation algorithm to perform word segmentation processing on the text data obtained in the step above.
Optionally, word segmentation processing is performed on the text data to obtain at least one group of word segmentation data, where each group of word segmentation data includes at least one word.
In a possible implementation, when performing word segmentation processing on the text data, multiple word segmentation recognition results may be obtained in order to improve accuracy. Each word segmentation recognition result contains one group of word segmentation data, and each group of word segmentation data contains at least one word obtained after segmentation; that is, the multiple word segmentation recognition results are obtained by segmenting the same text data in different ways. For example, if the text data is "help me book an air ticket flying to Shenzhen", one possible segmentation result is "help me / book / one / flying to / Shenzhen / air ticket", and another possible segmentation result is "help / me / book / flying to / Shenzhen / air ticket".
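A minimal sketch of producing more than one segmentation result for the same text is shown below; it uses the open-source jieba tokenizer purely as a stand-in for the pre-stored word segmentation algorithm, which the disclosure does not identify.

    import jieba  # open-source Chinese word segmentation library, used as a stand-in

    def segment(text: str) -> list[list[str]]:
        """Return several candidate segmentations (groups of word segmentation data) for one text."""
        precise = jieba.lcut(text)                 # precise-mode segmentation
        full = jieba.lcut(text, cut_all=True)      # full-mode segmentation (more, overlapping words)
        # Each group of word segmentation data contains at least one word.
        return [precise, full]

    # segment("帮我订一张飞深圳的机票")
    # -> e.g. [["帮我", "订", "一张", "飞", "深圳", "的", "机票"], [...]]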
Step 205, obtaining the keyword data corresponding to each of a plurality of preset office items.
An office item refers to a specific office category; for example, office items may include booking travel tickets, downloading files, uploading data, sending messages, viewing data and the like.
In a possible implementation, keyword data is preset for each office item; the keyword data is used to match the user's voice data so as to determine the office item required by the user. For example, for the office item "booking travel tickets", the keyword data may be set to "air ticket", "flight", "business trip" and the like, and each office item may have multiple pieces of keyword data.
Step 206, matching the word segmentation recognition results with the keyword data corresponding to each of the plurality of office items, to obtain the correlation degree between the word segmentation recognition results and each office item.
In a possible implementation, in order to determine which office item the user is asking for, the obtained word segmentation recognition results are matched for correlation with the keyword data corresponding to each office item according to a word correlation matching algorithm, so as to obtain the correlation degree between the word segmentation recognition results and each office item.
Optionally, since one piece of text data may have multiple word segmentation recognition results, that is, multiple groups of word segmentation data, each of the at least one group of word segmentation data corresponding to the text data may be matched with the keyword data corresponding to each office item, so as to obtain the correlation degree between each group of word segmentation data and each office item.
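The disclosure does not define the word correlation matching algorithm; the sketch below uses a simple keyword overlap ratio as an assumed stand-in, with an assumed keyword table for a few office items.

    # Assumed keyword data preset for several office items (illustrative only).
    OFFICE_ITEM_KEYWORDS = {
        "booking travel tickets": {"机票", "航班", "出差", "订票"},
        "downloading files":      {"下载", "文件", "报表"},
        "sending messages":       {"发送", "消息", "通知"},
    }

    def correlation(words: list[str], keywords: set[str]) -> float:
        """Assumed correlation degree: fraction of the item's keywords hit by the segmented words."""
        return len(set(words) & keywords) / len(keywords)

    def match_office_items(segmentations: list[list[str]]) -> dict[tuple[int, str], float]:
        """Correlation degree of each group of word segmentation data with each office item."""
        return {
            (i, item): correlation(words, kws)
            for i, words in enumerate(segmentations)
            for item, kws in OFFICE_ITEM_KEYWORDS.items()
        }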
Step 207, determining the maximum correlation degree among the plurality of correlation degrees, and determining the office item corresponding to the maximum correlation degree as the office item corresponding to the text data.
In a possible implementation, among the obtained correlation degrees, the office item corresponding to the maximum correlation degree is closest to the user's requirement; therefore the office item corresponding to the maximum correlation degree is determined as the office item corresponding to the text data.
Optionally, in order to improve recognition accuracy, the obtained maximum correlation degree may be further checked: an initial maximum correlation degree is determined among the plurality of correlation degrees and compared with a preset maximum correlation degree threshold value. If the initial maximum correlation degree is greater than the threshold value, it is determined as the maximum correlation degree. If the initial maximum correlation degree is smaller than the threshold value, the correlation between the requirement spoken by the user and the preset office items is very low; if the office item corresponding to the current initial maximum correlation degree were processed, the result would probably not match the user's requirement. In this case a prompt can be sent to the user, reminding the user to re-enter the voice or to handle the office work in another way.
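Continuing the sketch above, selecting the office item could look like the following; the 0.5 threshold is an assumption, since the disclosure only requires some preset maximum correlation threshold.

    MAX_CORRELATION_THRESHOLD = 0.5   # assumed preset threshold

    def pick_office_item(correlations: dict[tuple[int, str], float]) -> str | None:
        """Pick the office item with the maximum correlation degree, or None if it is too low."""
        (_, item), initial_max = max(correlations.items(), key=lambda kv: kv[1])
        if initial_max > MAX_CORRELATION_THRESHOLD:
            return item               # the initial maximum is accepted as the maximum correlation
        return None                   # prompt the user to re-enter voice or use another office mode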
Step 208, dividing the word segmentation data corresponding to the maximum correlation degree into corresponding information attributes.
Step 209, acquiring at least one information element preset for the office item corresponding to the text data.
An information attribute is obtained by recognizing a word segmentation result according to a preset semantic recognition algorithm; for example, the information attribute assigned to the segmented word "one" is "quantity".
Information elements can be preset for each office item and represent the necessary, basic requirements of that office item. Only if the text data contains all the information elements of the office item can the terminal carry out the corresponding office operation according to the text data; if the text data does not contain all the information elements, the requirement spoken by the user lacks some information elements of the office item, and the terminal cannot carry out the corresponding office operation.
In a possible implementation, the terminal calls the preset semantic recognition algorithm to divide the word segmentation data of the text data into information attributes, that is, it semantically recognizes each segmented word and labels the information attribute of each word according to the semantic recognition result. For example, for the segmentation result "help me / book / one / tomorrow / flying to / Shenzhen / air ticket", the corresponding office item may be "booking travel tickets", the information elements of that office item may be "person, time, place, booking mode, quantity", and the information attribute division result of the segmentation result is:
person: me / time: tomorrow / place: Shenzhen / booking mode: air ticket / quantity: one.
Step 210, matching the at least one information element preset for the office item with the information attributes corresponding to the word segmentation data one by one.
In a possible implementation, the at least one information element preset for the office item is matched one by one against the information attributes corresponding to the word segmentation data according to the semantic recognition algorithm. If each information element preset for the office item has a matched information attribute, the office operation corresponding to the voice office request is executed according to the information elements preset for the office item and the corresponding information attributes. Taking the example in step 209, each information element of the office item has a matching entry in the information attribute division result, which indicates that all the information elements of the office item "booking travel tickets" are present; the terminal therefore invokes the functional module corresponding to "booking travel tickets", automatically fills in the information elements provided by the user, and then displays the filled-in interface to the user so that the user can confirm whether the information is correct.
If at least one information element preset for the office item has no matched information attribute, prompt information indicating insufficient information is sent out to remind the user to re-enter the necessary information; alternatively, the terminal invokes the functional module corresponding to the office item "booking travel tickets", automatically fills in the information provided by the user, temporarily leaves blank the information the user did not provide, and displays the partially filled interface to the user so that the user can fill in the remaining information.
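The completeness check of step 210 could then be sketched as follows; the required element list and the invoke/prompt stand-ins are assumptions.

    REQUIRED_ELEMENTS = ["person", "time", "place", "booking mode", "quantity"]  # assumed for ticket booking

    def execute_or_prompt(attributes: dict[str, str]) -> None:
        """Execute the office operation if every preset information element is covered, otherwise prompt."""
        missing = [e for e in REQUIRED_ELEMENTS if e not in attributes]
        if not missing:
            # All information elements matched: invoke the functional module and pre-fill the form.
            print("Booking with:", attributes)        # stand-in for calling the ticket-booking module
        else:
            # At least one element has no matched attribute: prompt that information is insufficient.
            print("Insufficient information, please provide:", ", ".join(missing))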
It should be noted that the speech conversion algorithm, the word segmentation algorithm, the semantic recognition algorithm and the word correlation matching algorithm used in the above steps are all conventional technical means in the prior art, and their specific implementation principles and usage are not described in detail in the present invention.
In the invention, a user can control the terminal by voice to perform the corresponding office operation, which improves the employee experience: office matters can be handled by voice at any time and in any place, without filling in complex document applications. An intelligent voice office assistant is provided for employees, a knowledge system covering all functional fields is established, questions can be asked by voice in real time, and the consultation results are fed back in real time. The communication and consultation costs of daily repetitive cross-department matters are effectively reduced.
In an exemplary embodiment, there is also provided a voice office processing apparatus, as shown in Fig. 3, including:
an acquisition unit 310, configured to acquire the voice data in a voice office request when the voice office request is received;
a conversion unit 320, configured to convert the voice data into text data;
a recognition unit 330, configured to perform semantic recognition on the text data and determine the office item corresponding to the text data;
and an execution unit 340, configured to execute the office operation corresponding to the voice office request according to the text data and the office item corresponding to the text data.
Optionally, the acquisition unit 310 is configured to:
start a voice acquisition operation when a voice office request is received;
and end the voice acquisition operation when a preset voice acquisition ending condition is met, to obtain the voice data corresponding to the voice office request.
Optionally, the recognition unit 330 is configured to:
perform word segmentation processing on the text data to obtain at least one word segmentation recognition result;
acquire the keyword data corresponding to each of a plurality of preset office items;
match the word segmentation recognition result with the keyword data corresponding to each of the plurality of office items to obtain the correlation degree between the word segmentation recognition result and each office item;
and determine the maximum correlation degree among the plurality of correlation degrees, and determine the office item corresponding to the maximum correlation degree as the office item corresponding to the text data.
Optionally, the recognition unit 330 is configured to:
perform word segmentation processing on the text data to obtain at least one group of word segmentation data, wherein each group of word segmentation data comprises at least one word;
and match each of the at least one group of word segmentation data corresponding to the text data with the keyword data corresponding to each office item, to obtain the correlation degree between each group of word segmentation data and each office item.
Optionally, the recognition unit 330 is configured to:
divide the word segmentation data corresponding to the maximum correlation degree into corresponding information attributes;
acquire at least one information element preset for the office item corresponding to the text data;
match the at least one information element preset for the office item with the information attributes corresponding to the word segmentation data one by one, and if each information element preset for the office item has a matched information attribute, execute the office operation corresponding to the voice office request according to the information elements preset for the office item and the corresponding information attributes;
and if at least one information element preset for the office item has no matched information attribute, send out prompt information indicating insufficient information.
Optionally, the recognition unit 330 is configured to:
determine an initial maximum correlation degree among the plurality of correlation degrees, and compare the initial maximum correlation degree with a preset maximum correlation degree threshold value;
and if the initial maximum correlation degree is greater than the maximum correlation degree threshold value, determine the initial maximum correlation degree as the maximum correlation degree.
In the invention, a user can control the terminal by voice to perform the corresponding office operation, which improves the employee experience: office matters can be handled by voice at any time and in any place, without filling in complex document applications. An intelligent voice office assistant is provided for employees, a knowledge system covering all functional fields is established, questions can be asked by voice in real time, and the consultation results are fed back in real time. The communication and consultation costs of daily repetitive cross-department matters are effectively reduced.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium comprising instructions, such as the memory 402 comprising instructions, which are executable by the processor 401 of the apparatus to perform the voice office processing method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device or the like.
In an exemplary embodiment, an application program product is also provided, which includes one or more instructions executable by the processor 401 of the apparatus to perform the voice office processing method described above.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses or adaptations of the invention following, in general, the principles of the invention, and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A voice office processing method, characterized in that the method comprises:
when a voice office request is received, acquiring the voice data in the voice office request;
converting the voice data into text data;
performing semantic recognition on the text data and determining the office item corresponding to the text data;
and executing the office operation corresponding to the voice office request according to the text data and the office item corresponding to the text data.
2. The voice office processing method according to claim 1, wherein the acquiring, when a voice office request is received, the voice data corresponding to the voice office request comprises:
when a voice office request is received, starting a voice acquisition operation;
and when a preset voice acquisition ending condition is met, ending the voice acquisition operation to obtain the voice data corresponding to the voice office request.
3. The voice office processing method according to claim 2, wherein the reaching of the preset voice acquisition ending condition comprises:
collecting preset ending voice data; or, alternatively,
the execution duration of the voice acquisition operation reaching a preset duration.
4. The voice office processing method according to claim 1, wherein the performing semantic recognition on the text data and determining the office item corresponding to the text data comprises:
performing word segmentation processing on the text data to obtain at least one word segmentation recognition result;
acquiring the keyword data corresponding to each of a plurality of preset office items;
matching the word segmentation recognition result with the keyword data corresponding to each of the plurality of office items to obtain the correlation degree between the word segmentation recognition result and each office item;
and determining the maximum correlation degree among the plurality of correlation degrees, and determining the office item corresponding to the maximum correlation degree as the office item corresponding to the text data.
5. The voice office processing method according to claim 4, wherein the performing word segmentation processing on the text data to obtain at least one word segmentation recognition result comprises:
performing word segmentation processing on the text data to obtain at least one group of word segmentation data, wherein each group of word segmentation data comprises at least one word;
and the matching the word segmentation recognition result with the keyword data corresponding to each of the plurality of office items to obtain the correlation degree between the word segmentation recognition result and each office item comprises:
matching each of the at least one group of word segmentation data corresponding to the text data with the keyword data corresponding to each office item, to obtain the correlation degree between each group of word segmentation data and each office item.
6. The voice office processing method according to claim 5, wherein the executing the office operation corresponding to the voice office request according to the text data and the office item corresponding to the text data comprises:
dividing the word segmentation data corresponding to the maximum correlation degree into corresponding information attributes;
acquiring at least one information element preset for the office item corresponding to the text data;
matching the at least one information element preset for the office item with the information attributes corresponding to the word segmentation data one by one, and if each information element preset for the office item has a matched information attribute, executing the office operation corresponding to the voice office request according to the information elements preset for the office item and the corresponding information attributes;
and if at least one information element preset for the office item has no matched information attribute, sending out prompt information indicating insufficient information.
7. The voice office processing method according to claim 4, wherein the determining the maximum correlation degree among the plurality of correlation degrees comprises:
determining an initial maximum correlation degree among the plurality of correlation degrees, and comparing the initial maximum correlation degree with a preset maximum correlation degree threshold value;
and if the initial maximum correlation degree is greater than the maximum correlation degree threshold value, determining the initial maximum correlation degree as the maximum correlation degree.
8. A voice office processing apparatus, characterized in that the apparatus comprises:
an acquisition unit, configured to acquire the voice data in a voice office request when the voice office request is received;
a conversion unit, configured to convert the voice data into text data;
a recognition unit, configured to perform semantic recognition on the text data and determine the office item corresponding to the text data;
and an execution unit, configured to execute the office operation corresponding to the voice office request according to the text data and the office item corresponding to the text data.
9. The voice office processing apparatus according to claim 8, wherein the acquisition unit is configured to:
start a voice acquisition operation when a voice office request is received;
and end the voice acquisition operation when a preset voice acquisition ending condition is met, to obtain the voice data corresponding to the voice office request.
10. The voice office processing apparatus according to claim 8, wherein the recognition unit is configured to:
perform word segmentation processing on the text data to obtain at least one word segmentation recognition result;
acquire the keyword data corresponding to each of a plurality of preset office items;
match the word segmentation recognition result with the keyword data corresponding to each of the plurality of office items to obtain the correlation degree between the word segmentation recognition result and each office item;
and determine the maximum correlation degree among the plurality of correlation degrees, and determine the office item corresponding to the maximum correlation degree as the office item corresponding to the text data.
Application CN202011401166.6A, filed 2020-12-02 (priority date 2020-12-02): Voice office processing method and device, terminal and storage medium. Status: Pending. Publication: CN112599125A (en).

Priority Applications (1)

Application number: CN202011401166.6A; priority date: 2020-12-02; filing date: 2020-12-02; title: Voice office processing method and device, terminal and storage medium


Publications (1)

Publication number: CN112599125A; publication date: 2021-04-02

Family

ID=75189027

Family Applications (1)

Application number: CN202011401166.6A; title: Voice office processing method and device, terminal and storage medium; priority date: 2020-12-02; filing date: 2020-12-02

Country Status (1)

CN (1): CN112599125A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117455420A (en) * 2023-12-22 2024-01-26 深圳海智创科技有限公司 Office task processing method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105913846A (en) * 2016-05-25 2016-08-31 北京云知声信息技术有限公司 Method, device and system for realizing voice registration
CN107682536A (en) * 2017-09-25 2018-02-09 努比亚技术有限公司 A kind of sound control method, terminal and computer-readable recording medium
CN109243451A (en) * 2018-10-22 2019-01-18 武汉科技大学 A kind of network marketing method and system based on robot voice interaction
CN109584865A (en) * 2018-10-17 2019-04-05 平安科技(深圳)有限公司 A kind of application control method, device, readable storage medium storing program for executing and terminal device
CN110489527A (en) * 2019-08-13 2019-11-22 南京邮电大学 Banking intelligent consulting based on interactive voice and handle method and system


Similar Documents

Publication Publication Date Title
CN109587360B (en) Electronic device, method for coping with tactical recommendation, and computer-readable storage medium
CN109493850B (en) Growing type dialogue device
CN109767787B (en) Emotion recognition method, device and readable storage medium
CN110990685B (en) Voiceprint-based voice searching method, voiceprint-based voice searching equipment, storage medium and storage device
CN110110038B (en) Telephone traffic prediction method, device, server and storage medium
CN110597952A (en) Information processing method, server, and computer storage medium
WO2020233075A1 (en) Method, apparatus, device, and storage medium for reserving meeting room on basis of voice recognition
KR102476099B1 (en) METHOD AND APPARATUS FOR GENERATING READING DOCUMENT Of MINUTES
CN111063355A (en) Conference record generation method and recording terminal
US20190213998A1 (en) Method and device for processing data visualization information
US20150179165A1 (en) System and method for caller intent labeling of the call-center conversations
CN111222837A (en) Intelligent interviewing method, system, equipment and computer storage medium
CN110516083B (en) Album management method, storage medium and electronic device
CN111400463B (en) Dialogue response method, device, equipment and medium
CN111209367A (en) Information searching method, information searching device, electronic equipment and storage medium
CN111179903A (en) Voice recognition method and device, storage medium and electric appliance
CN112599125A (en) Voice office processing method and device, terminal and storage medium
KR102287431B1 (en) Apparatus for recording meeting and meeting recording system
CN112199498A (en) Man-machine conversation method, device, medium and electronic equipment for endowment service
CN111326142A (en) Text information extraction method and system based on voice-to-text and electronic equipment
CN111797599A (en) Conference record extraction and PPT insertion method and system
CN113609833B (en) Dynamic file generation method and device, computer equipment and storage medium
JP7008992B2 (en) Voice analysis method
CN114969295A (en) Dialog interaction data processing method, device and equipment based on artificial intelligence
CN112381989A (en) Sorting method, device and system and electronic equipment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2021-04-02)