CN108681529B

CN108681529B - Multi-language text and voice generation method of flow model diagram

Info

Publication number: CN108681529B
Application number: CN201810250865.1A
Authority: CN
Inventors: 曾庆田; 原桂远; 段华; 刘聪; 李超; 鲁法明; 倪维健; 周长红; 赵华; 林泽东; 刁秀丽; 温彦; 张峰
Original assignee: Shandong University of Science and Technology
Current assignee: Shandong University of Science and Technology
Priority date: 2018-03-26
Filing date: 2018-03-26
Publication date: 2022-01-25
Anticipated expiration: 2038-03-26
Also published as: CN108681529A

Abstract

The invention discloses a multilingual text and speech generation method for flow model diagrams and belongs to the field of process mining. First, the model elements, model node texts and model directed edges in a flow model diagram are recognized, and the recognized flow model is stored as a standard process model file. Next, the model element text is analyzed with a multilingual semantic dependency analysis model, the model structure is analyzed with the RPST algorithm, and both the model element text and the flow model structure information are stored in an annotated flow structure tree. The annotated flow structure tree is then divided according to the amount of text information and the structural complexity, the multilingual text of the flow model is generated from the annotated flow structure tree using a deep syntax tree, and finally the multilingual speech of the flow model is generated from the multilingual text. The method correctly recognizes the flow model in a flow model diagram, correctly analyzes the text and structure of the flow model, and generates grammatically correct text and correctly pronounced speech.

Description

Multi-language text and voice generation method of flow model diagram

Technical Field

The invention belongs to the field of process mining, and particularly relates to a multilingual text and voice generation method of a process model diagram.

Background

A review of existing patents and literature shows that no method exists for automatically generating multilingual text and speech from a flow model diagram. Existing research focuses on three separate areas: flow model diagram recognition, flow model text generation, and speech synthesis.

Flow model diagram recognition studies how to recover the flow model structure from a picture. Information such as model nodes, model node texts and model directed edges is obtained using techniques such as knowledge representation, graph matching, symbol recognition, geometric reasoning and semantic extraction.

Flow model text generation studies how to generate explanatory text for a process model. It must consider how to analyze the structure of the process model, how to express the different structures of the model in text, how to analyze the text information of model elements, and how to optimize the generated text.

Speech synthesis technology converts arbitrary text into standard, fluent speech in real time. It draws on several disciplines, including acoustics, linguistics, digital signal processing and computer science, and produces artificial speech through language processing, prosody processing and acoustic processing.

A study of the existing results shows that generating multilingual text and speech from a flow model diagram has not been investigated. Because the prior art studies each direction separately, the input and output formats of the different modules do not match. For example, flow model diagram recognition produces the structure of a flow model and stores it in a custom directed-graph format, whereas flow model text generation takes a standard process model file as its input. The invention provides a method for generating multilingual text and speech from a flow model diagram that overcomes these problems. The technique and approach proposed by the invention are therefore novel as a whole and cannot be realized with existing methods.

Disclosure of Invention

In view of the technical problems in the prior art, the invention provides an automatic recognition and understanding method for flow model diagrams that is reasonably designed, overcomes the deficiencies of the prior art and achieves good results.

In order to achieve the purpose, the invention adopts the following technical scheme:

a multilingual text and speech generating method of a flow model diagram adopts a flow model diagram recognition and understanding module, a flow model annotated flow structure tree generating module and a flow model multilingual text and speech generating module;

the flow model graph identification and understanding module is configured to identify model elements, node texts and directed edges in the flow model graph, and finally store the identified flow model as a standard flow model file;

the annotated flow structure tree generation module of the flow model is configured to analyze model element text information using multilingual semantic dependency analysis, multilingual translation, translation result screening and cross-language grammar structure adjustment, to analyze the flow model structure using the RPST (Refined Process Structure Tree) algorithm, and then to store the flow structure information and the model element text information in an annotated flow structure tree;

the multi-language text and voice generation module of the flow model is configured to divide the flow structure tree with the annotation, generate a multi-language text from the flow structure tree with the annotation, and finally convert the multi-language text into understandable and fluent multi-language voice by using a voice synthesis technology;

the method for generating the multilingual text and the voice of the flow model diagram comprises the following steps:

step 1: identifying and understanding a flow model diagram;

step 2: generating a flow structure tree with annotations of the flow model;

step 3: generating multi-language text and voice of the process model.

Preferably, in step 1, the method specifically comprises the following steps:

step 1.1: identifying model elements; identifying model elements including activities, tasks, events, gateways and arrows in the flow model diagram;

firstly, constructing a basic primitive template of a model element, wherein the primitive template comprises a picture template, a model element type, a model element width and a model element height; then sliding the basic primitive template in the flow model diagram, calculating the similarity of the overlapping area of the picture template and the flow model diagram, selecting a plurality of areas with the highest similarity as a model element identification result, and removing a repeated identification area and an error identification area from the identification result;

step 1.2: recognizing a model node text;

acquiring the positions and sizes of all nodes containing text, cropping from the picture a small image of the region where each such node is located, and finally recognizing the characters in the small image using OCR (Optical Character Recognition) technology, the recognized characters being the text of the model node;

step 1.3: identifying the directed edges of the model nodes;

converting the flow model diagram to grayscale to obtain the gray value of each pixel in the diagram, generating a gray value matrix of the flow model diagram, and finally traversing the gray value matrix, according to the positions of the arrows and the positions of the model nodes, to find the starting node and the ending node of each directed edge;

step 1.4: storing the process model;

in the process model storage, storing the recognized process model as a standard process model file using the XML (Extensible Markup Language) read-write library tinyxml (an XML parsing tool), according to the model type in the process model picture and the standard process model file specification;

input: the model node set and directed edge set recognized in the flow model diagram;

output: a standard process model file;

the method specifically comprises the following steps:

step 1.4.1: setting a stored file name and a coding mode;

step 1.4.2: writing information of the process model nodes into the standard file, setting node types, node identifications and node texts when the information of the process model nodes is written, and simultaneously storing input edge information and output edge information of the nodes;

step 1.4.3: writing edge information into the standard file, and setting edge identification, input nodes and output nodes when the edge information of the flow model is written;

step 1.4.4: outputting a standard process model file.

Preferably, in step 2, the method specifically comprises the following steps:

step 2.1: preprocessing a flow model;

recognizing the language of the model element texts, constructing multilingual text templates that describe the flow model, including selection, concurrency and loop structures, and constructing multilingual expressions of the domain-specific terms used in the flow model;

step 2.2: analyzing a model element text;

obtaining the model element text information, analyzing the text using multilingual semantic dependency analysis to obtain the subject, verb, object and clauses, translating the text information using multilingual translation, screening the translation results and adjusting them with cross-language grammar structure adjustment, and generating the multilingual text information of the model elements;

step 2.3: analyzing a model structure;

acquiring the structure of a flow model, traversing the flow model structure, dividing the flow model structure by using an RPST algorithm, and generating a flow structure tree;

step 2.4: generating a flow structure tree with annotations;

adding the multilingual text information obtained by the model element text analysis to the flow structure tree to generate the annotated flow structure tree.

Preferably, in step 3, the method specifically comprises the following steps:

step 3.1: dividing the flow structure tree with the annotations;

dividing the annotated flow structure tree into a plurality of subtrees according to the length of the text information and the structural complexity in the annotated flow structure tree, and ensuring that the text generated from each subtree does not exceed the maximum text length supported by speech generation;

step 3.2: generating a multi-language text;

traversing the annotated flow structure tree, organizing the multilingual text information using a deep syntax tree, describing the flow model structure using multilingual sentence templates, and then applying subject aggregation, verb aggregation and object aggregation to the generated text to produce the multilingual text of the flow model;

step 3.3: generating multi-language voice;

setting the text, language and speech rate parameters of speech synthesis, and then generating the multilingual speech of the flow model from the multilingual text using speech synthesis technology.

Preferably, in step 3.1, the dividing of the annotated flow structure tree specifically includes the following steps:

input: the annotated flow structure tree of the flow model;

output: a linked list of annotated flow structure subtrees of the flow model;

step 3.1.1: traversing all child nodes of the annotated flow structure tree annoTree, and storing each traversed child node into the temporary annotated flow structure tree forest in the order in which the child nodes appear in annoTree;

step 3.1.2: counting the length of the text information and the number of complex structures in forest; when the length of the text information in forest exceeds 200 or the number of complex structures it contains exceeds 10, storing forest in the division result linked list resolveResult and reinitializing forest in preparation for constructing the next annotated flow structure subtree.

Preferably, in step 3.3, the multilingual speech generation specifically includes the following steps:

input: the text linked list and the text language of the flow model;

output: the speech output of the flow model text;

step 3.3.1: initializing the conversion parameters of Voice RSS (a speech synthesis tool);

step 3.3.2: setting the file format, audio format, SSML (Speech Synthesis Markup Language) text format, speech rate and language parameters;

step 3.3.3: converting the text to speech using the method provided by Voice RSS, and generating a speech file using a file-writing method.

The invention has the following beneficial technical effects:

Process model storage technique: the invention stores the recognition result of the flow model diagram as a standard process model file, such as a BPMN flow model file of BPML type or a Petri net flow model file of PNML type. A standard process model file contains standard business modeling symbols; compared with a user-defined process model structure, storing the process model as a standard process model file improves both the extensibility of the system and the cross-platform usability of the process model.

Annotated flow structure tree division technique: the invention provides a method for dividing the annotated flow structure tree according to the amount of text information and the structural complexity. The method produces several annotated flow structure subtrees and a segmented description of the flow model, which solves the problem that speech synthesis limits the length of its input text.

Multilingual speech generation technique: the invention uses speech synthesis technology to generate speech from the multilingual text, so that non-business personnel can understand the flow model diagram through both text and speech, which makes the flow model easier for them to understand.

Drawings

FIG. 1 is a diagram of the basic principle of the invention.

FIG. 2 is a flow model diagram.

FIG. 3 is a schematic diagram of the flow model recognition result.

FIG. 4 is a schematic diagram of the storage result of the process model.

FIG. 5 is a schematic diagram of the annotated flow structure tree of the flow model.

FIG. 6 is a schematic diagram of the division result of the annotated flow structure tree.

FIG. 7 is a schematic diagram of the Chinese text generation result of the flow model.

FIG. 8 is a schematic diagram of the English text generation result of the flow model.

FIG. 9 is a schematic diagram of the French text generation result of the flow model.

FIG. 10 is a schematic diagram of the attribute information of the speech corresponding to the first segment of Chinese text.

FIG. 11 is a schematic diagram of the attribute information of the speech corresponding to the second segment of Chinese text.

FIG. 12 is a schematic diagram of the attribute information of the speech corresponding to the third segment of Chinese text.

Detailed Description

The invention is described in further detail below with reference to the following figures and detailed description:

As shown in FIG. 1, the invention comprises a flow model diagram recognition and understanding module, an annotated flow structure tree generation module of the flow model, and a multilingual text and speech generation module of the flow model. Starting from a flow model diagram, the model elements, model node texts and model directed edges in the diagram are first recognized, and the recognized flow model is stored as a standard process model file. Next, the model element text information is analyzed using multilingual semantic dependency analysis and the flow model structure is analyzed using the RPST algorithm; the result of the structure analysis is stored in a flow structure tree, and the multilingual model element text information is then stored into that tree to generate an annotated flow structure tree. When the multilingual text of the flow model is generated, the annotated flow structure tree is divided into several subtrees according to the amount of text information and the structural complexity, and the multilingual text of the flow model is generated from the annotated flow structure subtrees using a deep syntax tree. Finally, parameters of speech generation such as text, language and speech rate are set, and the multilingual speech of the flow model is generated using speech synthesis technology. The invention thus specifies the functional modules in detail and, based on the scheme above, provides a detailed technical implementation for each functional module.

1. Flow model diagram identification and understanding module

The flow model diagram recognition and understanding module recognizes the model elements, model node texts and model directed edges, and stores the recognized flow model as a standard process model file. The module comprises model element recognition, model node text recognition, model directed edge recognition and process model storage.

In model element recognition, primitive templates of model elements such as activities, events, gateways and arrows are constructed by studying the basic building blocks of flow model diagrams. Table 1 lists some of the primitive templates; each primitive template contains a primitive picture, an element type, an element width and an element height. The primitive template is slid across the flow model diagram, and the similarity between the template and each region of the diagram is calculated with an image similarity measure. The most similar regions are then selected. When two recognized regions lie very close to each other, the one with the lower similarity is removed as a repeated recognition; when the border inside a recognized region is incomplete or the region contains no symbol pixels, it is removed as an erroneous recognition. The remaining regions are taken as the model element recognition result.

TABLE 1 primitive templates for model nodes
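As a non-limiting illustration, the sliding-template recognition and the removal of repeated recognitions described above might be sketched as follows. The similarity measure (one minus the normalized mean absolute gray difference), the 0.9 threshold, the overlap test and all names (`Hit`, `similarity`, `overlaps`, `match_template`) are assumptions made for this sketch; the description does not fix a particular similarity calculation, and the removal of erroneous regions (incomplete border, no symbol pixels) is not covered here.

```python
from typing import List, NamedTuple

Image = List[List[int]]  # grayscale pixels in row-major order, values 0..255


class Hit(NamedTuple):
    score: float
    top: int
    left: int
    width: int
    height: int
    element_type: str


def similarity(image: Image, template: Image, top: int, left: int) -> float:
    """Return 1 - (mean absolute gray difference / 255) for one window."""
    h, w = len(template), len(template[0])
    diff = sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(h)
        for c in range(w)
    )
    return 1.0 - diff / (255.0 * h * w)


def overlaps(a: Hit, b: Hit) -> bool:
    """True when the two bounding boxes intersect."""
    return (a.left < b.left + b.width and b.left < a.left + a.width
            and a.top < b.top + b.height and b.top < a.top + a.height)


def match_template(image: Image, template: Image, element_type: str,
                   threshold: float = 0.9) -> List[Hit]:
    """Slide the template over the image and keep the best distinct windows."""
    h, w = len(template), len(template[0])
    hits = [
        Hit(similarity(image, template, top, left), top, left, w, h, element_type)
        for top in range(len(image) - h + 1)
        for left in range(len(image[0]) - w + 1)
    ]
    hits = [hit for hit in hits if hit.score >= threshold]
    # Repeated recognition: among overlapping windows keep the most similar one.
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda x: x.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept
```

Each returned `Hit` carries the element type, width and height taken from the primitive template, mirroring the fields listed for the templates in Table 1. A practical implementation would likely rely on an image processing library rather than nested Python loops.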

In model node text recognition, the position and size of every model node that contains text are obtained from the model element recognition result. The flow model diagram is cropped to obtain a small image of the region where each model node is located, and the text in that small image, which is the text of the model node, is recognized using OCR.

In model directed edge recognition, the flow model diagram is first converted to grayscale to obtain the gray value of each pixel, and a gray value matrix of the diagram is generated. The gray value matrix is then traversed, starting from information such as the arrow position and the end point of the directed edge, to find the starting node of the directed edge.
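One possible reading of this traversal is sketched below, purely as an illustration. The sketch assumes that the diagram is given as rows of RGB tuples, that edge lines are darker than the background, that each recognized node is described by a hypothetical bounding box `(top, left, height, width)`, and that the arrow head position lies just outside the target node. The grayscale weights are the common luma coefficients, and the breadth-first search over dark pixels is a technique chosen for this sketch; the description does not specify the traversal.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width), assumed layout


def to_gray(rgb: List[List[Tuple[int, int, int]]]) -> List[List[int]]:
    """Build the gray value matrix using the common luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def find_source_node(gray: List[List[int]], arrow_pos: Tuple[int, int],
                     nodes: Dict[str, Box], target_id: str,
                     dark: int = 128) -> Optional[str]:
    """Follow dark pixels back from the arrow head to the node the edge leaves."""

    def node_at(r: int, c: int) -> Optional[str]:
        for node_id, (top, left, height, width) in nodes.items():
            if top <= r < top + height and left <= c < left + width:
                return node_id
        return None

    rows, cols = len(gray), len(gray[0])
    seen = {arrow_pos}
    queue = deque([arrow_pos])
    while queue:
        r, c = queue.popleft()
        node_id = node_at(r, c)
        if node_id is not None and node_id != target_id:
            return node_id  # reached another node: start of the directed edge
        if node_id == target_id:
            continue  # do not walk through the body of the target node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and gray[nr][nc] < dark):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return None
```

The function returns `None` when the line cannot be followed to any other node, which a caller could treat as a recognition failure for that edge.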

In process model storage, the recognized process model is stored as a standard process model file according to the process model type and the corresponding standard file specification, using the XML (Extensible Markup Language) read-write library tinyxml. The storage method for a BPMN process model is shown in Algorithm 1, which takes as input the model node set and directed edge set recognized in the flow model diagram and outputs a standard process model file. The algorithm first sets the stored file name and encoding (lines 1-3). It then writes the node information into the standard file (lines 5-19): for each flow model node it sets the node type, node identifier and node text (lines 6-9) and stores the incoming and outgoing edge information of the node (lines 11-19). Finally, it writes the edge information into the standard file (lines 20-24), setting the edge identifier, input node and output node of each edge.
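The storage steps can be illustrated with the following minimal sketch. It is not Algorithm 1 itself: it uses Python's standard `xml.etree.ElementTree` module in place of tinyxml, writes only a simplified BPMN 2.0 style skeleton (no diagram interchange section), and the function names, the tuple layouts and the process identifier `process_1` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

Node = Tuple[str, str, str]  # (node id, BPMN element name, node text)
Edge = Tuple[str, str, str]  # (edge id, source node id, target node id)


def build_process_xml(nodes: List[Node], edges: List[Edge]) -> ET.ElementTree:
    """Write nodes (type, id, text, incoming/outgoing edges), then edges."""
    root = ET.Element("definitions", {"xmlns": BPMN_NS})
    process = ET.SubElement(root, "process", {"id": "process_1"})
    for node_id, node_type, text in nodes:
        element = ET.SubElement(process, node_type, {"id": node_id, "name": text})
        for edge_id, _source, target in edges:
            if target == node_id:
                ET.SubElement(element, "incoming").text = edge_id
        for edge_id, source, _target in edges:
            if source == node_id:
                ET.SubElement(element, "outgoing").text = edge_id
    for edge_id, source, target in edges:
        ET.SubElement(process, "sequenceFlow",
                      {"id": edge_id, "sourceRef": source, "targetRef": target})
    return ET.ElementTree(root)


def save_process(tree: ET.ElementTree, filename: str,
                 encoding: str = "utf-8") -> None:
    """Set the file name and encoding, then output the process model file."""
    tree.write(filename, encoding=encoding, xml_declaration=True)
```

For example, `save_process(build_process_xml(nodes, edges), "model.bpmn")` would write a file in which each node element lists its incoming and outgoing edge identifiers and each `sequenceFlow` records its source and target node; the file name here is only a placeholder.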

2. Annotated flow structure tree generation module of flow model

The annotated flow structure tree generation module of the flow model analyzes the structure and the text information of the flow model and generates the annotated flow structure tree of the flow model. The module comprises flow model preprocessing, model text analysis, model structure analysis and annotated flow structure tree generation.

In flow model preprocessing, the language of the model element text is determined by four methods: searching for special letters or letter combinations, checking the type and number of diacritical marks, searching for characteristic grammatical words, and searching for special punctuation marks. A multilingual text template that correctly describes the flow model structure is then constructed, and finally the multilingual expressions of domain-specific terms are collected and organized.

In model text analysis, the text information in the process model is first obtained and analyzed using multilingual semantic dependency analysis to extract information such as the subject, verb, object and clauses. The text information is then translated using multilingual translation, and the translation results are screened and adjusted with cross-language grammar structure adjustment to generate the multilingual text information of the model element text.
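The four language cues can be illustrated for Chinese, English and French (the languages of FIG. 7 to FIG. 9) with the short sketch below. The character sets, the word lists, the equal weighting of the cues and the fallback to English on a tie are all assumptions made for this example and are far from exhaustive.

```python
import re

FRENCH_DIACRITICS = set("àâçéèêëîïôùûüÿœ")
FUNCTION_WORDS = {
    "en": {"the", "and", "is", "of", "to", "if", "then"},
    "fr": {"le", "la", "les", "des", "un", "une", "et", "est", "du"},
}


def detect_language(text: str) -> str:
    """Return 'zh', 'en' or 'fr' using four simple surface cues."""
    # 1. Special letters: any CJK unified ideograph marks the text as Chinese.
    if re.search(r"[\u4e00-\u9fff]", text):
        return "zh"
    lowered = text.lower()
    scores = {"en": 0, "fr": 0}
    # 2. Type and number of diacritical marks.
    scores["fr"] += sum(ch in FRENCH_DIACRITICS for ch in lowered)
    # 3. Characteristic grammatical (function) words.
    for word in re.findall(r"[^\W\d_]+", lowered):
        for lang, vocabulary in FUNCTION_WORDS.items():
            if word in vocabulary:
                scores[lang] += 1
    # 4. Special punctuation: French guillemets.
    scores["fr"] += sum(ch in "«»" for ch in text)
    # Ties (including no evidence at all) fall back to English in this sketch.
    return max(scores, key=scores.get)
```

Short node labels often carry few cues, so a practical system would probably combine such rules with larger vocabularies or a statistical language identifier.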

In the model structure analysis, the structure of a flow model is firstly obtained, then the flow model structure is traversed, the RPST algorithm is used for dividing the flow model structure, and the flow structure tree is used for storing the divided model structure information. And finally, storing the result of the model text analysis into the process structure tree to generate the annotated process structure tree of the process model.

3. Multi-language text and voice generation module of process model

The multilingual text and speech generation module of the flow model generates the multilingual text and the multilingual speech of the flow model. The module comprises annotated flow structure tree division, multilingual text generation and multilingual speech generation.

In annotated flow structure tree division, the annotated flow structure tree of the flow model is divided into several annotated flow structure subtrees according to the amount of text information and the structural complexity. The division method is shown in Algorithm 2, which takes the annotated flow structure tree of the flow model as input and outputs the linked list resolveResult of annotated flow structure subtrees. The algorithm traverses all children of the annotated flow structure tree annoTree and stores each traversed child into the temporary annotated flow structure tree forest in the order in which the children appear in annoTree, while counting the length of the text information and the number of complex structures in forest (lines 12-14). When the length of the text information in forest exceeds 200 or the number of complex structures it contains exceeds 10, the algorithm stores forest into resolveResult and reinitializes the temporary forest (lines 6-10) in preparation for constructing the next annotated flow structure subtree.
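The division procedure might be sketched as follows. The sketch follows the order suggested by the line references (threshold check first, then appending the child and updating the counts) and keeps the thresholds 200 and 10 from the description. The `AnnoNode` class, the measurement of text length in characters, and the final step that appends the remaining forest to the result are assumptions not stated in the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnnoNode:
    """Hypothetical node of an annotated flow structure tree."""
    text: str = ""
    is_complex: bool = False  # e.g. a selection, concurrency or loop fragment
    children: List["AnnoNode"] = field(default_factory=list)


def text_length(node: AnnoNode) -> int:
    return len(node.text) + sum(text_length(c) for c in node.children)


def complex_count(node: AnnoNode) -> int:
    return int(node.is_complex) + sum(complex_count(c) for c in node.children)


def partition(anno_tree: AnnoNode, max_text: int = 200,
              max_complex: int = 10) -> List[List[AnnoNode]]:
    """Split the children of anno_tree into consecutive groups (subtrees)."""
    resolve_result: List[List[AnnoNode]] = []
    forest: List[AnnoNode] = []
    length = count = 0
    for child in anno_tree.children:
        # Flush the temporary forest once either threshold has been exceeded.
        if length > max_text or count > max_complex:
            resolve_result.append(forest)
            forest, length, count = [], 0, 0
        forest.append(child)
        length += text_length(child)
        count += complex_count(child)
    if forest:  # assumption: the remaining children form the last subtree
        resolve_result.append(forest)
    return resolve_result
```

Because the check happens before the next child is added, a group can exceed the threshold by the size of its last child; a stricter variant would test the threshold before appending.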

Flow structure tree division splits the annotated flow structure tree of the flow model into several subtrees. When multilingual text is generated from the annotated flow structure subtrees, a deep syntax tree is used to organize the text information so that the generated text is grammatically correct, and multilingual templates are used to describe the structure of the flow model. After the multilingual text has been generated, subject aggregation, verb aggregation and object aggregation are applied to it, yielding multilingual text that is compact in structure and grammatically correct.
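Subject aggregation, one of the three aggregation steps, can be illustrated for English with the toy sketch below. It merges consecutive clauses that share a subject into a single sentence. The clause representation as (subject, verb, object) tuples, the function name and the English joining rule are assumptions made for this sketch; the deep syntax tree and the language-specific templates are not modelled.

```python
from typing import List, Tuple

Clause = Tuple[str, str, str]  # (subject, verb, object)


def aggregate_by_subject(clauses: List[Clause]) -> List[str]:
    """Merge consecutive clauses with the same subject into one sentence."""
    groups: List[Tuple[str, List[str]]] = []
    for subject, verb, obj in clauses:
        predicate = f"{verb} {obj}"
        if groups and groups[-1][0] == subject:
            groups[-1][1].append(predicate)
        else:
            groups.append((subject, [predicate]))
    sentences = []
    for subject, predicates in groups:
        if len(predicates) == 1:
            body = predicates[0]
        else:
            body = ", ".join(predicates[:-1]) + " and " + predicates[-1]
        sentence = f"{subject} {body}."
        sentences.append(sentence[0].upper() + sentence[1:])
    return sentences
```

Verb aggregation and object aggregation could follow the same grouping idea with a different key, although the joining rules would differ for each target language.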

In multilingual speech generation, the multilingual speech of the flow model is generated using the speech synthesis service provided by Voice RSS. Because Algorithm 2 partitions the annotated flow structure tree into subtrees, the generated multilingual text is segmented. Algorithm 3 generates the multilingual speech of the flow model: it takes the text linked list and the text language of the flow model as input and outputs the speech of the flow model. For each input text segment, Algorithm 3 first initializes the conversion parameters of Voice RSS (lines 2-3), then sets parameters such as the file format, audio format, SSML text format, speech rate and language (lines 4-9), and finally converts the text to speech using the method provided by Voice RSS and writes the result to a speech file (lines 10-14).
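The per-segment parameter setup might look like the sketch below, which only builds one request URL per text segment and leaves the download as a separate helper. The endpoint and the query parameter names (`key`, `src`, `hl`, `r`, `c`, `f`, `ssml`) reflect the Voice RSS HTTP interface as the editor recalls it and should be checked against the current Voice RSS documentation before use; the function names, default values and the API key placeholder are assumptions of this sketch, not part of Algorithm 3.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

VOICE_RSS_ENDPOINT = "https://api.voicerss.org/"  # assumed endpoint


def build_requests(segments: List[str], language: str, api_key: str,
                   rate: int = 0, codec: str = "MP3",
                   audio_format: str = "44khz_16bit_stereo",
                   ssml: bool = False) -> List[str]:
    """Return one request URL per text segment of the flow model."""
    urls = []
    for text in segments:
        params = {
            "key": api_key,                       # account key (placeholder)
            "src": text,                          # text segment to convert
            "hl": language,                       # language code, e.g. "zh-cn"
            "r": str(rate),                       # speech rate
            "c": codec,                           # file format (audio codec)
            "f": audio_format,                    # audio format
            "ssml": "true" if ssml else "false",  # whether src is SSML
        }
        urls.append(VOICE_RSS_ENDPOINT + "?" + urlencode(params))
    return urls


def save_speech(url: str, path: str) -> None:
    """Download the synthesized audio and write it to a speech file."""
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
```

Since the text is already segmented by the tree division, each segment maps to one request and one audio file, which matches the three Chinese speech files described for FIG. 10 to FIG. 12.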

Process model storage technique: the invention provides a process model storage technique that stores the recognition result of a flow model diagram as a standard process model file. Compared with a custom directed graph, a standard process model file gives the process structure a structured, documented form. Because the storage technique stores each model as a standard file according to its model type, both business and non-business personnel can conveniently analyze, manipulate and simulate the process model with other tools.

Annotated flow structure tree division technique: the invention provides a technique for dividing the annotated flow structure tree into several subtrees according to the length of the text information and the structural complexity, from which segmented text of the flow model is then generated.

Multilingual speech generation technique: the invention provides a multilingual speech generation technique for process models that generates explanatory speech of the process model, so that non-business personnel can understand the process model through spoken explanation as well as through text, which gives them an additional way to understand the process model.

The invention has been shown to be feasible through experiments, simulation and use; the results are as follows:

Taking a BPMN flow model as an example, the experiment first recognizes and understands the flow model diagram, then constructs the annotated flow structure tree, and finally generates the multilingual text and speech of the flow. For the flow model shown in FIG. 2, the recognition result obtained with the flow model diagram recognition and understanding method is shown in FIG. 3, from which it can be seen that the method recognizes the flow model accurately. The recognized process model is stored as a standard process model file using the process model storage technique; the process model file is shown in FIG. 4. The process model structure is analyzed using the RPST algorithm and the process model text information is analyzed using multilingual semantic dependency analysis; the resulting annotated flow structure tree of the flow model in FIG. 2 is shown in FIG. 5. The division result of the annotated flow structure tree is shown in FIG. 6, from which it can be seen that the amount of text information and the structural complexity are similar across the subtrees. FIG. 7 shows the Chinese text generated from the annotated flow structure tree, FIG. 8 the English text and FIG. 9 the French text. The Chinese text consists of three segments, and FIG. 10, FIG. 11 and FIG. 12 show the attribute information of the speech generated from the three Chinese text segments using the speech generation method.

It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make modifications, alterations, additions or substitutions within the spirit and scope of the present invention.

Claims

1. A multilingual text and speech generating method of a flow model diagram is characterized in that: adopting a flow model diagram recognition and understanding module, a flow model annotated flow structure tree generation module and a flow model multi-language text and voice generation module;

an annotated flow structure tree generation module of the flow model, configured to analyze model element text information using multilingual semantic dependency analysis, multilingual translation, translation result screening and cross-language grammar structure adjustment, to analyze the flow model structure using the RPST algorithm, and then to store the flow structure information and the model element text information in the annotated flow structure tree;

a multilingual text and speech generation module of the flow model, configured to divide the annotated flow structure tree, generate multilingual text from the annotated flow structure tree, and finally convert the multilingual text into multilingual speech using speech synthesis technology;

step 1: identifying and understanding a flow model diagram;

step 2: generating a flow structure tree with annotations of the flow model;

step 3: generating multi-language text and voice of the process model;

in step 1, the method specifically comprises the following steps:

step 1.2: recognizing a model node text;

acquiring the positions and sizes of all nodes containing text, cropping from the picture a small image of the region where each such node is located, and finally recognizing the characters in the small image using OCR character recognition technology, the recognized characters being the text of the model node;

step 1.3: identifying the directed edges of the model nodes;

step 1.4: storing the process model;

in the process model storage, storing the identified process model as a standard process model file by using an xml read-write library tinyxml according to the model type in the process model picture and the standard of the standard process model file;

output: a standard process model file;

the method specifically comprises the following steps:

step 1.4.1: setting a stored file name and a coding mode;

step 1.4.4: outputting a standard process model file;

in step 3, the method specifically comprises the following steps:

step 3.1: dividing the flow structure tree with the annotations;

step 3.2: generating a multi-language text;

step 3.3: generating multi-language voice;

2. The multilingual text and speech generation method of a flow model diagram according to claim 1, wherein: in step 2, the method specifically comprises the following steps:

step 2.1: preprocessing a flow model;

step 2.2: analyzing a model element text;

step 2.3: analyzing a model structure;

step 2.4: generating a flow structure tree with annotations;

3. The multilingual text and speech generation method of a flow model diagram according to claim 1, wherein: in step 3.1, the dividing of the annotated flow structure tree specifically includes the following steps:

input: the annotated flow structure tree of the flow model;

step 3.1.1: traversing all child nodes of the annotated flow structure tree, and storing each traversed child node into a temporary annotated flow structure tree in the order in which the child nodes appear in the annotated flow structure tree;

step 3.1.2: counting the length of the text information and the number of complex structures in the temporary annotated flow structure tree; when the length of the text information in the temporary annotated flow structure tree exceeds 200 or the number of complex structures it contains exceeds 10, storing the temporary annotated flow structure tree into a division result linked list and reinitializing the temporary annotated flow structure tree in preparation for constructing the next annotated flow structure subtree.

4. The multilingual text and speech generation method of a flow model diagram according to claim 1, wherein: in step 3.3, the multilingual speech generation specifically includes the following steps:

input: the text linked list and the text language of the flow model;

output: the speech output of the flow model text;

step 3.3.1: initializing a conversion parameter of Voice RSS;

step 3.3.2: setting file format, audio format, SSML text format, speech speed and language parameters;