WO2020232864A1 - Data processing method and related apparatus


Info

Publication number
WO2020232864A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2019/102348
Inventor
郭鸿程 (Guo Hongcheng)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)

Classifications

    • G06F16/345 Summarisation for human users
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G06V30/10 Character recognition

Definitions

  • This application relates to the field of intelligent decision-making, and in particular to a data processing method and related devices.
  • Currently, parents or teachers check the effect of children's reading through homework.
  • After reading, children or students usually need to complete after-class exercises, and parents or teachers use these exercises to test the effect of the reading.
  • The embodiments of the present application provide a data processing method and related devices to improve the efficiency of checking the effect of reading.
  • A first aspect of this application provides a data processing method, including:
  • inputting the summary vector of the text data into a neural network decoder to obtain a summary of the text data, where the neural network decoder predicts multiple words from the summary vector of the text data through a neural network, and the predicted words are concatenated to form the summary of the text data;
  • calculating, through a neural network semantic representation model, the degree of semantic relevance between the question generated for the text data and the text in the text data, and determining the text with the highest degree of semantic relevance as the answer to that question.
  • A second aspect of the present application provides a data processing device, including:
  • an acquisition module, configured to acquire image data of a book sent by a terminal;
  • a character recognition module, configured to perform character recognition processing on the image data to obtain text data corresponding to the image data;
  • a detection module, configured to perform text type detection on the text data to determine whether the text type of the text data meets a preset text type;
  • an encoding module, configured to input the text data into a neural network encoder to obtain a summary vector of the text data when the text type meets the preset text type, where the neural network encoder compresses and encodes the text data;
  • a decoding module, configured to input the summary vector of the text data into a neural network decoder to obtain a summary of the text data, where the neural network decoder predicts multiple words from the summary vector through a neural network, and the predicted words are concatenated to form the summary of the text data;
  • an extraction module, configured to perform word segmentation on the summary of the text data and extract its N keywords in descending order of word frequency, where N is a positive integer;
  • a combination module, configured to classify the N keywords by part of speech and combine them, according to their parts of speech, in a preset question sentence order to obtain the question for the text data; and
  • a processing module, configured to calculate, through the neural network semantic representation model, the degree of semantic relevance between the question and the text in the text data, and determine the text with the highest degree of semantic relevance as the answer to the question.
  • A third aspect of the present application provides an electronic device for data processing.
  • The electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the steps of any method of the first aspect of the present application.
  • A fourth aspect of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements some or all of the steps described in any method of the first aspect of the present application.
  • FIG. 1 is a flowchart of a data processing method provided by an embodiment of this application.
  • FIG. 2 is a flowchart of another data processing method provided by another embodiment of this application.
  • FIG. 3 is a flowchart of another data processing method provided by another embodiment of this application.
  • FIG. 4 is a schematic diagram of a system structure provided by an embodiment of this application.
  • FIG. 5 is a schematic diagram of performing character recognition processing on image data according to an embodiment of this application.
  • FIG. 6 is a schematic diagram of a data processing device provided by an embodiment of this application.
  • FIG. 7 is a schematic structural diagram of an electronic device in a hardware operating environment involved in an embodiment of this application.
  • The data processing method and related devices provided in the embodiments of the present application can improve the efficiency of checking the effect of reading.
  • The artificial intelligence server obtains the image data sent by the terminal, processes the image data to obtain the corresponding text data, and then processes the text data to obtain a summary of the text data, questions about the text data, and the answers to those questions, all of which are returned to the terminal.
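The end-to-end server flow described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the stage functions (`ocr`, `check_type`, `summarize`, `make_question`, `find_answer`) and the `ReadingCheckResult` structure are hypothetical names, injected as callables so that the control flow, in particular the early return when the text type check fails, can be read on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReadingCheckResult:
    # Hypothetical response returned to the terminal.
    summary: str = ""
    question: str = ""
    answer: str = ""
    error: Optional[str] = None


def process_book_image(
    image: object,
    ocr: Callable[[object], str],
    check_type: Callable[[str], Optional[str]],
    summarize: Callable[[str], str],
    make_question: Callable[[str], str],
    find_answer: Callable[[str, str], str],
) -> ReadingCheckResult:
    """Run the server-side stages in order; every stage is injected."""
    text = ocr(image)                           # character recognition
    error = check_type(text)                    # None means the preset type is met
    if error is not None:
        return ReadingCheckResult(error=error)  # terminal shows a pop-up instead
    summary = summarize(text)                   # encoder + decoder
    question = make_question(summary)           # keywords -> question
    answer = find_answer(question, text)        # most relevant passage
    return ReadingCheckResult(summary, question, answer)


# Usage with stand-in stages:
demo = process_book_image(
    "page.png",
    ocr=lambda img: "小明在公园放风筝。",
    check_type=lambda text: None,
    summarize=lambda text: "小明在公园放风筝",
    make_question=lambda summary: "小明在公园做什么？",
    find_answer=lambda question, text: "小明在公园放风筝",
)
print(demo.answer)
```

Injecting the stages keeps the gating logic testable without any OCR or neural model.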
  • FIG. 1 is a flowchart of a data processing method according to an embodiment of this application.
  • A data processing method provided by an embodiment of the present application may include:
  • The terminal can be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile Internet device, or another type of terminal.
  • The paper book is first scanned to obtain scanned images of the book, and the terminal then sends the scanned images to the artificial intelligence server.
  • The image data is a scanned image.
  • The scanned image is generated by a scanning tool.
  • The image data can be standardized in any of the following ways:
  • processing the image data with an image correction algorithm, where the image correction algorithm is any one of the Radon transform, the Hough transform, and linear regression;
  • processing the image data with an image enhancement algorithm, where the image enhancement algorithm is any one of histogram equalization, image smoothing, and image sharpening; or
  • processing the image data with both an image correction algorithm and an image enhancement algorithm.
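As an illustration of one of the listed enhancement options, the sketch below applies global histogram equalization to an 8-bit grayscale image stored as a list of rows. It assumes plain integer pixel values in 0..255 and does not cover the correction step (Radon- or Hough-based deskewing); a production system would more likely call an image library.

```python
from typing import List


def equalize_histogram(gray: List[List[int]], levels: int = 256) -> List[List[int]]:
    """Global histogram equalization for an 8-bit grayscale image."""
    pixels = [p for row in gray for p in row]
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function (CDF) of the intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # a single intensity: nothing to stretch
        return [row[:] for row in gray]
    # Look-up table that makes the output CDF approximately uniform.
    lut = [round((c - cdf_min) * (levels - 1) / (n - cdf_min)) if c > 0 else 0 for c in cdf]
    return [[lut[p] for p in row] for row in gray]


print(equalize_histogram([[50, 50], [100, 150]]))
```

The darkest level present maps to 0 and the brightest to 255, which stretches low-contrast scans before character cutting.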
  • The artificial intelligence server then needs to perform character recognition processing on the image data to obtain the corresponding text data, which can then be processed directly as text.
  • The artificial intelligence server may perform the character recognition processing as follows:
  • performing character cutting on the image data to obtain M characters, where M is a positive integer; and
  • performing feature extraction on the M characters to obtain M character features, where the M characters correspond one-to-one to the M character features.
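Character cutting is often done with a projection profile. The sketch below assumes a binarized, already deskewed single text line (1 = ink, 0 = background) and cuts wherever a column contains no ink; the function name `cut_characters` is hypothetical. Chinese characters with separated left and right components (for example 川) would be over-split by this naive rule, so real systems typically merge spans using the expected character width.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary image: 1 = ink (black), 0 = background


def cut_characters(img: Image) -> List[Image]:
    """Split a single text line into character images at ink-free columns."""
    width = len(img[0])
    # Vertical projection: number of ink pixels in each column.
    projection = [sum(row[x] for row in img) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, count in enumerate(projection + [0]):  # sentinel closes the last span
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            spans.append((start, x))
            start = None
    # Crop each span into its own character image; M = len(spans).
    return [[row[a:b] for row in img] for a, b in spans]


line = [[1, 0, 1, 1, 0, 0, 1],
        [1, 0, 0, 1, 0, 0, 1]]
print(len(cut_characters(line)))
```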
  • The text type includes a language type and a style type.
  • Language types include Chinese, English, Japanese, etc.
  • Style types include modern styles (novels, prose, fairy tales, narrative essays, expository essays, argumentative essays, etc.) and ancient styles (shi poems, ci lyrics, qu verse, fu rhapsodies, etc.).
  • The artificial intelligence server may perform text type detection on the text data, to determine whether the text type of the text data meets the preset text type, as follows:
  • language type detection is performed on the text data to obtain its language type, and style type detection is performed on the text data to obtain its style type.
  • If the language type of the text data satisfies the preset language type and the style type of the text data satisfies the preset style type, it is determined that the text type of the text data meets the preset text type.
  • The preset style type includes modern style.
  • If the language type of the text data does not meet the preset language type, or the style type does not meet the preset style type, or neither the language type nor the style type meets its preset type, it is determined that the text type of the text data does not meet the preset text type.
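A rough sketch of the text type check follows. The script-based language guess (kana suggests Japanese, CJK ideographs suggest Chinese, Latin letters suggest English) and the constants `PRESET_LANGUAGES` and `PRESET_STYLES` are assumptions for illustration; the description does not say how the style type is detected, so `style` is taken as the output of some separate classifier.

```python
PRESET_LANGUAGES = {"Chinese"}  # preset language type from the description
PRESET_STYLES = {"modern"}      # preset style type from the description


def detect_language(text: str) -> str:
    """Crude script-based guess; a real system would use a trained detector."""
    kana = sum(1 for ch in text if "\u3040" <= ch <= "\u30ff")   # hiragana + katakana
    han = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")    # CJK unified ideographs
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if kana > 0:  # kana is a strong signal for Japanese
        return "Japanese"
    if han > 0 and han >= latin:
        return "Chinese"
    return "English" if latin > 0 else "unknown"


def meets_preset_type(language: str, style: str) -> bool:
    # Both the language type and the style type must satisfy their presets.
    return language in PRESET_LANGUAGES and style in PRESET_STYLES


print(detect_language("小明去公园。"), meets_preset_type("Chinese", "modern"))
```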
  • The method further includes:
  • When the language type of the text data does not meet the preset language type, the artificial intelligence server sends a language type error message to the terminal, instructing the terminal to generate a pop-up window or interface prompting that the language type of the book is wrong. For example, if the artificial intelligence server recognizes that the language type of the text data sent by the terminal is English, it sends a language type error message to the terminal, and on receiving the message the terminal generates a pop-up window or interface indicating that the language of the book cannot be English.
  • When the style type of the text data does not meet the preset style type, a style type error message is sent to the terminal, instructing the terminal to generate a pop-up window or interface indicating that the style type of the book is wrong. For example, if the artificial intelligence server recognizes that the style type of the text data sent by the terminal is ancient style, it sends a style type error message to the terminal.
  • On receiving the style type error message, the terminal generates a pop-up window or interface indicating that the style of the book cannot be ancient style.
  • When neither the language type nor the style type meets its preset type, a language and style type error message is sent to the terminal, instructing the terminal to generate a pop-up window or interface prompting that the language and style of the book are wrong.
  • For example, if the artificial intelligence server recognizes that the language type of the text data sent by the terminal is Japanese and the style type is ancient style, it sends a language and style type error message to the terminal.
  • On receiving the language and style type error message, the terminal generates a pop-up window or interface prompting that the language of the book cannot be Japanese and its style cannot be ancient style.
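The three error cases can be expressed as a small decision function. The message strings and the function name `type_error_message` are hypothetical; the point is the precedence, where the combined language-and-style message is chosen before either single-type message.

```python
from typing import Optional


def type_error_message(language_ok: bool, style_ok: bool,
                       language: str, style: str) -> Optional[str]:
    """Choose which error message the server sends; None means no error."""
    if not language_ok and not style_ok:
        return (f"language and style type error: the book cannot be {language} "
                f"and cannot be {style} style")
    if not language_ok:
        return f"language type error: the book cannot be {language}"
    if not style_ok:
        return f"style type error: the book cannot be {style} style"
    return None


print(type_error_message(False, False, "Japanese", "ancient"))
```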
  • The neural network encoder includes a first recurrent neural network.
  • The text data may be input into the neural network encoder to obtain the summary vector of the text data as follows:
  • the first word in the text data is input into the first recurrent neural network to obtain a first encoding vector, which is passed to the next time step; at the next time step, the first encoding vector and the second word in the text data are input into the first recurrent neural network to obtain a second encoding vector, which is passed to the following time step; this continues until all the text in the text data has been input into the first recurrent neural network, and the final encoding vector is taken as the summary vector of the text data.
  • The neural network encoder compresses and encodes the text data and is implemented by a recurrent neural network (RNN).
  • The neural network encoder receives the input text data; at the first time step, a word of the original text data is input into the neural network and compressed into a vector, which is then passed to the next time step.
  • The encoding vector obtained after all the text data has been compressed is the summary vector of the text data.
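The encoder loop described above can be sketched with a plain Elman RNN in pure Python. The weights here are random and untrained, the embedding table and dimensions are hypothetical, and practical summarizers usually use LSTM or GRU cells; the sketch only shows how each word and the previous state produce the next state, with the final state kept as the summary vector.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(x: Vector, h: Vector, w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """One Elman RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [math.tanh(a + c + d) for a, c, d in zip(matvec(w_x, x), matvec(w_h, h), b)]


def encode(words: List[str], embed: Dict[str, Vector],
           w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    h = [0.0] * len(b)                            # empty state before the first word
    for word in words:                            # one time step per word
        h = rnn_step(embed[word], h, w_x, w_h, b)
    return h                                      # final state = summary vector


# Toy, untrained parameters (hypothetical: 3-dim embeddings, 4-dim state).
rng = random.Random(0)
vocab = ["小明", "去", "公园"]
embed = {w: [rng.uniform(-1, 1) for _ in range(3)] for w in vocab}
w_x = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w_h = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
b = [0.0] * 4
summary_vector = encode(vocab, embed, w_x, w_h, b)
print(summary_vector)
```

Because the state is threaded through every step, the summary vector depends on word order, not just on which words appear.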
  • The neural network decoder includes a second recurrent neural network.
  • The summary vector of the text data may be input into the neural network decoder to obtain the summary of the text data as follows: the summary vector is input into the second recurrent neural network to predict a first output word; the first output word is passed to the next time step; at the next time step, the first output word and the summary vector are input into the second recurrent neural network to predict a second output word; the second output word is passed to the next time step, and so on until the second recurrent neural network has finished predicting from the summary vector, after which the combination of all output words is taken as the summary of the text data.
  • The neural network decoder decodes the summary vector of the text data and is also implemented by a recurrent neural network (RNN).
  • After the summary vector of the text data is input into the neural network decoder, the decoder predicts the output word at the first time step from the summary vector, then predicts the output word at the next time step from that output word and the summary vector, and so on, so that each output word influences the next one. All the output words obtained by the decoder are concatenated to form the summary of the text data.
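A matching greedy decoder sketch follows, again with untrained, hypothetical parameters. At each step the previous output word's embedding is concatenated with the summary vector, the RNN state is updated, and the highest-scoring vocabulary entry is emitted. The `<bos>`/`<eos>` tokens and the `max_len` cap are assumptions, since the description does not say exactly how decoding stops.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def decode(summary: Vector, embed: Dict[str, Vector], vocab: List[str],
           w_in: Matrix, w_h: Matrix, w_out: Matrix, max_len: int = 10) -> List[str]:
    """Greedy RNN decoding conditioned on the summary vector at every step."""
    h = list(summary)                     # start from the encoder's summary
    prev = "<bos>"
    words: List[str] = []
    for _ in range(max_len):
        x = embed[prev] + summary         # previous word ++ summary vector
        h = [math.tanh(a + c) for a, c in zip(matvec(w_in, x), matvec(w_h, h))]
        scores = matvec(w_out, h)         # one score per vocabulary entry
        prev = vocab[max(range(len(vocab)), key=scores.__getitem__)]
        if prev == "<eos>":               # end-of-summary token
            break
        words.append(prev)
    return words


rng = random.Random(1)
vocab = ["<eos>", "小明", "去", "公园"]
embed = {w: [rng.uniform(-1, 1) for _ in range(3)] for w in vocab + ["<bos>"]}
summary = [rng.uniform(-1, 1) for _ in range(4)]
w_in = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(4)]  # 3 + 4 inputs
w_h = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
w_out = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(len(vocab))]
summary_words = decode(summary, embed, vocab, w_in, w_h, w_out, max_len=5)
print("".join(summary_words))
```

With trained weights the same loop yields a fluent summary; greedy argmax is the simplest choice, and beam search is a common alternative.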
  • Word segmentation may be performed on the summary of the text data, and the N keywords extracted in descending order of word frequency, as follows:
  • word segmentation methods for the summary of the text data include string-matching-based segmentation, understanding-based segmentation, and statistics-based segmentation.
  • String-matching-based segmentation matches the Chinese character string to be segmented against the entries of a dictionary according to a certain strategy; if a substring is found in the dictionary, the match succeeds and a word is recognized.
  • Understanding-based segmentation recognizes words by having the computer simulate human understanding of the sentence.
  • Statistics-based segmentation uses a basic segmentation dictionary for string-matching segmentation while using statistical methods to identify new words. Combining string frequency statistics with string matching keeps the speed and efficiency of matching-based segmentation while also using dictionary-free, context-aware segmentation to identify new words and automatically resolve ambiguities.
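The sketch below combines string-matching segmentation (forward maximum matching, one common strategy for the dictionary approach described above) with frequency ranking to pick the top N keywords. The dictionary, the stop-word set, and `max_word_len = 4` are illustrative assumptions.

```python
from collections import Counter
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_word_len: int = 4) -> List[str]:
    """String-matching segmentation: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words


def top_keywords(summary: str, dictionary: Set[str], n: int,
                 stopwords: Set[str] = frozenset()) -> List[str]:
    """Return the N most frequent words of the summary, highest frequency first."""
    counts = Counter(w for w in forward_max_match(summary, dictionary)
                     if w not in stopwords and w.strip())
    # Ties keep first-seen order (Counter.most_common is stable).
    return [w for w, _ in counts.most_common(n)]


d = {"小明", "公园", "喜欢", "放风筝"}
stop = {"，", "。", "去", "在"}
print(top_keywords("小明喜欢去公园，小明在公园放风筝。", d, 2, stop))
```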
  • Calculating, through the neural network semantic representation model, the degree of semantic relevance between the question for the text data and the text in the text data includes:
  • the degree of semantic relevance may be calculated by a vocabulary overlap method, a string method, a cosine similarity method, or a longest common subsequence method.
  • Specifically, Q text segments matching the N keywords are searched for in the text data, where Q is a positive integer.
  • Q degrees of semantic relevance between the question and the Q text segments are calculated, the Q text segments corresponding one-to-one to the Q degrees of semantic relevance. The highest of the Q degrees (the first degree of semantic relevance) is obtained, and the text segment corresponding to it is determined as the answer to the question.
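The answer selection step can be sketched with the cosine similarity option listed above, computed here on character counts rather than with a neural semantic representation model. Splitting the text into sentences at end punctuation to form the Q candidate segments is an assumption; a neural model would replace the `Counter` vectors with learned sentence embeddings.

```python
import math
import re
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def find_answer(question: str, text: str, keywords: List[str]) -> str:
    # Q candidate segments: sentences containing at least one keyword.
    sentences = [s for s in re.split(r"[。！？!?.]", text) if s.strip()]
    segments = [s for s in sentences if any(k in s for k in keywords)] or sentences
    if not segments:
        return ""
    # One relevance score per segment (character bag-of-words cosine).
    q_vec = Counter(question)
    scores = [cosine(q_vec, Counter(seg)) for seg in segments]
    return segments[max(range(len(segments)), key=scores.__getitem__)]


text = "小明喜欢去公园。小明在公园放风筝。天气很好。"
print(find_answer("小明在公园做什么", text, ["小明", "公园"]))
```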
  • FIG. 2 is a flowchart of another data processing method provided by another embodiment of this application.
  • Another data processing method provided by another embodiment of the present application may include:
  • The terminal sends the image data of the book to the artificial intelligence server.
  • The terminal can be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile Internet device, or another type of terminal.
  • The paper book is first scanned to obtain scanned images of the book, and the terminal then sends the scanned images to the artificial intelligence server.
  • The artificial intelligence server performs character recognition processing on the image data to obtain the corresponding text data.
  • The image data is a scanned image.
  • The scanned image is generated by a scanning tool.
  • The image data can be standardized in any of the following ways:
  • processing the image data with an image correction algorithm, where the image correction algorithm is any one of the Radon transform, the Hough transform, and linear regression;
  • processing the image data with an image enhancement algorithm, where the image enhancement algorithm is any one of histogram equalization, image smoothing, and image sharpening; or
  • processing the image data with both an image correction algorithm and an image enhancement algorithm.
  • The artificial intelligence server then needs to perform character recognition processing on the image data to obtain the corresponding text data, which can then be processed directly as text.
  • Character cutting is performed on the image data to obtain M characters, where M is a positive integer.
  • The artificial intelligence server recognizes whether the language type of the text data meets the preset language type.
  • Language types include Chinese, English, Japanese, etc.
  • The preset language type includes Chinese.
  • The artificial intelligence server recognizes whether the style type of the text data meets the preset style type.
  • Style types include modern styles (novels, prose, fairy tales, narrative essays, expository essays, argumentative essays, etc.) and ancient styles (shi poems, ci lyrics, qu verse, fu rhapsodies, etc.), and the preset style type includes modern style.
  • The artificial intelligence server sends a language and style type error message to the terminal.
  • The terminal generates a pop-up window or interface prompting that the language and style of the book are wrong.
  • For example, if the artificial intelligence server recognizes that the language type of the text data is Japanese and the style type is ancient style, it sends a language and style type error message to the terminal; on receiving the message, the terminal generates a pop-up window or interface prompting that the language of the book cannot be Japanese and its style cannot be ancient style.
  • FIG. 3 is a flowchart of another data processing method provided by another embodiment of this application.
  • Another data processing method provided by another embodiment of the present application may include:
  • The terminal sends the image data of the book to the artificial intelligence server.
  • The terminal can be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile Internet device, or another type of terminal.
  • The books read by children or students are paper books.
  • The paper book is scanned by the terminal to obtain scanned images of the book, and the terminal then sends the scanned images to the artificial intelligence server.
  • The artificial intelligence server processes the image data by using an image correction algorithm.
  • The image data needs to be processed by an image correction algorithm, which is any one of the Radon transform, the Hough transform, and linear regression.
  • The artificial intelligence server processes the image data by using an image enhancement algorithm.
  • The image enhancement algorithm is any one of histogram equalization, image smoothing, and image sharpening.
  • The artificial intelligence server performs character cutting on the image data to obtain M characters, where M is a positive integer.
  • The artificial intelligence server performs feature extraction on the M characters to obtain M character features.
  • The M characters correspond one-to-one to the M character features. Features fall into two categories. The first is statistical features: the ratio of black pixels (or white pixels) in a character region of the image data is computed, and when the character region is divided into several sub-regions, the black-pixel (or white-pixel) ratios of the sub-regions are combined into a numerical feature vector. The second is structural features: the number and positions of the stroke endpoints and intersections of the character are obtained.
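The statistical (zoning) features described above can be sketched as follows: the character image is divided into a grid, and the black-pixel ratio of each zone becomes one component of the feature vector. The 2 x 2 grid is an illustrative default, not a value from the description, and structural features (stroke endpoints and intersections) are not covered.

```python
from typing import List


def zoning_features(char_img: List[List[int]], rows: int = 2, cols: int = 2) -> List[float]:
    """Black-pixel ratio of each zone in a rows x cols grid over the character."""
    h, w = len(char_img), len(char_img[0])
    features = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            zone = [char_img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(zone) / len(zone) if zone else 0.0)
    return features


print(zoning_features([[1, 1, 0, 0],
                       [1, 1, 0, 0],
                       [0, 0, 0, 0],
                       [0, 0, 0, 0]]))
```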
  • The artificial intelligence server compares the M character features with a character feature database to identify the M text characters corresponding to the M character features.
  • The M character features correspond one-to-one to the M text characters.
  • Comparison methods include Euclidean-space comparison, relaxation matching (Relaxation), dynamic programming matching (Dynamic Programming, DP), neural-network-based database construction and comparison, hidden Markov models (HMM), and other methods.
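Of the comparison methods listed, Euclidean-space comparison is the simplest to illustrate: each character feature is matched to the database entry at the smallest Euclidean distance, and the recognized characters are then concatenated into text. The database layout (a character mapped to one feature vector) is an assumption; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List


def recognize(features: List[List[float]], database: Dict[str, List[float]]) -> str:
    """Map each feature vector to the database character at the smallest Euclidean distance."""
    chars = []
    for f in features:
        best = min(database, key=lambda ch: math.dist(f, database[ch]))
        chars.append(best)
    return "".join(chars)  # combining the M text characters yields the text


db = {"A": [1.0, 0.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0, 0.0]}
print(recognize([[0.9, 0.1, 0.0, 0.0], [0.2, 0.8, 0.0, 0.0]], db))
```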
  • The artificial intelligence server combines the M text characters to obtain the text data corresponding to the image data.
  • The artificial intelligence server performs text type detection on the text data to determine whether the text type of the text data meets the preset text type.
  • The text type includes a language type and a style type.
  • Language types include Chinese, English, Japanese, etc.
  • Style types include modern styles (novels, prose, fairy tales, narrative essays, expository essays, argumentative essays, etc.) and ancient styles (shi poems, ci lyrics, qu verse, fu rhapsodies, etc.).
  • The artificial intelligence server may determine whether the text type of the text data meets the preset text type as follows:
  • If the language type of the text data satisfies the preset language type and the style type of the text data satisfies the preset style type, it is determined that the text type of the text data meets the preset text type.
  • The preset style type includes modern style.
  • If the language type of the text data does not meet the preset language type, or the style type does not meet the preset style type, or neither the language type nor the style type meets its preset type, it is determined that the text type of the text data does not meet the preset text type.
  • The neural network encoder compresses and encodes the text data and is implemented by a recurrent neural network (RNN).
  • The neural network encoder receives the input text data. At the first time step, a word of the original text data is input into the neural network and compressed into a vector, which is passed to the next time step. At the next time step, the vector compressed at the previous time step and the next word of the original text data are input into the neural network, and the newly compressed vector is passed on to the following time step.
  • The encoding vector obtained after all the text data has been compressed is the summary vector of the text data.
  • The neural network decoder decodes the summary vector of the text data and is also implemented by a recurrent neural network (RNN).
  • The neural network decoder predicts the output word at the first time step from the summary vector of the text data, then predicts the output word at the next time step from that output word and the summary vector, and so on, so that each output word influences the next one. Finally, all the output words obtained by the decoder are concatenated to form the summary of the text data.
  • The N keywords in the summary of the text data may be extracted as follows:
  • word segmentation methods for the summary of the text data include string-matching-based segmentation, understanding-based segmentation, and statistics-based segmentation.
  • String-matching-based segmentation matches the Chinese character string to be segmented against the entries of a dictionary according to a certain strategy; if a substring is found in the dictionary, the match succeeds and a word is recognized.
  • Understanding-based segmentation recognizes words by having the computer simulate human understanding of the sentence.
  • Statistics-based segmentation uses a basic segmentation dictionary for string-matching segmentation while using statistical methods to identify new words. Combining string frequency statistics with string matching keeps the speed and efficiency of matching-based segmentation while also using dictionary-free, context-aware segmentation to identify new words and automatically resolve ambiguities.
  • Calculating, through the neural network semantic representation model, the degree of semantic relevance between the question for the text data and the text in the text data includes:
  • the degree of semantic relevance may be calculated by a vocabulary overlap method, a string method, a cosine similarity method, or a longest common subsequence method.
  • Specifically, Q text segments matching the N keywords are searched for in the text data, where Q is a positive integer.
  • Q degrees of semantic relevance between the question and the Q text segments are calculated, the Q text segments corresponding one-to-one to the Q degrees of semantic relevance.
  • The system includes an artificial intelligence server and a terminal.
  • The artificial intelligence server communicates with the terminal.
  • The terminal may be a mobile phone or a computer.
  • The user accesses the artificial intelligence server through the terminal.
  • When the terminal is a mobile phone, the user can photograph the book to be processed with the phone and send the photos to the artificial intelligence server, which processes the photos to obtain processing results and returns them to the user's mobile phone.
  • When the terminal is a computer, the user can scan the book with scanning equipment connected to the computer, such as a printer, and then send the scanned image to the artificial intelligence server.
  • The artificial intelligence server processes the scanned image to obtain processing results and returns them to the user's computer.
  • FIG. 5 is a schematic diagram of performing character recognition processing on image data according to an embodiment of this application.
  • In FIG. 5, the image data shows ABCDE.
  • Character cutting on the image data yields five characters, A, B, C, D, and E, and feature extraction on these characters yields feature a, feature b, feature c, feature d, and feature e.
  • The features are then compared and recognized to determine the corresponding text characters, namely text characters A, B, C, D, and E, and all the text characters are combined to obtain the text ABCDE.
  • FIG. 6 is a schematic diagram of a data processing device provided by another embodiment of this application.
  • A data processing device provided by another embodiment of the present application may include:
  • an obtaining module 601, configured to obtain image data of a book sent by the terminal;
  • a character recognition module 602, configured to perform character recognition processing on the image data to obtain text data corresponding to the image data;
  • a detection module 603, configured to perform text type detection on the text data to determine whether the text type of the text data meets the preset text type;
  • an encoding module 604, configured to input the text data into a neural network encoder to obtain a summary vector of the text data when the text type meets the preset text type, where the neural network encoder compresses and encodes the text data;
  • a decoding module 605, configured to input the summary vector of the text data into a neural network decoder to obtain a summary of the text data, where the neural network decoder predicts multiple words from the summary vector through a neural network, and the predicted words are concatenated to form the summary of the text data;
  • an extraction module 606, configured to perform word segmentation on the summary of the text data and extract its N keywords in descending order of word frequency, where N is a positive integer;
  • a combination module 607, configured to classify the N keywords by part of speech and combine them, according to their parts of speech, in a preset question sentence order to obtain the question for the text data; and
  • a processing module 608, configured to calculate, through the neural network semantic representation model, the degree of semantic relevance between the question and the text in the text data, and determine the text with the highest degree of semantic relevance as the answer to the question.
  • FIG. 7 is a schematic structural diagram of an electronic device in a hardware operating environment involved in an embodiment of the application.
  • the electronic device of the hardware operating environment involved in the embodiment of the present application may include:
  • the processor 701 is, for example, a CPU.
  • the memory 702 may be a high-speed RAM or a non-volatile memory, such as a disk memory.
  • the communication interface 703 is used to implement connection and communication between the processor 701 and the memory 702.
  • the structure shown in FIG. 7 does not constitute a limitation on the data processing electronic device, which may include more or fewer components than shown in the figure, combine certain components, or use a different component arrangement.
  • the memory 702 may include an operating system, a network communication module, and data processing programs.
  • the operating system is a program that manages and controls the hardware and software resources of the data processing electronic device and supports the running of the data processing program and other software or programs.
  • the network communication module is used to implement communication between various components in the memory 702, and communication with other hardware and software in the data processing electronic device.
  • the processor 701 is configured to execute the data processing program stored in the memory 702, and implement the following steps:
  • the summary vector of the text data is input into a neural network decoder to obtain a summary of the text data, where the neural network decoder performs prediction on the summary vector through a neural network to obtain a plurality of predicted words, which are concatenated to form the summary of the text data;
  • a neural network semantic representation model is used to calculate the degree of semantic relevance between the question of the text data and the text in the text data, and the text with the highest degree of semantic relevance is determined as the answer to the question.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the following steps:
  • the summary vector of the text data is input into a neural network decoder to obtain a summary of the text data, where the neural network decoder performs prediction on the summary vector through a neural network to obtain a plurality of predicted words, which are concatenated to form the summary of the text data;
  • a neural network semantic representation model is used to calculate the degree of semantic relevance between the question of the text data and the text in the text data, and the text with the highest degree of semantic relevance is determined as the answer corresponding to the question of the text data.

Abstract

The present application relates to the field of intelligent decision-making and provides a data processing method and related apparatus. The data processing method comprises: acquiring image data of a book sent by a terminal; performing character recognition processing on the image data to obtain text data corresponding to the image data; performing text type detection on the text data to determine whether its text type meets a preset text type; when it does, inputting the text data into a neural network encoder to obtain a summary vector of the text data; inputting the summary vector into a neural network decoder to obtain a summary of the text data; extracting N keywords from the summary; combining the N keywords to obtain a question about the text data; and determining the answer to the question by means of a neural network semantic representation model. The technical solution of the embodiments of the present application improves the efficiency of checking the reading effect.

Description

Data processing method and related apparatus
This application claims priority to Chinese Patent Application No. 2019104203915, filed with the Chinese Patent Office on May 20, 2019 and entitled "Data processing method and related apparatus", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of intelligent decision-making, and in particular to a data processing method and related apparatus.
Background
At present, after children or students finish reading a book, parents or teachers check the reading effect through homework. For example, after reading an article in a textbook, children or students usually have to complete after-class exercises, and parents or teachers use those exercises to check the reading effect.
However, the books that children or students read sometimes have no accompanying homework or exercises. To check the reading effect, parents or teachers must then read the book themselves first and understand its content. This wastes time, and if the book is long, checking the reading effect is inefficient.
Summary of the invention
Embodiments of this application provide a data processing method and related apparatus to improve the efficiency of checking the reading effect.
A first aspect of this application provides a data processing method, including:
acquiring image data of a book sent by a terminal;
performing character recognition processing on the image data to obtain text data corresponding to the image data;
performing text type detection on the text data to determine whether the text type of the text data meets a preset text type;
when the text type meets the preset text type, inputting the text data into a neural network encoder to obtain a summary vector of the text data, where the neural network encoder is used to compress and encode the text data;
inputting the summary vector of the text data into a neural network decoder to obtain a summary of the text data, where the neural network decoder performs prediction on the summary vector through a neural network to obtain a plurality of predicted words, which are concatenated to form the summary of the text data;
performing word segmentation on the summary of the text data, and extracting N keywords from the summary in descending order of word frequency, where N is a positive integer;
classifying the N keywords by part of speech, and combining the N keywords in a preset question word order according to their parts of speech to obtain a question about the text data;
calculating, through a neural network semantic representation model, the degree of semantic relevance between the question of the text data and the text in the text data, and determining the text with the highest degree of semantic relevance as the answer to the question.
A second aspect of this application provides a data processing apparatus, including:
an acquisition module, configured to acquire image data of a book sent by a terminal;
a character recognition module, configured to perform character recognition processing on the image data to obtain text data corresponding to the image data;
a detection module, configured to perform text type detection on the text data to determine whether the text type of the text data meets a preset text type;
an encoding module, configured to input the text data into a neural network encoder to obtain a summary vector of the text data when the text type meets the preset text type, where the neural network encoder is used to compress and encode the text data;
a decoding module, configured to input the summary vector of the text data into a neural network decoder to obtain a summary of the text data, where the neural network decoder performs prediction on the summary vector through a neural network to obtain a plurality of predicted words, which are concatenated to form the summary of the text data;
an extraction module, configured to perform word segmentation on the summary of the text data and extract N keywords from the summary in descending order of word frequency, where N is a positive integer;
a combination module, configured to classify the N keywords by part of speech and combine them in a preset question word order according to their parts of speech to obtain a question about the text data;
a processing module, configured to calculate, through a neural network semantic representation model, the degree of semantic relevance between the question of the text data and the text in the text data, and determine the text with the highest degree of semantic relevance as the answer to the question.
A third aspect of this application provides an electronic device for data processing. The electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps of any method according to the first aspect of this application.
A fourth aspect of this application provides a computer-readable storage medium storing a computer program, where the computer program is executed by a processor to implement some or all of the steps described in any method according to the first aspect of this application.
It can be seen that when a book read by children or students has no accompanying homework or exercises, the above technical solutions produce a summary of the book, questions about it, and the answers to those questions. Parents or teachers can learn the content of the book from the summary and use the questions and answers to check the reading effect. This spares them from spending a lot of time reading the book and improves the efficiency of checking the reading effect.
Description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings required by the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a data processing method according to an embodiment of this application;
FIG. 2 is a flowchart of another data processing method according to an embodiment of this application;
FIG. 3 is a flowchart of another data processing method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a system structure according to an embodiment of this application;
FIG. 5 is a schematic diagram of character recognition processing performed on image data according to an embodiment of this application;
FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of an electronic device in a hardware operating environment according to an embodiment of this application.
Detailed description of embodiments
The data processing method and related apparatus provided in the embodiments of this application improve the efficiency of checking the reading effect.
To enable a person skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of this application.
Detailed descriptions are given below.
The terms "first", "second", "third", "fourth", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between different objects rather than to describe a specific order. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
In the embodiments of this application, an artificial intelligence (AI) server obtains image data sent by a terminal and processes it to obtain the corresponding text data. The server then processes the text data to obtain a summary of the text data, a question about the text data, and the answer to that question, and returns them to the terminal.
Referring first to FIG. 1, FIG. 1 is a flowchart of a data processing method according to an embodiment of this application. As shown in FIG. 1, the method may include the following steps:
101. Acquire image data of a book sent by a terminal.
The terminal may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile Internet device, or another type of terminal.
If the book read by the children or students is a paper book, it is first scanned to obtain scanned images of its pages, and the terminal then sends the scanned images to the AI server.
102. Perform character recognition processing on the image data to obtain text data corresponding to the image data.
Optionally, when the image data is a scanned image, parts of a page may be missing or blurred, or the scan may be skewed, because the image was produced by a scanning tool. The image data therefore needs to be standardized before character recognition, which may be done as follows:
When the skew of the image data exceeds a preset skew threshold, the image data is processed with an image correction algorithm, which may be any one of the Radon transform, the Hough transform, and linear regression. When the sharpness of the image data is below a preset sharpness threshold, the image data is processed with an image enhancement algorithm, which may be any one of histogram equalization, image smoothing, and image sharpening. When the skew exceeds the skew threshold and the sharpness is also below the sharpness threshold, the image data is processed with both an image correction algorithm and an image enhancement algorithm.
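The threshold logic above, together with one of the listed enhancement algorithms (histogram equalization), can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a grayscale list of rows with pixel values 0 to 255, the skew angle and sharpness score are measured elsewhere (for example, skew by a Hough or Radon transform), and the threshold defaults and the function names `plan_standardization` and `equalize_histogram` are hypothetical, not taken from the patent.

```python
from typing import List

Image = List[List[int]]  # grayscale image, row-major, pixel values 0..255

def plan_standardization(skew_deg: float, sharpness: float,
                         skew_threshold: float = 2.0,       # hypothetical default
                         sharpness_threshold: float = 0.5,  # hypothetical default
                         ) -> List[str]:
    """Decide which preprocessing steps to run before character recognition."""
    steps = []
    if abs(skew_deg) > skew_threshold:
        steps.append("correct")   # e.g. Hough- or Radon-based deskewing
    if sharpness < sharpness_threshold:
        steps.append("enhance")   # e.g. histogram equalization below
    return steps

def equalize_histogram(img: Image) -> Image:
    """Histogram equalization: remap gray levels through the normalized CDF."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:  # empty or single-level image: nothing to stretch
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in img]

print(plan_standardization(skew_deg=5.0, sharpness=0.3))  # ['correct', 'enhance']
print(equalize_histogram([[50, 50], [100, 150]]))         # [[0, 0], [128, 255]]
```

Equalization spreads the occupied gray levels across the full 0 to 255 range, which tends to raise contrast on faded scans; deskewing is left out because it needs an image-rotation routine.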
Because a scanned image cannot be read directly as text, the AI server performs character recognition processing on the image data to obtain the corresponding text data, which can then be processed directly. The AI server may perform character recognition as follows:
Perform character segmentation on the image data to obtain M characters, where M is a positive integer. Extract features from the M characters to obtain M character features, in one-to-one correspondence with the M characters. Compare the M character features with a character feature database to identify the M text characters corresponding to them, again in one-to-one correspondence. Comparison methods include Euclidean-space comparison, relaxation matching (Relaxation), dynamic programming matching (Dynamic Programming, DP), neural-network-based database construction and comparison, and hidden Markov models (HMM).
Combine the M text characters to obtain the text data corresponding to the image data.
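A toy version of the segment, extract, compare, and combine steps, using the Euclidean-space comparison named above, might look like the following. Assumptions: the page is already binarized (1 = ink), characters are separated by blank columns, each character's feature is simply its per-row ink count, and the two-entry feature database is invented for the example. A production OCR engine would use far richer features. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Sequence, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background

def segment_columns(img: Binary) -> List[Tuple[int, int]]:
    """Character segmentation: split the page at columns that contain no ink."""
    width = len(img[0]) if img else 0
    ink = [any(row[x] for row in img) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(ink + [False]):  # sentinel closes the last span
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            spans.append((start, x))
            start = None
    return spans

def extract_feature(img: Binary, span: Tuple[int, int]) -> List[int]:
    """Feature extraction (toy): ink count in each row of the character's columns."""
    start, end = span
    return [sum(row[start:end]) for row in img]

def match(feature: Sequence[float], database: Dict[str, Sequence[float]]) -> str:
    """Comparison: the database entry nearest by Euclidean distance."""
    return min(database, key=lambda ch: math.dist(feature, database[ch]))

def recognize(img: Binary, database: Dict[str, Sequence[float]]) -> str:
    """Combine the M recognized characters into text."""
    return "".join(match(extract_feature(img, s), database)
                   for s in segment_columns(img))

page = [[1, 0, 0, 0, 0],
        [1, 0, 1, 1, 1],
        [1, 0, 0, 0, 0]]
db = {"I": [1, 1, 1], "-": [0, 3, 0]}  # hypothetical character feature database
print(recognize(page, db))  # I-
```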
103. Perform text type detection on the text data to determine whether the text type of the text data meets the preset text type.
Optionally, the text type includes a language type and a style type. Language types include Chinese, English, Japanese, and so on. Style types include modern styles (novels, prose, fairy tales, narrative essays, expository essays, argumentative essays, and so on) and classical styles (shi poems, ci lyrics, songs, fu rhapsodies, and so on).
The AI server may perform text type detection on the text data to determine whether its text type meets the preset text type as follows:
Perform language type detection on the text data to obtain its language type, and perform style type detection to obtain its style type. When the language type meets a preset language type and the style type meets a preset style type, the text type of the text data is determined to meet the preset text type; here, the preset language type includes Chinese and the preset style type includes modern style. When the language type does not meet the preset language type, the style type does not meet the preset style type, or neither meets its preset, the text type of the text data is determined not to meet the preset text type.
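The two-part check can be sketched as below. Only the decision rule (language and style must both match their presets) comes from the text. The script-based language heuristic, the `"zh"`/`"en"`/`"ja"` codes, and the assumption that the style label comes from a separate style classifier are illustrative. Because Japanese text also contains CJK ideographs, the heuristic treats any kana as a sign of Japanese.

```python
from typing import AbstractSet, Tuple

def detect_language(text: str) -> str:
    """Rough script-based language guess (a heuristic, not a trained detector)."""
    kana = cjk = latin = 0
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:      # Hiragana and Katakana blocks
            kana += 1
        elif 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs
            cjk += 1
        elif ch.isascii() and ch.isalpha():
            latin += 1
    if kana:
        return "ja"
    if cjk and cjk >= latin:
        return "zh"
    if latin:
        return "en"
    return "unknown"

def check_text_type(language: str, style: str,
                    preset_languages: AbstractSet[str] = frozenset({"zh"}),
                    preset_styles: AbstractSet[str] = frozenset({"modern"}),
                    ) -> Tuple[bool, bool]:
    """Return (language_ok, style_ok); the preset text type is met only if both hold.

    The style label is assumed to come from a separate (not shown) style classifier.
    """
    return language in preset_languages, style in preset_styles

lang = detect_language("小明今天去上学。")
language_ok, style_ok = check_text_type(lang, style="modern")
print(lang, language_ok and style_ok)  # zh True
```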
Further, optionally, after determining that the text type of the text data does not meet the preset text type, the AI server performs the following:
When the language type of the text data does not meet the preset language type, the AI server sends a language type error message to the terminal, instructing the terminal to generate a pop-up window or interface indicating that the language type of the book is wrong. For example, if the AI server recognizes that the language of the text data sent by the terminal is English, it sends a language type error message; on receiving it, the terminal generates a pop-up window or interface indicating that the book's language cannot be English.
When the style type of the text data does not meet the preset style type, the AI server sends a style type error message to the terminal, instructing the terminal to generate a pop-up window or interface indicating that the style type of the book is wrong. For example, if the AI server recognizes that the style of the text data sent by the terminal is a classical style, it sends a style type error message; on receiving it, the terminal generates a pop-up window or interface indicating that the book's style cannot be a classical style.
When neither the language type nor the style type of the text data meets its preset, the AI server sends a language and style type error message to the terminal, instructing the terminal to generate a pop-up window or interface indicating that both the language and the style of the book are wrong. For example, if the AI server recognizes that the language of the text data sent by the terminal is Japanese and its style is a classical style, it sends a language and style type error message; on receiving it, the terminal generates a pop-up window or interface indicating that the book's language cannot be Japanese and its style cannot be a classical style.
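The server-side choice among the three error messages can be sketched as a small function. The message codes (`LANGUAGE_TYPE_ERROR` and so on) and the prompt wording are hypothetical. The patent only specifies which message is sent in which case and that the terminal turns it into a pop-up window or interface.

```python
from typing import Dict, Optional

def build_error_message(language_ok: bool, style_ok: bool,
                        language: str, style: str) -> Optional[Dict[str, str]]:
    """Map the failed text-type checks to the message sent to the terminal.

    Message codes and prompt wording are hypothetical placeholders.
    """
    if language_ok and style_ok:
        return None  # preset text type met: no error, continue to step 104
    if not language_ok and not style_ok:
        return {"code": "LANGUAGE_AND_STYLE_ERROR",
                "prompt": f"The book's language cannot be {language} "
                          f"and its style cannot be {style}."}
    if not language_ok:
        return {"code": "LANGUAGE_TYPE_ERROR",
                "prompt": f"The book's language cannot be {language}."}
    return {"code": "STYLE_TYPE_ERROR",
            "prompt": f"The book's style cannot be {style}."}

print(build_error_message(False, False, "Japanese", "classical")["code"])
# LANGUAGE_AND_STYLE_ERROR
```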
104. When the text type meets the preset text type, input the text data into a neural network encoder to obtain a summary vector of the text data.
In one possible example, the neural network encoder includes a first recurrent neural network, and the text data may be encoded into the summary vector as follows:
At the current time step, input the first text in the text data into the first recurrent neural network to obtain a first encoding vector, and pass the first encoding vector to the next time step. At the next time step, input the first encoding vector and the second text in the text data into the first recurrent neural network to obtain a second encoding vector, and pass it to the next time step. Repeat until all the text in the text data has been input into the first recurrent neural network; the last encoding vector obtained is the summary vector of the text data.
Specifically, the neural network encoder compresses and encodes the text data and is implemented by a recurrent neural network (RNN). The encoder receives the input text data. At the first time step, it inputs a character of the original text data into the network, compresses it into a vector, and passes that vector to the next time step. At each subsequent time step, it inputs the compressed vector from the previous step together with the next character of the original text data, and passes the newly compressed vector on. The encoding vector obtained after all the text data has been compressed is the summary vector of the text data.
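The recurrence described above, where each step combines the previous encoding vector with the next character, corresponds to a plain (Elman) RNN cell, h_t = tanh(W_x x_t + W_h h_{t-1} + b). The sketch below uses untrained random weights and one-hot character inputs purely to show the data flow. The patent does not specify the cell type, dimensions, or input embedding, and a real encoder would be trained, often with a GRU or LSTM cell instead.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Params = Tuple[List[List[float]], List[List[float]], List[float]]

def init_rnn(input_dim: int, hidden_dim: int, seed: int = 0) -> Params:
    """Random, untrained weights W_x, W_h and bias b (illustration only)."""
    rng = random.Random(seed)
    w_x = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
    return w_x, w_h, [0.0] * hidden_dim

def rnn_step(params: Params, x: Sequence[float], h: Sequence[float]) -> List[float]:
    """One time step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    w_x, w_h, b = params
    return [math.tanh(sum(w * xi for w, xi in zip(w_x[i], x))
                      + sum(w * hj for w, hj in zip(w_h[i], h)) + b[i])
            for i in range(len(b))]

def encode(text: str, vocab: Dict[str, int], params: Params) -> List[float]:
    """Feed one character per time step; the final hidden state is the summary vector."""
    h = [0.0] * len(params[2])
    for ch in text:
        x = [0.0] * len(vocab)
        x[vocab[ch]] = 1.0           # one-hot input for this character
        h = rnn_step(params, x, h)   # previous encoding vector is carried forward
    return h

vocab = {ch: i for i, ch in enumerate("小明去上学")}
params = init_rnn(input_dim=len(vocab), hidden_dim=4)
summary_vec = encode("小明去上学", vocab, params)
print(len(summary_vec))  # 4
```

Because the hidden state is threaded through every step, the final vector depends on character order as well as content, which is what lets a single fixed-size vector stand in for the whole text.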
105. Input the summary vector of the text data into a neural network decoder to obtain a summary of the text data.
In one possible example, the neural network decoder includes a second recurrent neural network, and the summary vector may be decoded into the summary as follows: at the current time step, input the summary vector of the text data into the second recurrent neural network to predict a first output text, and pass the first output text to the next time step; at the next time step, input the first output text and the summary vector into the second recurrent neural network to predict a second output text, and pass it to the next time step; repeat until the second recurrent neural network finishes predicting from the summary vector, and take the combination of all the output texts as the summary of the text data.
Specifically, the neural network decoder decodes the summary vector of the text data and is also implemented by an RNN. After the summary vector is input, the decoder predicts the output word for one time step, then predicts the output word for the next time step from that output word and the summary vector, and so on, so each output word influences the next. Concatenating all the output words produced by the decoder gives the summary of the text data.
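The decoding loop can be sketched as greedy autoregressive decoding. At each step the previous output word and the summary vector are fed back in, and the highest-scoring word is emitted until an end marker or a length limit is reached. The `step` callable stands in for the trained second RNN. The toy transition table, the `<bos>`/`<eos>` markers, and the `max_len` cap are assumptions for illustration, and the patent does not say whether greedy or beam search is used.

```python
from typing import Callable, Dict, List, Sequence

StepFn = Callable[[Sequence[float], str], Dict[str, float]]

def greedy_decode(summary_vec: Sequence[float], step: StepFn,
                  bos: str = "<bos>", eos: str = "<eos>",
                  max_len: int = 50) -> List[str]:
    """Emit the best-scoring word per step, feeding it back as the next input."""
    words: List[str] = []
    prev = bos
    for _ in range(max_len):
        scores = step(summary_vec, prev)   # decoder sees summary vector + previous word
        word = max(scores, key=scores.get)
        if word == eos:
            break
        words.append(word)
        prev = word
    return words

# Hypothetical toy stand-in for the trained second RNN: scores depend only on
# the previous word (a real decoder would also use summary_vec).
TABLE = {"<bos>": {"小明": 0.9, "学校": 0.1},
         "小明": {"去": 0.8, "<eos>": 0.2},
         "去": {"学校": 0.7, "<eos>": 0.3},
         "学校": {"<eos>": 0.95, "去": 0.05}}

def toy_step(summary_vec: Sequence[float], prev: str) -> Dict[str, float]:
    return TABLE[prev]

print("".join(greedy_decode([0.1, -0.2], toy_step)))  # 小明去学校
```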
106. Perform word segmentation on the summary of the text data, and extract N keywords from the summary in descending order of word frequency, where N is a positive integer.
Optionally, word segmentation and keyword extraction on the summary of the text data may be performed as follows:
Perform word segmentation on the summary of the text data to obtain the K segmented words of the summary, where K is a positive integer greater than N. Calculate the K word frequencies corresponding to the K segmented words, in one-to-one correspondence. Select N of the K segmented words in descending order of word frequency, and extract those N words.
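Given the segmented words, the frequency count and top-N selection map directly onto `collections.Counter`. The optional stop-word filter is an assumption added here, since high-frequency function words such as 的 would otherwise dominate. The patent only specifies selection in descending order of word frequency.

```python
from collections import Counter
from typing import Iterable, List

def top_n_keywords(words: Iterable[str], n: int,
                   stopwords: Iterable[str] = ()) -> List[str]:
    """Count word frequencies and return the n most frequent words.

    Counter.most_common orders by count, descending; ties keep the order in
    which words were first seen (Python 3.7+). The stop-word filter is an
    added assumption, not part of the described method.
    """
    stop = set(stopwords)
    counts = Counter(w for w in words if w not in stop)
    return [w for w, _ in counts.most_common(n)]

tokens = ["小明", "的", "学校", "小明", "的", "学校", "小明", "河边"]
print(top_n_keywords(tokens, 2, stopwords={"的"}))  # ['小明', '学校']
```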
Methods for segmenting the summary of the text data include string-matching-based, understanding-based, and statistics-based word segmentation.
String-matching-based segmentation matches the Chinese character string to be segmented against the entries of a dictionary according to a given strategy; if a substring is found in the dictionary, the match succeeds and a word is recognized. Understanding-based segmentation recognizes words by having the computer simulate human understanding of the sentence. Statistics-based segmentation uses a basic segmentation dictionary for string-matching segmentation while using statistical methods to identify new words. Combining string-frequency statistics with string matching keeps the speed and efficiency of matching-based segmentation while gaining the dictionary-free approach's ability to recognize new words from context and resolve ambiguity automatically.
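Forward maximum matching is one common strategy for the string-matching method described above: at each position, take the longest dictionary word that matches, falling back to a single character. The sketch below uses tiny hypothetical dictionaries. Real segmenters use large lexicons and, as in the statistics-based method, add frequency information to resolve ambiguities.

```python
from typing import Iterable, List, Optional

def forward_max_match(text: str, dictionary: Iterable[str],
                      max_word_len: Optional[int] = None) -> List[str]:
    """Dictionary-based segmentation by forward maximum matching."""
    vocab = set(dictionary)
    if max_word_len is None:
        max_word_len = max((len(w) for w in vocab), default=1)
    max_word_len = max(1, max_word_len)  # always make progress
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words

print(forward_max_match("我们在学校学习", {"我们", "学校", "学习"}))
# ['我们', '在', '学校', '学习']
```

Greedy matching can mis-segment. With the dictionary {研究, 研究生, 生命, 起源}, the phrase 研究生命起源 ("research the origin of life") comes out as 研究生 / 命 / 起源, which is the kind of ambiguity that statistical information helps resolve.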
107. Classify the N keywords by part of speech, and combine them according to their parts of speech in a preset question word order to obtain a question about the text data.
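Step 107 does not specify the part-of-speech tagger or the question template. The sketch below assumes the keywords arrive already tagged (for example by a Chinese POS tagger) and uses a hypothetical template format in which `<n>`, `<v>` and similar slots consume the next keyword with that tag, while any other slot is copied literally.

```python
def build_question(tagged_keywords, template):
    """Fill a preset question word order with keywords grouped by part of speech.

    tagged_keywords: list of (word, pos) pairs, highest-ranked first.
    template: sequence of slots; "<tag>" takes the next keyword with that tag,
    anything else is inserted as-is (the slot syntax is an assumption).
    """
    pools = {}
    for word, pos in tagged_keywords:
        pools.setdefault(pos, []).append(word)
    parts = []
    for slot in template:
        if slot.startswith("<") and slot.endswith(">"):
            pool = pools.get(slot[1:-1], [])
            if pool:                      # skip the slot if no keyword has this tag
                parts.append(pool.pop(0))
        else:
            parts.append(slot)
    return "".join(parts) + "？"


keywords = [("小兔子", "n"), ("寻找", "v"), ("朋友", "n")]
print(build_question(keywords, ["<n>", "在哪里", "<v>", "<n>"]))
```

Keeping the template as data means different question patterns (who, where, why) can be swapped in without changing the code.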
108. Calculate, through a neural network semantic representation model, the degree of semantic relevance between the question and the text in the text data, and determine the text with the highest degree of semantic relevance as the answer to the question.
Calculating the degree of semantic relevance between the question and the text in the text data through the neural network semantic representation model includes:
The question and the text in the text data are input into the neural network semantic representation model, which uses a neural network to encode both and obtains their vector representations by mining their semantics. The degree of semantic relevance is then obtained by calculating the similarity between the semantic vector of the question and that of the text. The degree of semantic relevance between the question and the text may be calculated using a vocabulary overlap method, a string method, a cosine similarity method, or a longest common subsequence method.
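Two of the non-neural similarity measures named above can be sketched directly on token sequences. The normalizations used here (Jaccard similarity for vocabulary overlap, and LCS length divided by the longer sequence length) are common choices assumed for illustration; the source does not fix them.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length normalized by the longer sequence (assumed normalization)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def overlap_similarity(a, b):
    """Vocabulary overlap as the Jaccard similarity of the two token sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Both functions accept strings or token lists, so the same code works on characters or on segmented words.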
Specifically, Q text segments matching the N keywords are searched for in the text data, where Q is a positive integer. The Q degrees of semantic relevance between the question and the Q text segments are calculated; the Q text segments correspond one-to-one to the Q degrees of semantic relevance. The highest of the Q degrees, referred to as the first degree of semantic relevance, is obtained, and the text corresponding to it is determined as the answer to the question.
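The answer-selection procedure can be sketched as below. As an assumption for illustration, a bag-of-words count vector stands in for the neural network semantic representation, so only the keyword filtering, cosine similarity and argmax logic mirror the text; a real system would replace the hypothetical `embed` function with the model's encoder. Passages are taken as already segmented token lists, and a passage is assumed to "match" if it contains at least one keyword.

```python
import math
from collections import Counter


def embed(tokens):
    """Placeholder semantic vector: token counts (stands in for a neural encoder)."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def select_answer(question_tokens, passages, keywords):
    """Keep the Q passages containing a keyword, score each against the question,
    and return the passage with the highest relevance (None if Q == 0)."""
    candidates = [p for p in passages if any(k in p for k in keywords)]
    if not candidates:
        return None
    q_vec = embed(question_tokens)
    return max(candidates, key=lambda p: cosine(q_vec, embed(p)))
```

Filtering by keywords first keeps the number of similarity computations at Q rather than at the total number of passages.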
Referring to FIG. 2, FIG. 2 is a flowchart of another data processing method provided by another embodiment of the present application. As shown in FIG. 2, the method may include:
201. The terminal sends image data of a book to the artificial intelligence server.
The terminal may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile Internet device, or another type of terminal.
If the book read by a child or student is a paper book, the paper book is first scanned to obtain scanned images, and the terminal then sends the scanned images to the artificial intelligence server.
202. The artificial intelligence server performs character recognition on the image data to obtain the text data corresponding to the image data.
Optionally, when the image data is a scanned image, some parts may be missed or scanned unclearly, or the scan may be skewed, because the image is generated by a scanning tool. The image data therefore needs to be normalized before character recognition. The normalization may be performed as follows:
When the inclination of the image data exceeds a preset inclination threshold, the image data is processed by an image correction algorithm, which may be any one of the Radon transform, the Hough transform, and linear regression.
Alternatively, when the sharpness of the image data is lower than a preset sharpness threshold, the image data is processed by an image enhancement algorithm, which may be any one of histogram equalization, image smoothing, and image sharpening.
Alternatively, when the inclination of the image data exceeds the preset inclination threshold and its sharpness is lower than the preset sharpness threshold, the image data is processed by both the image correction algorithm and the image enhancement algorithm.
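The normalization rule above (correct when skew exceeds its threshold, enhance when sharpness falls below its threshold, or both) can be sketched as a small dispatcher. The linear-regression skew estimate fits a least-squares line through points assumed to lie on one text baseline, such as the bottom pixels of the characters in a line. The threshold values, the point-extraction step, and the `correct`/`enhance` callables are hypothetical placeholders.

```python
import math


def skew_angle_degrees(points):
    """Least-squares slope through (x, y) baseline points, returned in degrees."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("points must span more than one x value")
    return math.degrees(math.atan(sxy / sxx))


def normalize(image, skew_deg, sharpness, correct, enhance,
              skew_threshold=1.0, sharpness_threshold=100.0):
    """Apply correction and/or enhancement depending on which thresholds are violated.

    Thresholds are illustrative defaults, not values from the source.
    """
    if abs(skew_deg) > skew_threshold:
        image = correct(image, skew_deg)
    if sharpness < sharpness_threshold:
        image = enhance(image)
    return image
```

Passing the algorithms in as callables leaves the choice among Radon, Hough or regression correction, and among the enhancement methods, open as the text describes.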
When the image data is a scanned image, it cannot be recognized directly as text, so the artificial intelligence server performs character recognition on it to obtain the corresponding text data, which can be recognized directly.
The artificial intelligence server may perform character recognition on the image data to obtain the corresponding text data as follows:
Perform character segmentation on the image data to obtain M characters, where M is a positive integer.
Perform feature extraction on the M characters to obtain M character features; the M characters correspond one-to-one to the M character features.
Compare the M character features with a character feature database to identify the M text characters corresponding to the M character features; the M character features correspond one-to-one to the M text characters. Comparison methods include Euclidean-space comparison, relaxation matching, dynamic programming (DP) matching, neural-network-based database construction and matching, and hidden Markov models (HMM).
Combine the M text characters to obtain the text data corresponding to the image data.
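A minimal template-matching sketch of the four steps (segmentation, feature extraction, comparison, combination) is shown below for a binary image given as rows of 0/1 pixels. It assumes characters are separated by blank columns and that every glyph has the same size as its template. It uses the raw pixel grid as the feature and Euclidean nearest-neighbour comparison, one of the listed comparison methods. The two-glyph template database is hypothetical.

```python
import math


def segment_columns(image):
    """Split a binary image (list of rows) into character images at blank columns."""
    width = len(image[0])
    inked = [any(row[c] for row in image) for c in range(width)]
    chars, start = [], None
    for c, has_ink in enumerate(inked + [False]):   # sentinel closes the last run
        if has_ink and start is None:
            start = c
        elif not has_ink and start is not None:
            chars.append([row[start:c] for row in image])
            start = None
    return chars


def features(char_image):
    """Flatten the character bitmap into a feature vector."""
    return [px for row in char_image for px in row]


def recognize(image, database):
    """Match each segmented character to its nearest template (Euclidean distance)."""
    text = []
    for char_image in segment_columns(image):
        f = features(char_image)
        text.append(min(database, key=lambda label: math.dist(f, database[label])))
    return "".join(text)


# Hypothetical 3x3 templates, flattened row by row.
TEMPLATES = {
    "T": [1, 1, 1, 0, 1, 0, 0, 1, 0],
    "L": [1, 0, 0, 1, 0, 0, 1, 1, 1],
}
```

Because matching picks the nearest template rather than an exact one, a glyph with a stray pixel is still recognized.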
203. The artificial intelligence server identifies whether the language type of the text data matches a preset language type.
Language types include Chinese, English, Japanese, and so on; the preset language type includes Chinese.
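The source does not say how the language type is detected. One simple heuristic, assumed here for illustration only, classifies text by Unicode script. Any hiragana or katakana suggests Japanese; otherwise, more CJK ideographs than Latin letters suggests Chinese, and the reverse suggests English. The heuristic cannot, for example, tell Japanese written entirely in kanji from Chinese.

```python
def detect_language(text):
    """Rough language guess from Unicode script counts (heuristic, not a classifier)."""
    kana = cjk = latin = 0
    for ch in text:
        code = ord(ch)
        if 0x3040 <= code <= 0x30FF:          # Hiragana and Katakana blocks
            kana += 1
        elif 0x4E00 <= code <= 0x9FFF:        # CJK Unified Ideographs block
            cjk += 1
        elif ch.isascii() and ch.isalpha():   # Basic Latin letters
            latin += 1
    if kana:
        return "Japanese"
    if cjk == 0 and latin == 0:
        return "unknown"
    return "Chinese" if cjk >= latin else "English"
```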
204. When the language type of the text data does not match the preset language type, the artificial intelligence server identifies whether the style type of the text data matches a preset style type.
Style types include modern styles (novels, prose, fairy tales, narrative essays, expository essays, argumentative essays, and so on) and classical styles (shi poems, ci lyrics, ge songs, fu rhapsodies, and so on); the preset style type includes modern styles.
205. When the style type of the text data does not match the preset style type, the artificial intelligence server sends a language and style type error message to the terminal.
206. The terminal generates a pop-up window or interface indicating that the language and style type of the book are incorrect.
For example, if the artificial intelligence server recognizes that the language type of the text data is Japanese and its style type is classical, the server sends a language and style type error message to the terminal. On receiving the message, the terminal generates a pop-up window or interface indicating that the language of the book cannot be Japanese and that its style cannot be classical.
Referring to FIG. 3, FIG. 3 is a flowchart of another data processing method provided by another embodiment of the present application. As shown in FIG. 3, the method may include:
301. The terminal sends image data of a book to the artificial intelligence server.
The terminal may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile Internet device, or another type of terminal.
The book read by a child or student is a paper book. The paper book is first scanned through the terminal to obtain scanned images, and the terminal then sends the scanned images to the artificial intelligence server.
302. When the inclination of the image data exceeds the preset inclination threshold, the artificial intelligence server processes the image data with an image correction algorithm.
When the image data is a scanned image, some parts may be missed or scanned unclearly, or the scan may be skewed, because the image is generated by a scanning tool. The image data therefore needs to be processed by an image correction algorithm, which may be any one of the Radon transform, the Hough transform, and linear regression.
303. When the sharpness of the image data is lower than the preset sharpness threshold, the artificial intelligence server processes the image data with an image enhancement algorithm.
The image enhancement algorithm may be any one of histogram equalization, image smoothing, and image sharpening.
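Of the enhancement options listed, histogram equalization is easy to show in a few lines. The sketch below applies the standard cumulative-distribution mapping to a non-empty 8-bit grayscale image stored as a list of rows. It is a plain-Python illustration, not an optimized implementation.

```python
def equalize_histogram(image, levels=256):
    """Histogram equalization of a grayscale image (list of rows of ints in [0, levels))."""
    pixels = [p for row in image for p in row]
    total = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:                 # cumulative distribution of gray levels
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:               # a single gray level: nothing to spread out
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    lut = [round((c - cdf_min) * scale) for c in cdf]
    return [[lut[p] for p in row] for row in image]


print(equalize_histogram([[10, 10, 10], [20, 30, 30]]))  # [[0, 0, 0], [85, 255, 255]]
```

The darkest occupied level maps to 0 and the brightest to 255, which stretches the contrast of a faint scan.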
304. The artificial intelligence server performs character segmentation on the image data to obtain M characters, where M is a positive integer.
305. The artificial intelligence server performs feature extraction on the M characters to obtain M character features.
The M characters correspond one-to-one to the M character features. Features fall into two categories. The first category is statistical features: the ratio of black pixels (or white pixels) within the character region of the image data is obtained, and when the character region is divided into several sub-regions, the black-pixel (or white-pixel) ratios of the sub-regions are combined into a numerical vector in feature space. The second category is structural features: after the characters in the image data are thinned, the numbers and positions of the stroke endpoints and intersections of each character are obtained.
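Both feature families can be sketched on a binary character bitmap (1 = black). The statistical part splits the character region into a grid of zones and records each zone's black-pixel ratio. The structural part assumes the character has already been thinned to a one-pixel-wide skeleton. It classifies skeleton pixels by the crossing number, the number of 0-to-1 transitions around the 8 neighbours: 1 means an endpoint, 3 or more an intersection. The 2x2 grid and the crossing-number rule are common conventions assumed here, not details given in the source.

```python
def zone_density(bitmap, rows=2, cols=2):
    """Black-pixel ratio of each zone in a rows x cols grid, flattened row by row."""
    h, w = len(bitmap), len(bitmap[0])
    vector = []
    for zr in range(rows):
        for zc in range(cols):
            r0, r1 = zr * h // rows, (zr + 1) * h // rows
            c0, c1 = zc * w // cols, (zc + 1) * w // cols
            cells = [bitmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            vector.append(sum(cells) / len(cells) if cells else 0.0)
    return vector


# The 8 neighbours in clockwise order, starting directly above the pixel.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel, r, c):
    """Half the number of value changes around the 8-neighbour ring (outside = 0)."""
    h, w = len(skel), len(skel[0])
    vals = [skel[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in RING]
    return sum(abs(vals[i] - vals[(i + 1) % 8]) for i in range(8)) // 2


def skeleton_points(skel):
    """Return (endpoints, intersections) of a thinned skeleton as (row, col) lists."""
    endpoints, intersections = [], []
    for r, row in enumerate(skel):
        for c, px in enumerate(row):
            if not px:
                continue
            cn = crossing_number(skel, r, c)
            if cn == 1:
                endpoints.append((r, c))
            elif cn >= 3:
                intersections.append((r, c))
    return endpoints, intersections
```

The zone ratios give a fixed-length vector for Euclidean comparison. The endpoint and intersection counts and positions capture stroke topology, which is less sensitive to stroke thickness.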
306. The artificial intelligence server compares the M character features with the character feature database to identify the M text characters corresponding to the M character features.
The M character features correspond one-to-one to the M text characters. Comparison methods include Euclidean-space comparison, relaxation matching, dynamic programming (DP) matching, neural-network-based database construction and matching, and hidden Markov models (HMM).
307. The artificial intelligence server combines the M text characters to obtain the text data corresponding to the image data.
308. The artificial intelligence server performs text type detection on the text data to determine whether the text type of the text data matches the preset text type.
Optionally, the text type includes a language type and a style type. Language types include Chinese, English, Japanese, and so on. Style types include modern styles (novels, prose, fairy tales, narrative essays, expository essays, argumentative essays, and so on) and classical styles (shi poems, ci lyrics, ge songs, fu rhapsodies, and so on).
The artificial intelligence server may perform text type detection on the text data to determine whether its text type matches the preset text type as follows:
Perform language type detection on the text data to obtain its language type, and perform style type detection on the text data to obtain its style type.
When the language type of the text data matches the preset language type and its style type matches the preset style type, determine that the text type of the text data matches the preset text type, where the preset language type includes Chinese and the preset style type includes modern styles.
When the language type of the text data does not match the preset language type, or its style type does not match the preset style type, or neither matches, determine that the text type of the text data does not match the preset text type.
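The decision rule in step 308, together with the error message described in steps 205 and 206, can be expressed as a small check. The accepted type sets and the message wording are assumptions for illustration. The language and style labels would come from detectors that the source leaves unspecified.

```python
# Illustrative presets: Chinese language, modern styles (assumed label spellings).
PRESET_LANGUAGES = {"Chinese"}
PRESET_STYLES = {"novel", "prose", "fairy tale", "narrative", "expository", "argumentative"}


def check_text_type(language, style):
    """Return (ok, problems); ok only when both language and style match the presets."""
    problems = []
    if language not in PRESET_LANGUAGES:
        problems.append(f"language cannot be {language}")
    if style not in PRESET_STYLES:
        problems.append(f"style cannot be {style}")
    return (not problems, problems)
```

The `problems` list is what a server could send back so that the terminal's pop-up names every failed condition.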
309. When the text type matches the preset text type, input the text data into the neural network encoder to obtain a summary vector of the text data.
The neural network encoder compresses and encodes the text data and is implemented by a recurrent neural network (RNN). At the first time step, the encoder feeds a word of the original text data into the network and compresses it into a vector. This vector is passed to the next time step, where the previous compressed vector and the next word of the original text data are fed into the network, and the resulting new vector is passed on in turn. The encoding vector obtained after all of the text data has been compressed is the summary vector of the text data.
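The encoder recurrence described above matches an Elman-style RNN, h_t = tanh(W_x x_t + W_h h_{t-1} + b), with the final hidden state used as the summary vector. The sketch below uses toy word vectors and fixed weights in plain Python purely to show the data flow. In practice the weights are learned, and the encoder is usually built with a deep-learning framework, often with gated cells such as LSTM or GRU instead of the plain tanh cell assumed here.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_encode(word_vectors, w_x, w_h, b):
    """Run an Elman RNN over the word vectors and return the final hidden state."""
    h = [0.0] * len(b)                           # initial state before the first word
    for x in word_vectors:
        pre = [xi + hi + bi for xi, hi, bi in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(p) for p in pre]          # new state carries the previous one forward
    return h
```

Because each state depends on the previous one, the same words in a different order yield a different summary vector.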
310. Input the summary vector of the text data into the neural network decoder to obtain the summary of the text data.
The neural network decoder decodes the summary vector of the text data and is also implemented by an RNN. After the summary vector is input, the decoder predicts the output word for the first time step from the summary vector. It then predicts the output word for the next time step from that output word and the summary vector, and so on, so that each output word influences the next. Finally, all output words produced by the decoder are concatenated to form the summary of the text data.
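The decoder loop described above (predict a word from the summary vector, then feed each output word back in to predict the next) is greedy autoregressive decoding. The sketch keeps the loop explicit and takes the network as a `step` function argument. The start and end tokens, the length cap, and the toy lookup-table `step` are hypothetical stand-ins for a trained RNN decoder.

```python
def greedy_decode(summary_vector, step, start="<s>", end="</s>", max_len=50):
    """Generate words one at a time, each conditioned on the previous word.

    step(prev_word, state, summary_vector) -> (scores: dict[word, float], new_state)
    """
    words, prev, state = [], start, None
    for _ in range(max_len):
        scores, state = step(prev, state, summary_vector)
        word = max(scores, key=scores.get)   # pick the highest-scoring next word
        if word == end:
            break
        words.append(word)
        prev = word
    return "".join(words)                    # Chinese words are joined without spaces


# Toy step function: a fixed next-word table standing in for the RNN (assumption).
TABLE = {"<s>": "小兔子", "小兔子": "找到", "找到": "朋友", "朋友": "</s>"}


def toy_step(prev, state, vec):
    return {TABLE[prev]: 1.0}, state


print(greedy_decode([0.1, 0.2], toy_step))  # 小兔子找到朋友
```

Beam search is a common alternative to the greedy choice, but the text only requires each word to depend on the previous output.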
311. Extract N keywords from the summary of the text data, where N is a positive integer.
Optionally, the N keywords may be extracted from the summary of the text data as follows:
Perform word segmentation on the summary of the text data to obtain K tokens, where K is a positive integer greater than N.
Calculate the K word frequencies of the K tokens; the K tokens correspond one-to-one to the K word frequencies.
Select N of the K tokens in descending order of word frequency, and extract these N tokens.
Word segmentation of the summary may use a string-matching-based method, an understanding-based method, or a statistics-based method.
A string-matching-based method matches the Chinese character string to be segmented against the entries of a dictionary according to a certain strategy; if a substring is found in the dictionary, the match succeeds and a word is recognized. An understanding-based method recognizes words by having the computer simulate a human's understanding of the sentence. A statistics-based method uses a basic segmentation dictionary for string-matching segmentation while using statistical methods to identify new words. Combining string-frequency statistics with string matching in this way keeps the speed and efficiency of dictionary matching, while also exploiting the ability of dictionary-free segmentation to recognize new words from context and resolve ambiguity automatically.
312. Combine the N keywords to obtain a question about the text data.
313. Process the question and the text data through the neural network semantic representation model to obtain the answer to the question.
Calculating the degree of semantic relevance between the question and the text in the text data through the neural network semantic representation model includes:
The question and the text in the text data are input into the neural network semantic representation model, which uses a neural network to encode both and obtains their vector representations by mining their semantics. The degree of semantic relevance is then obtained by calculating the similarity between the semantic vector of the question and that of the text. The degree of semantic relevance between the question and the text may be calculated using a vocabulary overlap method, a string method, a cosine similarity method, or a longest common subsequence method.
Specifically, Q text segments matching the N keywords are searched for in the text data, where Q is a positive integer.
The Q degrees of semantic relevance between the question and the Q text segments are calculated; the Q text segments correspond one-to-one to the Q degrees of semantic relevance.
The highest of the Q degrees, referred to as the first degree of semantic relevance, is obtained, and the text corresponding to it is determined as the answer to the question.
Referring to FIG. 4, FIG. 4 is a schematic diagram of a system structure provided by an embodiment of the present application. As shown in FIG. 4, the system includes an artificial intelligence server and terminals communicatively connected to it. The terminals include a mobile phone and a computer, through which the user accesses the artificial intelligence server. When the terminal is a mobile phone, the user can photograph the book to be processed with the phone and send the photos to the artificial intelligence server, which processes the photos and returns the processing result to the user's phone. When the terminal is a computer, the user can scan the book with a scanning device connected to the computer, such as a printer with scanning capability, and send the scanned images to the artificial intelligence server, which processes them and returns the processing result to the user's computer.
Referring to FIG. 5, FIG. 5 is a schematic diagram of character recognition performed on image data according to an embodiment of the present application. As shown in FIG. 5, the image data shows "ABCDE". The image data is first segmented into five characters: A, B, C, D, and E. Feature extraction is then performed on these characters to obtain their features: feature a, feature b, feature c, feature d, and feature e. The features are then compared to identify the corresponding text characters: text character A, text character B, text character C, text character D, and text character E. Finally, all the text characters are combined to obtain the text "ABCDE".
Referring to FIG. 6, FIG. 6 is a schematic diagram of a data processing apparatus provided by another embodiment of the present application. As shown in FIG. 6, the apparatus may include:
an obtaining module 601, configured to obtain image data of a book sent by a terminal;
a character recognition module 602, configured to perform character recognition on the image data to obtain text data corresponding to the image data;
a detection module 603, configured to perform text type detection on the text data to determine whether the text type of the text data matches a preset text type;
an encoding module 604, configured to, when the text type matches the preset text type, input the text data into a neural network encoder to obtain a summary vector of the text data, where the neural network encoder is used to compress and encode the text data;
a decoding module 605, configured to input the summary vector of the text data into a neural network decoder to obtain a summary of the text data, where the neural network decoder uses a neural network to predict a plurality of words from the summary vector, and the predicted words are concatenated to form the summary of the text data;
an extraction module 606, configured to perform word segmentation on the summary of the text data and extract N keywords from the summary in descending order of word frequency, where N is a positive integer;
a combination module 607, configured to classify the N keywords by part of speech and combine them, according to their parts of speech, in a preset question word order to obtain a question about the text data; and
a processing module 608, configured to calculate, through a neural network semantic representation model, the degree of semantic relevance between the question and the text in the text data, and determine the text with the highest degree of semantic relevance as the answer to the question.
For the specific implementation of the data processing apparatus of the present application, refer to the embodiments of the data processing method described above; details are not repeated here.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an electronic device in the hardware operating environment involved in the embodiments of the present application. As shown in FIG. 7, the electronic device may include:
a processor 701, for example, a CPU;
a memory 702, which may optionally be high-speed RAM or non-volatile memory, such as disk storage; and
a communication interface 703, configured to implement connection and communication between the processor 701 and the memory 702.
Those skilled in the art will understand that the structure of the data processing electronic device shown in FIG. 7 does not limit the device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in FIG. 7, the memory 702 may contain an operating system, a network communication module, and a data processing program. The operating system manages and controls the hardware and software resources of the data processing electronic device and supports the running of the data processing program and other software or programs. The network communication module implements communication among the components inside the memory 702, as well as communication with other hardware and software in the data processing electronic device.
In the data processing electronic device shown in FIG. 7, the processor 701 is configured to execute the data processing program stored in the memory 702 to implement the following steps:
obtaining image data of a book sent by a terminal;
performing character recognition on the image data to obtain text data corresponding to the image data;
performing text type detection on the text data to determine whether the text type of the text data matches a preset text type;
when the text type matches the preset text type, inputting the text data into a neural network encoder to obtain a summary vector of the text data, where the neural network encoder is used to compress and encode the text data;
inputting the summary vector of the text data into a neural network decoder to obtain a summary of the text data, where the neural network decoder uses a neural network to predict a plurality of words from the summary vector, and the predicted words are concatenated to form the summary of the text data;
performing word segmentation on the summary of the text data and extracting N keywords from the summary in descending order of word frequency, where N is a positive integer;
classifying the N keywords by part of speech and combining them, according to their parts of speech, in a preset question word order to obtain a question about the text data; and
calculating, through a neural network semantic representation model, the degree of semantic relevance between the question and the text in the text data, and determining the text with the highest degree of semantic relevance as the answer to the question.
For the specific implementation of the data processing electronic device of the present application, refer to the embodiments of the data processing method described above; details are not repeated here.
Another embodiment of the present application provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program that, when executed by a processor, implements the following steps:
obtaining image data of a book sent by a terminal;
performing character recognition on the image data to obtain text data corresponding to the image data;
performing text type detection on the text data to determine whether the text type of the text data matches a preset text type;
when the text type matches the preset text type, inputting the text data into a neural network encoder to obtain a summary vector of the text data, where the neural network encoder is used to compress and encode the text data;
inputting the summary vector of the text data into a neural network decoder to obtain a summary of the text data, where the neural network decoder uses a neural network to predict a plurality of words from the summary vector, and the predicted words are concatenated to form the summary of the text data;
performing word segmentation on the summary of the text data and extracting N keywords from the summary in descending order of word frequency, where N is a positive integer;
classifying the N keywords by part of speech and combining them, according to their parts of speech, in a preset question word order to obtain a question about the text data; and
calculating, through a neural network semantic representation model, the degree of semantic relevance between the question and the text in the text data, and determining the text with the highest degree of semantic relevance as the answer to the question.
For the specific implementation of the computer-readable storage medium of the present application, refer to the embodiments of the data processing method described above; details are not repeated here.
还需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。It should also be noted that for the foregoing method embodiments, for the sake of simple description, they are all expressed as a series of action combinations, but those skilled in the art should know that this application is not limited by the described sequence of actions , Because according to this application, some steps can be performed in other order or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application. In the above-mentioned embodiments, the description of each embodiment has its own focus. For parts that are not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that: The technical solutions recorded in the embodiments are modified, or some of the technical features are equivalently replaced; these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A data processing method, comprising:
    acquiring image data of a book sent by a terminal;
    performing character recognition on the image data to obtain text data corresponding to the image data;
    performing text type detection on the text data to determine whether the text type of the text data satisfies a preset text type;
    when the text type satisfies the preset text type, inputting the text data into a neural network encoder to obtain a summary vector of the text data, wherein the neural network encoder is configured to compress and encode the text data;
    inputting the summary vector of the text data into a neural network decoder to obtain a summary of the text data, wherein the neural network decoder is configured to predict a plurality of words from the summary vector by means of a neural network, and the plurality of predicted words are concatenated to form the summary of the text data;
    performing word segmentation on the summary of the text data, and extracting N keywords from the summary in descending order of word frequency, wherein N is a positive integer;
    classifying the N keywords by part of speech, and combining the N keywords according to their parts of speech in a preset question word order to obtain a question for the text data; and
    calculating, by means of a neural network semantic representation model, a degree of semantic relevance between the question and the text in the text data, and determining the text with the highest degree of semantic relevance as the answer to the question.
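Read as a data flow, claim 1 chains eight stages. The sketch below wires them together with pluggable callables; the `Stages` fields and `run_pipeline` are hypothetical names introduced for illustration, and each callable stands in for a component (OCR engine, type detector, encoder, decoder, relevance model) that the claims leave unspecified.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Stages:
    """Hypothetical pluggable implementations of the steps of claim 1."""
    ocr: Callable[[bytes], str]                    # image data -> text data
    type_ok: Callable[[str], bool]                 # text type satisfies preset type?
    encode: Callable[[str], Sequence[float]]       # text data -> summary vector
    decode: Callable[[Sequence[float]], str]       # summary vector -> summary
    keywords: Callable[[str, int], List[str]]      # summary -> top-N keywords
    make_question: Callable[[List[str]], str]      # keywords -> question
    passages: Callable[[str], List[str]]           # text data -> candidate passages
    relevance: Callable[[str, str], float]         # (question, passage) -> score


def run_pipeline(image: bytes, s: Stages, n: int = 3) -> Optional[Tuple[str, str, str]]:
    """Return (summary, question, answer), or None if the text type check fails."""
    text = s.ocr(image)
    if not s.type_ok(text):
        return None  # claim 5 covers reporting the failure to the terminal
    summary = s.decode(s.encode(text))
    question = s.make_question(s.keywords(summary, n))
    # Pick the passage with the highest relevance; empty string if none exist.
    answer = max(s.passages(text), key=lambda p: s.relevance(question, p), default="")
    return summary, question, answer
```

Keeping each stage injectable makes it possible to swap, for example, a template-matching OCR for a neural one without touching the control flow.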
  2. The method according to claim 1, wherein before performing character recognition on the image data to obtain the text data corresponding to the image data, the method comprises:
    when the tilt of the image data exceeds a preset tilt threshold, processing the image data with an image correction algorithm, wherein the image correction algorithm comprises any one of the Radon transform, the Hough transform, and a linear regression algorithm;
    or, when the sharpness of the image data is below a preset sharpness threshold, processing the image data with an image enhancement algorithm, wherein the image enhancement algorithm comprises any one of histogram equalization, image smoothing, and image sharpening;
    or, when the tilt of the image data exceeds the preset tilt threshold and the sharpness of the image data is below the preset sharpness threshold, processing the image data with both the image correction algorithm and the image enhancement algorithm.
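A minimal sketch of the preprocessing decision in claim 2, assuming a non-empty grayscale image stored as nested lists and a set of baseline points (x, y) for one text line already located by other means. It uses the linear-regression option for skew estimation and histogram equalization for enhancement; using the variance of the Laplacian as the sharpness measure and the default thresholds (2 degrees, 100.0) are assumptions, not values from the source. The rotation itself is left to an imaging library.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[int]]  # grayscale rows, integer values 0..255, assumed non-empty


def estimate_skew_deg(points: Sequence[Tuple[float, float]]) -> float:
    """Fit y = a*x + b to baseline points by least squares; return atan(a) in degrees."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return math.degrees(math.atan(slope))


def laplacian_variance(img: Image) -> float:
    """Sharpness proxy: variance of the 4-neighbour Laplacian over interior pixels."""
    vals = [img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1] - 4 * img[r][c]
            for r in range(1, len(img) - 1) for c in range(1, len(img[0]) - 1)]
    if not vals:
        return 0.0
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def equalize_histogram(img: Image) -> Image:
    """Map grey levels through the normalised cumulative histogram."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # a single grey level: nothing to stretch
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in img]


def preprocess(img: Image, baseline: Sequence[Tuple[float, float]],
               tilt_threshold_deg: float = 2.0,
               sharpness_threshold: float = 100.0) -> Tuple[List[Tuple[str, object]], Image]:
    """Apply correction and/or enhancement depending on the two thresholds."""
    steps: List[Tuple[str, object]] = []
    angle = estimate_skew_deg(baseline)
    if abs(angle) > tilt_threshold_deg:
        # Correction branch: rotating by -angle is delegated to an imaging library.
        steps.append(("correct", angle))
    if laplacian_variance(img) < sharpness_threshold:
        img = equalize_histogram(img)
        steps.append(("enhance", "histogram_equalization"))
    return steps, img
```

Because the two checks are independent, the "both" case of the claim falls out naturally when both conditions hold.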
  3. The method according to claim 2, wherein performing character recognition on the image data to obtain the text data corresponding to the image data comprises:
    performing character segmentation on the image data to obtain M characters, wherein M is a positive integer;
    performing feature extraction on the M characters to obtain M character features, wherein the M characters correspond one-to-one to the M character features;
    comparing the M character features with a character feature database to identify M text characters corresponding to the M character features, wherein the M character features correspond one-to-one to the M text characters; and
    combining the M text characters to obtain the text data corresponding to the image data.
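Claim 3 describes classic segment-then-match character recognition. The sketch below assumes a binarized single text line (1 = ink), segments characters at blank columns (vertical projection), uses the padded pixel grid as the feature vector, and matches each glyph to the nearest template in a character feature database by Hamming distance. The feature choice, the fixed width of 4 columns, and the `database` layout are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # 1 = ink, 0 = background; one text line, all rows equal length


def segment_columns(line: Bitmap) -> List[Bitmap]:
    """Split a text-line bitmap into glyphs at columns containing no ink."""
    width = len(line[0])
    ink = [any(row[c] for row in line) for c in range(width)]
    glyphs: List[Bitmap] = []
    start = None
    for c in range(width + 1):  # one extra step flushes a glyph touching the right edge
        has_ink = c < width and ink[c]
        if has_ink and start is None:
            start = c
        elif not has_ink and start is not None:
            glyphs.append([row[start:c] for row in line])
            start = None
    return glyphs


def features(glyph: Bitmap, width: int = 4) -> Tuple[int, ...]:
    """Flatten the glyph after right-padding (or cropping) each row to a fixed width."""
    return tuple(v for row in glyph for v in (row + [0] * width)[:width])


def recognise(line: Bitmap, database: Dict[str, Tuple[int, ...]]) -> str:
    """Match each glyph to the database entry with the smallest Hamming distance."""
    chars = []
    for glyph in segment_columns(line):
        f = features(glyph)
        best = min(database, key=lambda ch: sum(a != b for a, b in zip(f, database[ch])))
        chars.append(best)
    return "".join(chars)
```

A production system would normalise glyph size before extracting features; fixed-width padding keeps the sketch short.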
  4. The method according to claim 1, wherein the text type comprises a language type and a genre, and performing text type detection on the text data to determine whether the text type of the text data satisfies the preset text type comprises:
    performing language type detection on the text data to obtain the language type of the text data;
    performing genre detection on the text data to obtain the genre of the text data;
    when the language type satisfies a preset language type and the genre satisfies a preset genre, determining that the text type satisfies the preset text type; and
    when the language type does not satisfy the preset language type, or the genre does not satisfy the preset genre, or neither is satisfied, determining that the text type does not satisfy the preset text type.
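The decision rule in claim 4 reduces to a conjunction: the text type is acceptable only if both the language check and the genre check pass. In the sketch below, language detection is a hypothetical heuristic (share of CJK ideographs in the range U+4E00 to U+9FFF), and the genre classifier is passed in as a callable because the source does not describe one; the preset values "zh" and "narrative" are placeholders.

```python
import re
from typing import Callable, Tuple

CJK = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs block


def detect_language(text: str, cjk_ratio: float = 0.5) -> str:
    """Heuristic: label the text 'zh' if enough non-space characters are CJK ideographs."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return "unknown"
    ratio = sum(bool(CJK.match(ch)) for ch in chars) / len(chars)
    return "zh" if ratio >= cjk_ratio else "other"


def text_type_ok(text: str, detect_genre: Callable[[str], str],
                 preset_language: str = "zh",
                 preset_genre: str = "narrative") -> Tuple[bool, bool, bool]:
    """Return (text type satisfied, language satisfied, genre satisfied)."""
    language_ok = detect_language(text) == preset_language
    genre_ok = detect_genre(text) == preset_genre
    # Any failing check (or both) means the preset text type is not satisfied.
    return language_ok and genre_ok, language_ok, genre_ok
```

Returning the two partial results alongside the overall verdict lets the caller choose the matching error message of claim 5.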
  5. The method according to claim 4, wherein after determining that the text type does not satisfy the preset text type, the method comprises:
    when the language type does not satisfy the preset language type, sending a language type error message to the terminal, wherein the language type error message instructs the terminal to generate a pop-up window or interface indicating that the language type of the book is wrong;
    when the genre does not satisfy the preset genre, sending a genre error message to the terminal, wherein the genre error message instructs the terminal to generate a pop-up window or interface indicating that the genre of the book is wrong; and
    when the language type does not satisfy the preset language type and the genre does not satisfy the preset genre, sending a language and genre error message to the terminal, wherein the language and genre error message instructs the terminal to generate a pop-up window or interface indicating that both the language and the genre of the book are wrong.
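The three error cases of claim 5 map directly onto the two check results. A minimal sketch; the message type strings and the `display` field are hypothetical, since the source only states that the terminal shows a pop-up window or interface.

```python
from typing import Dict, Optional


def error_message(language_ok: bool, genre_ok: bool) -> Optional[Dict[str, str]]:
    """Build the (hypothetical) error payload sent to the terminal, or None if both pass."""
    if language_ok and genre_ok:
        return None
    if not language_ok and not genre_ok:
        kind = "LANGUAGE_AND_GENRE_ERROR"
    elif not language_ok:
        kind = "LANGUAGE_ERROR"
    else:
        kind = "GENRE_ERROR"
    return {"type": kind, "display": "popup"}
```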
  6. The method according to claim 1, wherein the neural network encoder comprises a first recurrent neural network, and inputting the text data into the neural network encoder to obtain the summary vector of the text data comprises:
    at the current time step, inputting a first text of the text data into the first recurrent neural network to obtain a first encoding vector;
    passing the first encoding vector to the next time step;
    at the next time step, inputting the first encoding vector and a second text of the text data into the first recurrent neural network to obtain a second encoding vector; and
    passing the second encoding vector to the next time step, and so on until all the text in the text data has been input into the first recurrent neural network, and determining the last encoding vector obtained as the summary vector of the text data.
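Claim 6 describes a standard recurrent encoder whose final hidden state serves as the summary vector. A minimal sketch, assuming an Elman RNN with tanh activation, text units already embedded as vectors, and trained weights `W`, `U`, `b` supplied by the caller (all names are illustrative, not from the source):

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_encode(inputs: Sequence[Sequence[float]], W: Matrix, U: Matrix,
               b: Sequence[float]) -> List[float]:
    """Elman RNN: h_t = tanh(W x_t + U h_{t-1} + b); the last h_t is the summary vector."""
    h = [0.0] * len(b)  # state before the first time step
    for x in inputs:    # one embedded text unit per time step
        wx, uh = matvec(W, x), matvec(U, h)
        # The previous encoding vector h is "passed to the next time step" via U.
        h = [math.tanh(a + c + d) for a, c, d in zip(wx, uh, b)]
    return h
```

In practice a gated cell (LSTM or GRU) is usually preferred for long inputs; the plain Elman cell keeps the recurrence visible.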
  7. The method according to claim 6, wherein the neural network decoder comprises a second recurrent neural network, and inputting the summary vector of the text data into the neural network decoder to obtain the summary of the text data comprises:
    at the current time step, inputting the summary vector of the text data into the second recurrent neural network to predict a first output text;
    passing the first output text to the next time step;
    at the next time step, inputting the first output text and the summary vector of the text data into the second recurrent neural network to predict a second output text; and
    passing the second output text to the next time step, and so on until the second recurrent neural network finishes predicting from the summary vector of the text data, and determining the combination of all output texts obtained as the summary of the text data.
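Claim 7's decoder feeds the previous output together with the summary vector into the RNN at every step. The sketch below assumes a greedy Elman-RNN decoder with a start token (`bos`), an end token (`eos`) that marks the end of prediction, and caller-supplied embeddings and weights; the recurrence and all parameter names are illustrative assumptions, not the applicant's model.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_decode(summary: Sequence[float], embed: Matrix, W: Matrix, U: Matrix,
               C: Matrix, V: Matrix, vocab: Sequence[str],
               bos: int = 0, eos: int = 1, max_len: int = 20) -> List[str]:
    """Greedy decoder: h_t = tanh(W e(y_{t-1}) + U h_{t-1} + C s), y_t = argmax(V h_t)."""
    h = [0.0] * len(U)  # decoder hidden state
    prev = bos          # start token stands in for "no previous output yet"
    words: List[str] = []
    for _ in range(max_len):
        # Combine the previous output, the previous state, and the summary vector s.
        pre = [a + b + c for a, b, c in
               zip(matvec(W, embed[prev]), matvec(U, h), matvec(C, summary))]
        h = [math.tanh(z) for z in pre]
        logits = matvec(V, h)
        prev = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        if prev == eos:  # prediction finished
            break
        words.append(vocab[prev])
    return words
```

Greedy argmax is the simplest search; beam search is a common alternative when output quality matters more than speed.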
  8. The method according to claim 1, wherein performing word segmentation on the summary of the text data and extracting the N keywords from the summary in descending order of word frequency comprises:
    performing word segmentation on the summary of the text data to obtain K segmented words corresponding to the summary, wherein K is a positive integer greater than N;
    calculating K word frequencies corresponding to the K segmented words, wherein the K segmented words correspond one-to-one to the K word frequencies;
    determining N of the K segmented words in descending order of word frequency; and
    extracting the N segmented words.
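Claim 8 is a top-N frequency count. A minimal sketch, assuming the summary has already been segmented into words (for Chinese text a segmenter such as jieba would do this; here the tokens are simply passed in):

```python
from collections import Counter
from typing import List, Sequence


def top_n_keywords(tokens: Sequence[str], n: int) -> List[str]:
    """Count the K distinct segmented words and return the N most frequent."""
    counts = Counter(tokens)  # K distinct words -> K word frequencies
    # The claim assumes K > N; with fewer distinct words, all of them are returned.
    return [word for word, _ in counts.most_common(n)]
```

`Counter.most_common` orders words with equal counts by first occurrence, which gives a deterministic tie-break that the claim does not specify.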
  9. The method according to claim 8, wherein calculating, by means of the neural network semantic representation model, the degree of semantic relevance between the question and the text in the text data, and determining the text with the highest degree of semantic relevance as the answer to the question comprises:
    searching the text data for Q text passages that match the N keywords, wherein Q is a positive integer;
    calculating Q degrees of semantic relevance between the question and the Q text passages, wherein the Q text passages correspond one-to-one to the Q degrees of semantic relevance;
    obtaining the highest of the Q degrees of semantic relevance as a first degree of semantic relevance; and
    determining the text passage corresponding to the first degree of semantic relevance as the answer to the question.
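Claim 9 first narrows the text to Q candidate passages containing a keyword, then picks the passage with the highest relevance score. In the sketch below, bag-of-words cosine similarity is a deliberately simple stand-in for the neural semantic representation model, which the source does not name; the question and passages are assumed to be pre-tokenized.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence


def cosine(a: Sequence[str], b: Sequence[str]) -> float:
    """Bag-of-words cosine similarity (stand-in for a neural relevance score)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_answer(question: List[str], passages: List[List[str]],
                  keywords: Sequence[str]) -> Optional[List[str]]:
    """Keep passages matching any keyword (the Q passages), return the most relevant."""
    candidates = [p for p in passages if any(k in p for k in keywords)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: cosine(question, p))
```

The keyword filter keeps the expensive relevance model from scoring every passage in the book.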
  10. A data processing apparatus, comprising:
    an acquisition module, configured to acquire image data of a book sent by a terminal;
    a character recognition module, configured to perform character recognition on the image data to obtain text data corresponding to the image data;
    a detection module, configured to perform text type detection on the text data to determine whether the text type of the text data satisfies a preset text type;
    an encoding module, configured to, when the text type satisfies the preset text type, input the text data into a neural network encoder to obtain a summary vector of the text data, wherein the neural network encoder is configured to compress and encode the text data;
    a decoding module, configured to input the summary vector of the text data into a neural network decoder to obtain a summary of the text data, wherein the neural network decoder is configured to predict a plurality of words from the summary vector by means of a neural network, and the plurality of predicted words are concatenated to form the summary of the text data;
    an extraction module, configured to perform word segmentation on the summary of the text data and extract N keywords from the summary in descending order of word frequency, wherein N is a positive integer;
    a combination module, configured to classify the N keywords by part of speech and combine the N keywords according to their parts of speech in a preset question word order to obtain a question for the text data; and
    a processing module, configured to calculate, by means of a neural network semantic representation model, a degree of semantic relevance between the question and the text in the text data, and determine the text with the highest degree of semantic relevance as the answer to the question.
  11. The apparatus according to claim 10, further comprising an image processing module configured to:
    when the tilt of the image data exceeds a preset tilt threshold, process the image data with an image correction algorithm, wherein the image correction algorithm comprises any one of the Radon transform, the Hough transform, and a linear regression algorithm;
    or, when the sharpness of the image data is below a preset sharpness threshold, process the image data with an image enhancement algorithm, wherein the image enhancement algorithm comprises any one of histogram equalization, image smoothing, and image sharpening;
    or, when the tilt of the image data exceeds the preset tilt threshold and the sharpness of the image data is below the preset sharpness threshold, process the image data with both the image correction algorithm and the image enhancement algorithm.
  12. The apparatus according to claim 11, wherein the character recognition module is specifically configured to:
    perform character segmentation on the image data to obtain M characters, wherein M is a positive integer;
    perform feature extraction on the M characters to obtain M character features, wherein the M characters correspond one-to-one to the M character features;
    compare the M character features with a character feature database to identify M text characters corresponding to the M character features, wherein the M character features correspond one-to-one to the M text characters; and
    combine the M text characters to obtain the text data corresponding to the image data.
  13. The apparatus according to claim 10, wherein the text type comprises a language type and a genre, and the detection module is specifically configured to:
    perform language type detection on the text data to obtain the language type of the text data;
    perform genre detection on the text data to obtain the genre of the text data;
    when the language type satisfies a preset language type and the genre satisfies a preset genre, determine that the text type satisfies the preset text type; and
    when the language type does not satisfy the preset language type, or the genre does not satisfy the preset genre, or neither is satisfied, determine that the text type does not satisfy the preset text type.
  14. The apparatus according to claim 13, further comprising a prompt module specifically configured to:
    when the language type does not satisfy the preset language type, send a language type error message to the terminal, wherein the language type error message instructs the terminal to generate a pop-up window or interface indicating that the language type of the book is wrong;
    when the genre does not satisfy the preset genre, send a genre error message to the terminal, wherein the genre error message instructs the terminal to generate a pop-up window or interface indicating that the genre of the book is wrong; and
    when the language type does not satisfy the preset language type and the genre does not satisfy the preset genre, send a language and genre error message to the terminal, wherein the language and genre error message instructs the terminal to generate a pop-up window or interface indicating that both the language and the genre of the book are wrong.
  15. The apparatus according to claim 10, wherein the neural network encoder comprises a first recurrent neural network, and the encoding module is specifically configured to:
    at the current time step, input a first text of the text data into the first recurrent neural network to obtain a first encoding vector;
    pass the first encoding vector to the next time step;
    at the next time step, input the first encoding vector and a second text of the text data into the first recurrent neural network to obtain a second encoding vector; and
    pass the second encoding vector to the next time step, and so on until all the text in the text data has been input into the first recurrent neural network, and determine the last encoding vector obtained as the summary vector of the text data.
  16. The apparatus according to claim 15, wherein the neural network decoder comprises a second recurrent neural network, and the decoding module is specifically configured to:
    at the current time step, input the summary vector of the text data into the second recurrent neural network to predict a first output text;
    pass the first output text to the next time step;
    at the next time step, input the first output text and the summary vector of the text data into the second recurrent neural network to predict a second output text; and
    pass the second output text to the next time step, and so on until the second recurrent neural network finishes predicting from the summary vector of the text data, and determine the combination of all output texts obtained as the summary of the text data.
  17. The apparatus according to claim 10, wherein the extraction module is specifically configured to:
    perform word segmentation on the summary of the text data to obtain K segmented words corresponding to the summary, wherein K is a positive integer greater than N;
    calculate K word frequencies corresponding to the K segmented words, wherein the K segmented words correspond one-to-one to the K word frequencies;
    determine N of the K segmented words in descending order of word frequency; and
    extract the N segmented words.
  18. The apparatus according to claim 17, wherein the processing module is specifically configured to:
    search the text data for Q text passages that match the N keywords, wherein Q is a positive integer;
    calculate Q degrees of semantic relevance between the question and the Q text passages, wherein the Q text passages correspond one-to-one to the Q degrees of semantic relevance;
    obtain the highest of the Q degrees of semantic relevance as a first degree of semantic relevance; and
    determine the text passage corresponding to the first degree of semantic relevance as the answer to the question.
  19. An electronic device for data processing, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs comprise instructions for performing the steps of the method according to any one of claims 1 to 9.
  20. A computer-readable storage medium, storing a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 9.
PCT/CN2019/102348 2019-05-20 2019-08-23 Data processing method and related apparatus WO2020232864A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910420391.5A CN110222168B (en) 2019-05-20 2019-05-20 Data processing method and related device
CN201910420391.5 2019-05-20

Publications (1)

Publication Number Publication Date
WO2020232864A1 true WO2020232864A1 (en) 2020-11-26

Family

ID=67821511

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/102348 WO2020232864A1 (en) 2019-05-20 2019-08-23 Data processing method and related apparatus

Country Status (2)

Country Link
CN (1) CN110222168B (en)
WO (1) WO2020232864A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110730389B (en) * 2019-12-19 2020-03-31 恒信东方文化股份有限公司 Method and device for automatically generating interactive question and answer for video program
CN111242741B (en) * 2020-01-15 2023-08-04 新石器慧通(北京)科技有限公司 Scene-based commodity document generation method and system and unmanned retail vehicle
CN112863010B (en) * 2020-12-29 2022-08-05 宁波友好智能安防科技有限公司 Video image processing system of anti-theft lock

Citations (5)

Publication number Priority date Publication date Assignee Title
CN107680580A (en) * 2017-09-28 2018-02-09 百度在线网络技术(北京)有限公司 Text transformation model training method and device, text conversion method and device
CN108334492A (en) * 2017-12-05 2018-07-27 腾讯科技(深圳)有限公司 Text participle, instant message treating method and apparatus
WO2018157703A1 (en) * 2017-03-02 2018-09-07 腾讯科技(深圳)有限公司 Natural language semantic extraction method and device, and computer storage medium
CN109522553A (en) * 2018-11-09 2019-03-26 龙马智芯(珠海横琴)科技有限公司 Name recognition methods and the device of entity
CN109726281A (en) * 2018-12-12 2019-05-07 Tcl集团股份有限公司 A kind of text snippet generation method, intelligent terminal and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
AU2002351310A1 (en) * 2001-12-06 2003-06-23 The Trustees Of Columbia University In The City Of New York System and method for extracting text captions from video and generating video summaries
CN106409290B (en) * 2016-09-29 2019-06-25 深圳市唯特视科技有限公司 A method of child's intelligent sound education based on image analysis
CN108319668B (en) * 2018-01-23 2021-04-20 义语智能科技(上海)有限公司 Method and equipment for generating text abstract
CN108537283A (en) * 2018-04-13 2018-09-14 厦门美图之家科技有限公司 A kind of image classification method and convolutional neural networks generation method
CN109325180B (en) * 2018-09-21 2021-01-05 北京字节跳动网络技术有限公司 Article abstract pushing method and device, terminal equipment, server and storage medium

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN116663537A (en) * 2023-07-26 2023-08-29 中信联合云科技有限责任公司 Big data analysis-based method and system for processing selected question planning information
CN116663537B (en) * 2023-07-26 2023-11-03 中信联合云科技有限责任公司 Big data analysis-based method and system for processing selected question planning information

Also Published As

Publication number Publication date
CN110222168B (en) 2023-08-18
CN110222168A (en) 2019-09-10

Similar Documents

Publication Publication Date Title
CN110096570B (en) Intention identification method and device applied to intelligent customer service robot
CN108829757B (en) Intelligent service method, server and storage medium for chat robot
CN109800306B (en) Intention analysis method, device, display terminal and computer readable storage medium
CN111046133B (en) Question and answer method, equipment, storage medium and device based on mapping knowledge base
WO2020232864A1 (en) Data processing method and related apparatus
CN106328147B (en) Speech recognition method and device
CN110909137A (en) Information pushing method and device based on man-machine interaction and computer equipment
CN112016553B (en) Optical Character Recognition (OCR) system, automatic OCR correction system, method
CN112487139A (en) Text-based automatic question setting method and device and computer equipment
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN111858878B (en) Method, system and storage medium for automatically extracting answer from natural language text
CN111339305A (en) Text classification method and device, electronic equipment and storage medium
CN112100377B (en) Text classification method, apparatus, computer device and storage medium
CN111581367A (en) Method and system for inputting questions
KR20200087977A (en) Multimodal document summary system and method
CN114245203A (en) Script-based video editing method, device, equipment and medium
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN115759119B (en) Financial text emotion analysis method, system, medium and equipment
CN111126084A (en) Data processing method and device, electronic equipment and storage medium
CN112559725A (en) Text matching method, device, terminal and storage medium
CN112632956A (en) Text matching method, device, terminal and storage medium
CN115858776B (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN116303951A (en) Dialogue processing method, device, electronic equipment and storage medium
CN116186244A (en) Method for generating text abstract, method and device for training abstract generation model
CN116483314A (en) Automatic intelligent activity diagram generation method

Legal Events

Code Title Description

121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 19929245; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 19929245; Country of ref document: EP; Kind code of ref document: A1