CN111460782A - Information processing method, device and equipment


Info

Publication number: CN111460782A
Granted publication: CN111460782B
Application number: CN202010252333.9A
Authority: CN (China)
Prior art keywords: information, image, electronic image, typesetting, character
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventor: 徐达峰
Current and original assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Related application: CN202311102402.8A, published as CN117113962A

Classifications

    • G06F40/189: Handling natural language data; Text processing; Automatic justification
    • G06F16/55: Information retrieval of still image data; Clustering; Classification
    • G06V10/26: Image preprocessing; Segmentation of patterns in the image field, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/28: Image preprocessing; Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning; Classification, e.g. of video objects


Abstract

The embodiments of this specification disclose an information processing method, apparatus, and device. The information processing scheme includes the following steps: acquiring an electronic image corresponding to information to be input; classifying the electronic image into a preset image type; performing target detection on the classified electronic image according to the image type, so as to identify the multiple target image contents contained in the electronic image as their corresponding information elements in a computer system; and typesetting the information elements according to a preset typesetting rule to generate the digitized information corresponding to the information to be input.

Description

Information processing method, device and equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an information processing method, apparatus, and device.
Background
With the development of the industrial internet, and in particular the ongoing shift of entire industries from the consumer internet to the industrial internet, technologies such as 5G communication and artificial intelligence are undoubtedly accelerating the change; progress in deep learning in particular has brought a clear breakthrough in the ability of machines to read and recognize images.
Accordingly, applications based on machine reading and recognition, such as document scanning and object recognition, have increased markedly.
For example, Office Lens (software published by Microsoft Corporation) lets users turn external information containing text and graphics, such as paper documents, business cards, whiteboards, and posters, into electronic pictures by photographing or scanning, record them in a computer system, and even export them to formatted files such as Word or PPT after simple post-processing such as OCR (Optical Character Recognition), thereby extending the content-recording capability of the Office suite.
As another example, with Google Lens (software released by Google), a user can scan products, animals, text, or other things present in the environment; by connecting to Google's backend servers and drawing on the large amount of sample data stored there, the software quickly identifies the things in the scanned scene.
However, although conventional machine image-reading applications can achieve information entry and recognition after photographing or scanning standard scenes (such as standard printed matter and standard objects), they still struggle to meet users' diverse needs in daily life.
Therefore, a more convenient and direct information processing scheme is needed.
Disclosure of Invention
In view of this, embodiments of the present specification provide an information processing method, apparatus, and device, so as to recognize information external to a computer in natural scenes and convert it into digitized information that is convenient to process.
The embodiment of the specification adopts the following technical scheme:
an embodiment of the present specification provides an information processing method, including:
acquiring an electronic image corresponding to information to be input;
classifying the electronic image into a preset image type;
performing target detection on the classified electronic image according to the image type so as to identify a plurality of target image contents contained in the electronic image as corresponding information elements of the target image contents in a computer system;
and typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input.
An embodiment of the present specification further provides an information processing apparatus, including:
the acquisition module acquires an electronic image corresponding to the information to be input;
a classification module that classifies the electronic image into a preset image type;
the detection module is used for carrying out target detection on the classified electronic images according to the image types so as to identify a plurality of target image contents contained in the electronic images as corresponding information elements of the target image contents in a computer system;
and the typesetting module is used for typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input.
An embodiment of the present specification further provides an electronic device for information processing, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring an electronic image corresponding to information to be input;
classifying the electronic image into a preset image type;
performing target detection on the classified electronic image according to the image type so as to identify a plurality of target image contents contained in the electronic image as corresponding information elements of the target image contents in a computer system;
and typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input.
The embodiment of the specification adopts at least one technical scheme which can achieve the following beneficial effects:
the user can directly carry out operations such as shooting and scanning on information in a natural scene, and the digitalized information corresponding to various effective information can be obtained after identification processing, so that the use requirement of the user for conveniently inputting the external information of the computer can be met, and the use experience of the user is improved.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present specification, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a schematic diagram of information processing provided in an embodiment of the present specification.
Fig. 2 is a flowchart of an information processing method provided in an embodiment of the present specification.
Fig. 3 is a schematic diagram of an electronic image in an information processing method provided in an embodiment of the present specification.
Fig. 4 is a schematic diagram of a character recognized from an electronic image in an information processing method provided in an embodiment of the present specification.
Fig. 5 is a schematic diagram of a graph identified from an electronic image in an information processing method provided in an embodiment of the present specification.
Fig. 6(a) is a schematic diagram of an electronic image of a text in an information processing method provided in an embodiment of the present specification.
Fig. 6(b) is a schematic diagram of digitized information of a text in an information processing method provided in an embodiment of the present specification.
Fig. 7(a) is a schematic diagram of an electronic image of a brain diagram in an information processing method provided in an embodiment of the present specification.
Fig. 7(b) is a schematic diagram of digitized information of a brain graph in an information processing method provided in an embodiment of the present specification.
Fig. 8(a) is a schematic diagram of an electronic image of a table in an information processing method provided in an embodiment of the present specification.
Fig. 8(b) is a schematic diagram of digitized information of a table in an information processing method provided in an embodiment of the present specification.
Fig. 9 is a schematic diagram of a character recognition model in an information processing method according to an embodiment of the present disclosure.
Fig. 10 is a schematic diagram illustrating training of a character recognition model in an information processing method according to an embodiment of the present disclosure.
Fig. 11 is a schematic diagram of an object detection model in an information processing method according to an embodiment of the present disclosure.
Fig. 12 is a schematic diagram of plane division in an information processing method according to an embodiment of the present disclosure.
Fig. 13 is a schematic diagram of deployment and implementation in an information processing method provided in an embodiment of the present specification.
Fig. 14 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any inventive step based on the embodiments of the present disclosure, shall fall within the scope of protection of the present application.
Although conventional machine image-reading applications such as Office Lens and Google Lens can photograph and scan standard scenes (such as standard printed matter and standard objects) to achieve information entry and recognition, users still face many limitations in practice, and such applications struggle to meet the diverse needs of daily life.
Based on this, the embodiments of the present specification provide an information processing method, apparatus and device.
As shown in fig. 1, in the information processing scheme provided in the embodiment of the present specification, a user directly converts information in a natural scene into an electronic image that can be processed by a computer, then identifies and processes the electronic image, obtains various types of effective information included in the image from the electronic image, and generates corresponding digital knowledge information after layout rendering. Therefore, the user can directly input the information in the natural scene and generate the corresponding digital information.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
As shown in fig. 2, an embodiment of the present specification provides an information processing method, including:
and S102, acquiring an electronic image corresponding to the information to be input.
The information to be input may include information in a natural scene that needs to be entered into a computer system, such as objects, text containing characters and/or graphics, or other things in the natural environment; entering it lets the computer system convert the scene information into digitized knowledge information for later use.
In a specific implementation, a user can directly photograph or scan the information in the natural scene that needs to be entered, for example with the camera of a mobile terminal, so that the scene information to be entered becomes an electronic image the computer system can process.
In addition, the electronic image can also be an electronic image which corresponds to the information to be recorded and is stored in a database.
As shown in fig. 3, an electronic image corresponding to a brain map handwritten by a user is obtained by photographing the natural scene. The electronic image contains handwritten characters, lines (such as arrows), wire-frame shapes, and so on; the outer frame of "INTERNET" in the image was originally a handwritten red frame (rendered as grayscale in fig. 3).
It should be noted that, since the electronic image may be an image directly obtained from a natural scene, the obtained electronic image may be a grayscale image or an RGB image.
In specific implementation, the electronic image directly obtained from a natural scene may be affected by various uncertain factors, such as reflection on a whiteboard and interference noise.
Therefore, after the image of the information to be input is acquired from the natural scene, preprocessing can be performed to generate an electronic image corresponding to the information to be input. The preprocessing may include image preprocessing processes such as binarization processing and image scaling.
In this way, preprocessing, for example selecting a suitable threshold for a grayscale image with 256 brightness levels, yields a binary image that still reflects the global and local features of the original. The image then highlights the outline of the target of interest while carrying less data, which facilitates subsequent processing.
In specific implementation, when the image acquired from the natural scene is an RGB image, the image may be subjected to a graying process to obtain a grayscale image, and then the grayed image may be subjected to a binarization process to obtain an electronic image containing the target of interest with a clear outline.
The graying processing is to convert the RGB tristimulus values of any one pixel point in the RGB image into the same numerical value, so as to replace the original RGB tristimulus values of the pixel point with the numerical value (i.e. the gray value), and the value range of the gray value can be 0-255.
It should be noted that, the way of implementing graying is various, such as an average algorithm:
f(i,j)=(R(i,j)+G(i,j)+B(i,j))/3
wherein f (i, j) is the gray value of a pixel point (the coordinate of the pixel point is (i, j)) in the image, and R (i, j), G (i, j) and B (i, j) are the R (red value), G (green value) and B (blue value) of the pixel point respectively.
Therefore, the graying method in the preprocessing is not particularly limited here.
The binarization processing is to further simplify the data of the grayed image by setting a threshold value, so that the outline of the image becomes clearer.
Note that there are various ways to set the binarization threshold, and a scheme can be chosen per the application; for example, the commonly used OTSU (maximum between-class variance) algorithm divides the image pixels into two classes, A and B, and takes as the optimal threshold the value that maximizes the between-class variance.
Therefore, the binarization in the preprocessing is not limited here.
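For illustration, a minimal preprocessing sketch combining the average-method graying above with OTSU binarization, written with OpenCV (the use of OpenCV and the function shape are assumptions of this sketch, not prescribed by this specification):

```python
import cv2
import numpy as np

def preprocess(image_path: str) -> np.ndarray:
    """Gray an RGB capture and binarize it with OTSU, as described above."""
    bgr = cv2.imread(image_path)                 # OpenCV reads channels as BGR
    # Average-method graying: f(i, j) = (R(i, j) + G(i, j) + B(i, j)) / 3
    gray = bgr.mean(axis=2).astype(np.uint8)
    # OTSU picks the threshold that maximizes the between-class variance
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```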
And step S104, classifying the electronic image into a preset image type.
In a specific implementation, the electronic image often contains image content such as graphics and/or characters, and images carrying such content usually have distinctive characteristics; typical examples include text, tables, flowcharts, and brain maps (i.e., mind maps).
Therefore, the electronic images can be classified according to the image characteristics of the electronic images so as to be divided into corresponding preset types, so that the subsequent steps can be conveniently and pertinently processed, and the image recognition effect can be improved.
In some embodiments, classification of electronic images may be accomplished through feature engineering.
In a specific implementation, common classifiers such as a KNN (k-Nearest Neighbors) classifier or a deep learning classifier can be chosen, weighing factors such as efficiency and implementation complexity in the practical application.
For example, under KNN, a sample is assigned to the class to which most of its k nearest neighbors in the feature space belong, and is assumed to share the characteristics of that class; a KNN classifier can therefore quickly sort electronic images into the corresponding preset types.
In a specific implementation, how similar two images are can be characterized by a similarity measure.
For example, a distance metric, such as Euclidean distance, may be employed to characterize similarity.
The Euclidean distance can be expressed as:
d(I1, I2) = sqrt( Σ_p (I1_p - I2_p)^2 )
where I1 and I2 are the pixel vectors of the two images, and p indexes the p-th pixel.
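A toy sketch of such KNN classification over flattened pixel vectors follows; the class labels and the training arrays are assumptions for illustration:

```python
import numpy as np

def knn_classify(img: np.ndarray, train_imgs: np.ndarray,
                 train_labels: list, k: int = 5) -> str:
    """Assign img to the majority class among its k nearest training images."""
    x = img.ravel().astype(np.float32)                 # pixel vector I1
    xs = train_imgs.reshape(len(train_imgs), -1).astype(np.float32)
    dists = np.sqrt(((xs - x) ** 2).sum(axis=1))       # d(I1, I2) per image
    votes = [train_labels[i] for i in np.argsort(dists)[:k]]
    return max(set(votes), key=votes.count)            # e.g. "table", "brain map"
```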
It should be noted that the preset image type can be set according to implementation scenarios, such as tables, flowcharts, brain diagrams, figures, animals, and so on, and will not be described herein again.
Step S106: perform target detection on the classified electronic image according to the image type, so as to identify the multiple target image contents contained in the electronic image as their corresponding information elements in a computer system.
The information element refers to an element that carries or transmits visual information, such as text, lines, shapes, and the like, and thus the information element may be referred to as an image element.
Therefore, according to the image type of the electronic image, the target detection can be rapidly and pertinently carried out on the image content in the electronic image so as to identify various effective information (namely the image content) contained in the electronic image, such as graphs (such as lines, shapes), characters and the like.
Target detection also locates the various types of image content contained in the electronic image. This matters for two reasons: on one hand, a computer cannot directly recognize and process raw image content such as handwritten characters, lines, and wire-frame graphics; on the other hand, if that content is not calibrated, subsequent processing suffers, for example through reduced accuracy and increased processing difficulty.
Continuing with fig. 3: the lines forming the wire frames there are not the straight lines a computer draws, and the resulting frames are not the rectangular (or square) wire frames a computer can draw and identify; the hand-drawn arrow is not a computer-drawn arrow graphic; and the handwritten characters differ significantly from the standard print characters a computer recognizes.
Therefore, after the target is detected, the image content obtained after the target is detected can be identified as the corresponding information element of the image content in the computer system.
For example, in the obtained electronic image, the shape of effective information such as lines and frames in the electronic image may be deformed due to the captured light, angle, and the like, and in this case, after the lines and frames are recognized, the lines and frames can be recognized as a figure such as lines and frames that can be drawn by a computer.
For example, the characters in the obtained electronic image may differ from the print characters a computer can recognize and process, whether because of how the photograph was taken or because they are handwritten; such characters can be recognized as the corresponding print characters in the computer.
As shown in fig. 4, the characters contained in the electronic image are recognized as the corresponding print characters in the computer system.
For example, the handwritten word "SOCIAL" in the figure may be recognized as the printed word "SOCIAL", and after recognition, the corresponding printed word can also be marked near the handwritten one.
It should be noted that although the recognition results labeled in the figure may differ slightly from dictionary words, these differences can be calibrated with common language processing means such as dictionaries, NLP (natural language processing), and Markov chains. For example, the handwritten word "MARKETING" is preliminarily recognized and labeled as the print string "IMARkeTiNGI" (the nearby border is easily read as "I" in the preliminary pass), but other means, such as a dictionary, can calibrate the preliminary result to "MARKETING".
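As a toy illustration of such dictionary calibration (the word list and the difflib-based matching are assumptions, not the method prescribed here):

```python
import difflib

DICTIONARY = ["MARKETING", "SOCIAL", "INTERNET", "RISK"]  # assumed vocabulary

def calibrate(raw: str) -> str:
    """Map a noisy OCR result such as 'IMARkeTiNGI' to the closest dictionary word."""
    cleaned = raw.strip("Il|").upper()   # borders are often mis-read as I / l / |
    match = difflib.get_close_matches(cleaned, DICTIONARY, n=1, cutoff=0.6)
    return match[0] if match else raw

print(calibrate("IMARkeTiNGI"))  # -> MARKETING
```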
As shown in fig. 5, the lines and wire frames included in the electronic image are recognized as corresponding lines and wire frames in the computer system.
As shown in the figure, the hand-drawn wire frame is identified and labeled as a regular graphic region, such as a square (wire-frame graphic) region, drawn as the light-colored frame around the hand-drawn one; a label word, such as "square", can also be added in the region.
Likewise, the hand-drawn arrows are identified and labeled as line regions, drawn as light-colored frames around the arrows; a label word such as "line" can be added in each region.
Note that in fig. 4 and fig. 5 the labels are placed without regard to the position of the original image content, so labels and original content may overlap. In a specific implementation, different colors can be used to keep them apart despite the overlap, for example cyan for labeled wire frames, green for labeled lines, and blue for labeled text.
In this way, image classification and target detection identify the effective information associated with the information to be input, while irrelevant targets, such as the pen in fig. 3, need not be recognized.
Therefore, after the target detection is carried out on the electronic image, the image content contained in the electronic image is identified as the corresponding information element of the image content in the computer system, so that the influence brought by the acquisition process of the electronic image can be reduced, the subsequent processing is facilitated, and the processing accuracy and the identification efficiency are improved.
Step S108: typeset the information elements according to a preset typesetting rule to generate the digitized information corresponding to the information to be input.
The digital information may include an expression form of a certain type of information corresponding to the information to be entered in the knowledge base, so as to facilitate operations (such as viewing, editing, storing, outputting, and the like) on the entered information.
For example, after the information elements are typeset and rendered, a digitized UI (graphical interface) corresponding to the information to be input is generated, forming the final expression of a given content type in the knowledge base, such as a mind (brain map), a sheet (table), or a list.
Therefore, after the effective information in the electronic image is obtained, the image contents such as characters, frames and/or lines in the image can be typeset and rendered again according to the corresponding typesetting rules to generate corresponding digital information, such as a digital interface (UI), so that the user can conveniently input the information through the digital information.
In a specific implementation, the layout rule may be a layout rule corresponding to the type of the image, and may be preset according to an actual application, so as to perform layout on the information elements corresponding to the information to be entered in a more targeted manner.
For example, in a text scenario, the content may be characters only or characters mixed with graphics, and the typesetting rule may govern how the characters and/or graphics in the text are extracted and laid out, covering characters, words, sentences, paragraphs, spaces, graphics, and so on; the details are not expanded one by one.
For example, in a brain map scenario, the typesetting rule may govern how the text and graphics in the brain map are extracted and rearranged, such as highlighting and rendering the central node (the key word or idea and its graphic): the central keyword sits inside the central graphic, and other text sits at the end of its relationship line in the brain map; again, the details are not expanded one by one.
For example, in a table scenario, the typesetting rule may govern how the table and the characters in it are typeset, such as rendering the table as a standard table and filling the corresponding characters into its cells.
For ease of understanding, the digitized information is illustrated schematically below by way of example.
Fig. 6 to 8 are schematic diagrams of an electronic image of information to be entered and corresponding digitized information acquired from a natural scene by using the information processing method provided by the embodiment of the present description.
FIG. 6 is a schematic diagram of a text scene.
As shown in fig. 6(a), a user uses a mobile terminal in which the information processing method provided in the embodiment of the present description is deployed, and obtains an electronic image corresponding to text content handwritten on paper directly from a natural scene by taking a picture.
As shown in fig. 6(b), after processing, the corresponding text information is output, including the scene theme "list is automatically generated" and the text content corresponding to the information to be input, such as "1. a line of text", "2. at the moment, the title is 2", and "3. a line of test words".
Fig. 7 is a schematic diagram of a brain map scene.
As shown in fig. 7(a), a user uses a mobile terminal in which the information processing method provided in the present embodiment is deployed, and obtains an electronic image corresponding to a handwritten brain map directly from a natural scene by scanning.
As shown in fig. 7(b), after processing, the brain map is output, including the scene theme "automatically generate a brain map", the central keyword "RISK" rendered in color, and the brain-map content laid out in order.
FIG. 8 is a diagram of a table scenario.
As shown in fig. 8(a), a user uses a mobile terminal in which the information processing method provided in the embodiment of the present description is deployed, and obtains an electronic image corresponding to a form handwritten on a whiteboard directly from a natural scene by taking a picture.
As shown in fig. 8(b), after processing, the table information is output, including the scene theme "automatically generate a form", with the content originally handwritten on the whiteboard, such as times and items, displayed in a table.
Therefore, by converting the information elements to generate corresponding digital information, information which is inconvenient for computer processing in natural scenes is recorded and converted into digital information which can be identified and processed by a computer.
In some embodiments, the digitized information may also be presented via a digitizing interface, as previously described in fig. 6-8.
Furthermore, several processing interfaces can be offered to the user through the digitized interface, so that the user can conveniently perform the corresponding operations on the digitized information, such as viewing, editing, storing, and outputting (e.g., collecting, sharing, or uploading).
Through steps S102 to S108, a user can directly photograph a natural scene; recognition processing extracts the various kinds of effective information contained in the captured electronic image, which is then typeset, rendered, and converted into the corresponding digitized information. This makes it convenient for the user to enter information external to the computer into a computer system and to operate on the converted digitized information, improving the user experience.
In some embodiments, when the image type of the electronic image is one containing characters, target detection of the characters can run as an independent process: the characters in the electronic image are detected based on OCR. This improves character detection and eases subsequent operations; for example, handwritten characters can be recognized and converted into print characters, which are convenient for both the computer and the user to process.
In a specific implementation, the chosen OCR approach is "ResNet + LSTM": ResNet and an LSTM are combined to build the character recognition model shown in fig. 9, and this model performs target detection on characters.
The electronic image first passes through ResNet (a residual neural network), whose convolution and pooling extract character features; the LSTM then extracts the characters in the electronic image from these features and a preset character feature set; finally, the recognition and classification results for the characters are obtained, completing character target detection.
It should be noted that the ResNet structure may be selected according to the actual application requirements, such as Res18, Res34, Res50, Res101, Res152, and other mature structures, and the convolutional layer and the pooling layer are stacked structures.
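A minimal PyTorch sketch of a ResNet + LSTM recognizer of this kind follows; the choice of ResNet-18, the layer sizes, and the per-column sequence output are illustrative assumptions rather than details fixed by this specification:

```python
import torch
import torch.nn as nn
import torchvision

class CRNN(nn.Module):
    """ResNet feature extractor followed by an LSTM over the width axis."""
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        # Keep the convolution/pooling stages; drop the avgpool and fc head
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.lstm = nn.LSTM(512, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.backbone(x)                 # (B, 512, H', W') features
        f = f.mean(dim=2).permute(0, 2, 1)   # pool height -> (B, W', 512)
        seq, _ = self.lstm(f)                # one timestep per horizontal slice
        return self.fc(seq)                  # (B, W', num_classes) logits

model = CRNN(num_classes=63)                 # assumed character-set size
logits = model(torch.randn(1, 3, 32, 128))   # a 32 x 128 text-line crop
```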
In a specific implementation, since the training of ResNet, LSTM, and similar networks is relatively mature, the training process of the character recognition model is only sketched here.
As shown in fig. 10, the training process of the character recognition model includes building and training processes of an input layer, a CNN hidden layer, an output layer, and the like.
First, the input layer represents each sample in the training sample set in a computer-readable tensor form.
For example, 3 × 25 × 25 represents a 25 × 25-pixel color image constructed as a tensor (i.e., a 3-dimensional matrix) over its RGB channels, so that the output of the input layer can serve as the input of the next network layer.
Second, in the CNN hidden layers, several convolution layers and pooling layers are stacked to form a convolutional network.
A convolution layer contains several filters (convolution kernels); each filter has a set of fixed weights, and the filter size can be customized to the application, though it should of course be smaller than the pixel size of the image.
For example, the Arabic numeral "9" is composed of several curves: the upper half resembles a circle and the lower half a curve, so convolution can be used to extract these two features, circle and curve, separately. The pooling layer may use max pooling, which down-samples while amplifying features, removing noise, and helping avoid over-fitting.
Then each filter slides over all regions of the sample at the specified stride, and the inner-product operations yield a multi-dimensional result that serves as the input of the next layer.
Finally, in the output layer, since the last pooling step outputs a multi-dimensional matrix, a flatten layer can reduce it to one dimension, and a fully connected network feeding softmax performs the classification to produce the output.
Through the training process, a final character recognition model can be obtained, and the model can be deployed in a terminal used by a user, so that the user can conveniently and directly obtain character data from a natural scene.
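Purely as an illustration of the flatten + fully connected + softmax tail just described, assuming the 3 × 25 × 25 input from the earlier example:

```python
import torch
import torch.nn as nn

# A toy classifier matching the description: convolution, max pooling,
# flatten to one dimension, then a fully connected layer feeding softmax.
classifier = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),              # down-sampling; also suppresses noise
    nn.Flatten(),                 # multi-dimensional feature map -> 1-D
    nn.Linear(16 * 12 * 12, 10),  # assumed 10 classes, e.g. digits 0-9
    nn.Softmax(dim=1),            # classification probabilities for output
)

probs = classifier(torch.randn(1, 3, 25, 25))  # shape (1, 10)
```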
In some embodiments, in view of the variety of characters appearing in natural scenes, the character recognition effect of the character feature set can be improved by supplementing a large number of character feature samples to the character feature set.
Besides routinely collected samples, a large amount of character sample data can be constructed through transfer learning to supplement the character feature set; this both makes up the large sample volume that character recognition requires and improves the recognition quality.
Note that both the transfer learning approach and the specific algorithm can be selected per the application: approaches include sample transfer, feature transfer, parameter (model) transfer, and relationship transfer, and the algorithm can be an existing one, such as image-to-image translation, so that a feature set of handwritten characters can be learned from a large number of images containing handwritten characters. Transfer learning is therefore not specifically limited here.
In some embodiments, target detection may be performed with MobileNet-SSD. MobileNet-SSD is a lightweight deep network model designed mainly for mobile devices; it decomposes the standard convolution kernel using depthwise separable convolution, which reduces computation, suits deep learning on mobile and embedded devices, and allows the model to be deployed at the user terminal so that the terminal itself can complete target detection.
It should be noted that the MobileNet can be selected according to the actual application, and the MobileNet is not specifically limited herein.
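The depthwise separable decomposition itself can be sketched as follows (channel counts are arbitrary examples):

```python
import torch.nn as nn

def depthwise_separable(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """A standard convolution factored into depthwise (per-channel) and
    pointwise (1x1) convolutions, as in MobileNet."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride, padding=1, groups=in_ch),  # depthwise
        nn.BatchNorm2d(in_ch),
        nn.ReLU(),
        nn.Conv2d(in_ch, out_ch, 1),  # pointwise: recombines channels cheaply
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )

block = depthwise_separable(32, 64)  # costs roughly 1/9 + 1/64 of a dense 3x3 conv
```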
For ease of understanding, the embodiments of this specification sketch a MobileNet-SSD architecture.
As shown in fig. 11, the input picture size is 300 × 300, and features are extracted from feature maps at six different scales for detection: 38 × 38 × 512, 19 × 19 × 1024, 10 × 10 × 512, 5 × 5 × 256, 3 × 3 × 256, and 1 × 1 × 256.
Therefore, the target detection can be carried out on the electronic image by adopting the MobileNet-SSD based on the preset data set so as to identify a plurality of image contents contained in the electronic image.
In some embodiments, the preset data set may be an existing data set, such as VOCdevkit, VOC2012, or the like.
In some embodiments, the preset data set may instead be self-built, for example constructed from the application data of the actual scenario in the VOCdevkit and/or VOC2012 data set format. A self-built data set makes detection more targeted and can improve the detection effect.
In some embodiments, a plurality of image contents detected from the electronic image can be added to the data set, so as to further train the MobileNet-SSD by using the data set, thereby improving the detection effect.
In some embodiments, the information elements can be typeset by adopting an intelligent layout, so that the digitalized information corresponding to the information to be input is beautiful and practical.
In specific implementation, typesetting can be realized through an intelligent coordinate scheme. Specifically, typesetting the information elements according to a preset typesetting rule to generate the digitized information corresponding to the information to be input, which may include:
generating a character area object corresponding to a character element in the information elements and generating a graphic area object corresponding to a graphic element in the information elements;
determining layout parameters to be occupied by the information elements, wherein the layout parameters comprise coordinate values for typesetting the character region objects and coordinate values for typesetting the graphic region objects;
and typesetting the character area object and the graphic area object according to the layout parameters according to a preset typesetting rule to generate digital information corresponding to the information to be input.
Therefore, the character area and the graphic area are distinguished by the coordinates, so that the intelligent typesetting can be conveniently carried out by utilizing the coordinate parameters.
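A minimal sketch of such region objects and their layout parameters (the field names and the top-to-bottom, left-to-right rule are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class RegionObject:
    kind: str      # "character" or "graphic"
    x: float       # layout parameters: typesetting coordinates ...
    y: float
    w: float       # ... and the width/height the element occupies
    h: float
    content: str = ""

def layout(elements: list) -> list:
    """Order region objects top-to-bottom, then left-to-right, per a preset rule."""
    return sorted(elements, key=lambda r: (r.y, r.x))
```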
In some embodiments, when the character region objects and graphic region objects are typeset by coordinates, there may be regions where characters and graphics intersect. In that case, a plane segmentation scheme can be used.
In a specific implementation, the principle of the plane segmentation scheme is as follows:
when there may be a plurality of object intersections in a certain area, the first quadrant of the area may be divided into two parts, i.e. upper and lower parts, by the dividing line shown in the figure, wherein the slope of the dividing line is determined by the following principle: the slope of the dividing line is such that a straight line from a point nearest to the dividing line, at which any one of the character region object and the graphic region object that intersect exists, to the dividing line is farthest.
For example, as shown in fig. 12, the object a and the object B are crossed, and the crossed objects can be divided and laid out by re-dividing (dividing) the crossed area.
In this way, the part of object B near the lower region gains more area, so object B can be attributed to object C below it, separating object A from object B. For example, if the region of object B is treated as a character region and the region of object A as a graphic region, then after object B is pulled down to the region of object C, the characters and graphics no longer intersect.
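A rough sketch of choosing the dividing-line slope by maximizing the distance to the nearest corner point of either intersecting object (the grid search over candidate slopes and the point representation are assumptions):

```python
import math

def best_slope(points_a, points_b, cx: float, cy: float) -> float:
    """Return the slope of a dividing line through (cx, cy) that maximizes the
    distance from the line to the nearest point of either region object."""
    def dist(px, py, m):
        # distance from (px, py) to the line y - cy = m * (x - cx)
        return abs(m * (px - cx) - (py - cy)) / math.sqrt(m * m + 1)

    candidates = [math.tan(math.radians(d)) for d in range(-80, 81, 5)]
    return max(candidates,
               key=lambda m: min(dist(px, py, m)
                                 for px, py in points_a + points_b))
```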
In some embodiments, the information processing method provided here can be used to form an information processing model. After the model is deployed, user feedback can be collected, as shown in fig. 13: the feedback serves as supplementary new samples, the feature regions of those samples are labeled, and the information processing model is retrained on them. Redeploying the trained model further improves the processing effect and the user experience.
Based on the same inventive concept, the embodiment of the specification also provides an apparatus for information processing, an electronic device and a non-volatile computer storage medium.
Fig. 14 is a schematic structural diagram of an information processing apparatus provided in this specification.
As shown in fig. 14, the information processing apparatus 10 includes: the acquisition module 11 is used for acquiring an electronic image corresponding to information to be input; a classification module 12 that classifies the electronic image into a preset image type; the detection module 13 is used for performing target detection on the classified electronic images according to the image types so as to identify a plurality of target image contents contained in the electronic images as corresponding information elements of the target image contents in a computer system; and the typesetting module 14 is used for typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input.
Optionally, when the image type is an image type including characters, performing target detection on the classified electronic image according to the image type, including:
extracting the characteristics of the characters contained in the electronic image by adopting a residual error neural network;
and extracting the characters in the electronic image by adopting a long-short term memory network according to the characteristics and a preset character characteristic set so as to complete target detection on the characters contained in the electronic image.
Optionally, the information processing apparatus 10 further includes: and the transfer learning module 15 supplements the sample data in the character feature set through transfer learning.
Optionally, performing target detection on the classified electronic image according to the image type, including:
and performing target detection on the classified electronic image by utilizing a MobileNet-SSD based on a preset data set according to the image type.
Optionally, the information processing apparatus 10 further includes:
the data set module 16 constructs the data set according to the VOCdevkit and/or VOC2012 data set format.
Optionally, after the target detection, the data set module 16 is further configured to:
classifying the target image contents;
adding the categorized target image content to the data set.
Optionally, the information processing apparatus 10 further includes: and the training module 17 is used for training the MobileNet-SSD based on the data set after the classified target image content is added to the data set.
Optionally, the information processing apparatus 10 further includes:
and the display module 18 displays the digital information in the digital interface.
Optionally, the information processing apparatus 10 further includes:
and the interface module 19 is used for providing a plurality of processing interfaces through the digital interface so as to perform processing operation corresponding to the processing interfaces on the digital information through the processing interfaces.
Optionally, typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input, including:
generating a character area object corresponding to a character element in the information elements and generating a graphic area object corresponding to a graphic element in the information elements;
determining layout parameters to be occupied by the information elements, wherein the layout parameters comprise coordinate values for typesetting the character region objects and coordinate values for typesetting the graphic region objects;
and typesetting the character area object and the graphic area object according to the layout parameters according to a preset typesetting rule to generate digital information corresponding to the information to be input.
Optionally, the information processing apparatus 10 further includes: a segmentation module 20;
the segmentation module 20 is configured to:
judging whether the character area object and the graphic area object are crossed or not;
if so, determining a crossed area where the character area object and the graphic area object are crossed;
segmenting the intersection region, wherein the slope of the dividing line should satisfy: the slope maximizes the straight-line distance from the dividing line to the nearest point of any intersecting character region object or graphic region object;
and adjusting layout parameters of the character region object and the graphic region object which are crossed according to the divided crossed regions.
An embodiment of the present specification further provides an electronic device for information processing, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring an electronic image corresponding to information to be input;
classifying the electronic image into a preset image type;
performing target detection on the classified electronic image according to the image type so as to identify a plurality of target image contents contained in the electronic image as corresponding information elements of the target image contents in a computer system;
and typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input.
Embodiments of the present specification also provide a non-volatile computer storage medium for information processing, storing computer-executable instructions configured to:
acquiring an electronic image corresponding to information to be input;
classifying the electronic image into a preset image type;
performing target detection on the classified electronic image according to the image type so as to identify a plurality of target image contents contained in the electronic image as corresponding information elements of the target image contents in a computer system;
and typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the system, the apparatus, the device, and the non-volatile computer storage medium, since they correspond to the method, the description is simple, and the relevant points can be referred to the partial description of the method embodiments.
The system, apparatus, device, and non-volatile computer storage medium and method provided in the embodiments of this specification correspond to each other, and they also have similar advantageous technical effects to the corresponding method.
In the 1990s, an improvement in a technology could be clearly distinguished as an improvement in hardware (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow). As technology has developed, however, many of today's improvements in method flows can be regarded as direct improvements in hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement in a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs a digital system onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, this programming is nowadays mostly implemented with "logic compiler" software rather than by manually making integrated circuit chips; such a compiler is similar to the software compilers used in program development, and the source code to be compiled is written in a specific programming language called a Hardware Description Language (HDL), of which there are many kinds, with VHDL and Verilog among the most commonly used. Those skilled in the art will also appreciate that a hardware circuit implementing a logical method flow can easily be obtained merely by slightly logically programming the method flow in such a hardware description language and programming it into an integrated circuit.
A controller may be implemented in any suitable manner, for example in the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320 microcontrollers; a memory controller may also be implemented as part of the control logic of a memory.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (20)

1. An information processing method comprising:
acquiring an electronic image corresponding to information to be input;
classifying the electronic image into a preset image type;
performing target detection on the classified electronic image according to the image type, so as to identify a plurality of target image contents contained in the electronic image as information elements corresponding to the target image contents in a computer system;
and typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input.
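Purely as an illustration of how the four claimed steps compose, here is a minimal Python sketch; every name in it (InformationElement, classify_image, detect_targets, typeset_elements) is a hypothetical stand-in, not an implementation disclosed by this application:

    from dataclasses import dataclass

    @dataclass
    class InformationElement:
        kind: str       # e.g. "character" or "graphic"
        content: str    # recognized text or a graphic reference
        box: tuple      # (x1, y1, x2, y2) in image coordinates

    def classify_image(image):
        # Stand-in for the classification step: a trained model would go here.
        return "mixed"

    def detect_targets(image, image_type):
        # Stand-in for the detection step: one (kind, content, box) per target.
        return [("character", "hello", (0, 0, 100, 20))]

    def typeset_elements(elements, line_height=24):
        # Stand-in for the typesetting step: a trivial rule stacking elements.
        return [{"content": e.content, "x": 0, "y": i * line_height}
                for i, e in enumerate(elements)]

    def process_information(image):
        image_type = classify_image(image)            # classify the image
        targets = detect_targets(image, image_type)   # detect target contents
        elements = [InformationElement(*t) for t in targets]
        return typeset_elements(elements)             # generate digital info

    print(process_information(object()))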
2. The method of claim 1, wherein, when the image type is an image type including characters, performing target detection on the classified electronic image according to the image type comprises:
extracting features of the characters contained in the electronic image by adopting a residual neural network;
and extracting the characters in the electronic image by adopting a long short-term memory network according to the features and a preset character feature set, so as to complete target detection on the characters contained in the electronic image.
3. The method of claim 2, further comprising: supplementing sample data in the character feature set through transfer learning.
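As a hedged illustration of the recognition pipeline in claims 2 and 3, the following sketch pairs a torchvision residual network with a bidirectional LSTM in the style of a CRNN text recognizer; it assumes PyTorch and torchvision, and the layer sizes and character-set size are invented for the example:

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    class CRNNSketch(nn.Module):
        # Residual-network feature extractor followed by an LSTM that reads
        # the feature columns left to right as a character sequence.
        def __init__(self, num_classes):
            super().__init__()
            # Loading pretrained weights (weights="IMAGENET1K_V1") would be
            # one way to realize the transfer learning of claim 3.
            backbone = resnet18(weights=None)
            self.features = nn.Sequential(*list(backbone.children())[:-2])
            self.lstm = nn.LSTM(input_size=512, hidden_size=256,
                                bidirectional=True, batch_first=True)
            self.classifier = nn.Linear(512, num_classes)

        def forward(self, x):
            f = self.features(x)                 # (N, 512, H', W')
            f = f.mean(dim=2).permute(0, 2, 1)   # collapse height: (N, W', 512)
            seq, _ = self.lstm(f)                # contextual column features
            return self.classifier(seq)          # (N, W', num_classes)

    model = CRNNSketch(num_classes=5000)         # e.g. size of a character set
    logits = model(torch.randn(1, 3, 32, 256))   # one 32x256 text-line image
    print(logits.shape)                          # torch.Size([1, 8, 5000])

Decoding the per-column logits against the preset character feature set (for example with CTC decoding) would complete the extraction step; that matching is not shown here.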
4. The method of claim 1, wherein performing target detection on the classified electronic image according to the image type comprises:
performing target detection on the classified electronic image, according to the image type, by utilizing a MobileNet-SSD based on a preset data set.
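A minimal sketch of running an SSD-style detector, assuming torchvision's SSDlite with a MobileNetV3 backbone as a stand-in for the MobileNet-SSD named in claim 4; the class count is a placeholder for whatever label set the preset data set defines:

    import torch
    from torchvision.models.detection import ssdlite320_mobilenet_v3_large

    # Untrained stand-in detector; in practice it would be trained on the
    # preset data set so its labels match the expected target image contents.
    detector = ssdlite320_mobilenet_v3_large(weights=None,
                                             weights_backbone=None,
                                             num_classes=21)
    detector.eval()

    with torch.no_grad():
        detections = detector([torch.rand(3, 320, 320)])  # one dummy image

    # Each detection carries bounding boxes, labels, and confidence scores.
    print(detections[0]["boxes"].shape, detections[0]["scores"].shape)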
5. The method of claim 4, further comprising:
the data set is constructed according to the VOCdevkit and/or VOC2012 data set format.
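Claim 5's VOCdevkit/VOC2012 format pairs a JPEGImages/ directory with per-image XML files under Annotations/ and split lists under ImageSets/Main/. The sketch below, using only the Python standard library, builds one such annotation; the file name and label are invented for illustration:

    import xml.etree.ElementTree as ET

    def voc_annotation(filename, width, height, objects):
        # Build a Pascal-VOC style annotation (VOCdevkit/VOC2012 layout).
        root = ET.Element("annotation")
        ET.SubElement(root, "filename").text = filename
        size = ET.SubElement(root, "size")
        for tag, value in (("width", width), ("height", height), ("depth", 3)):
            ET.SubElement(size, tag).text = str(value)
        for name, (xmin, ymin, xmax, ymax) in objects:
            obj = ET.SubElement(root, "object")
            ET.SubElement(obj, "name").text = name
            box = ET.SubElement(obj, "bndbox")
            for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                                  (xmin, ymin, xmax, ymax)):
                ET.SubElement(box, tag).text = str(value)
        return ET.tostring(root, encoding="unicode")

    print(voc_annotation("note_001.jpg", 640, 480,
                         [("character_block", (10, 10, 300, 60))]))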
6. The method of claim 4, wherein, after the target detection, the method further comprises:
classifying the target image contents;
and adding the classified target image contents to the data set.
7. The method of claim 6, wherein, after the classified target image contents are added to the data set, the method further comprises: training the MobileNet-SSD based on the data set.
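Claims 6 and 7 describe a feedback loop: detected contents are classified, folded into the data set, and the detector is retrained on the grown set. A toy sketch of that loop, with the classifier and training step reduced to hypothetical stand-ins:

    def extend_dataset(detected_crops, classify, data_set):
        # Claim 6: classify each detected target image content and add the
        # labeled sample to the data set.
        for crop in detected_crops:
            data_set.append((crop, classify(crop)))
        return data_set

    def retrain(detector_name, data_set):
        # Claim 7 stand-in: a real system would retrain the MobileNet-SSD here.
        return f"retrained {detector_name} on {len(data_set)} samples"

    data = extend_dataset(["crop_a", "crop_b"], lambda c: "graphic", [])
    print(retrain("MobileNet-SSD", data))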
8. The method of claim 1, further comprising: displaying the digital information in a digital interface.
9. The method of claim 8, further comprising: providing a plurality of processing interfaces through the digital interface, so as to perform, on the digital information, processing operations corresponding to the processing interfaces.
10. The method according to claim 1, wherein typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input comprises:
generating a character region object corresponding to a character element in the information elements, and generating a graphic region object corresponding to a graphic element in the information elements;
determining layout parameters to be occupied by the information elements, wherein the layout parameters comprise coordinate values for typesetting the character region object and coordinate values for typesetting the graphic region object;
and typesetting the character region object and the graphic region object, according to the layout parameters and the preset typesetting rule, to generate the digital information corresponding to the information to be input.
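To make claim 10 concrete, the following sketch models region objects whose layout parameters are plain coordinate values, with a deliberately simple stand-in typesetting rule (character regions flow at the left margin, graphic regions are right-aligned); the rule and page width are assumptions, not the claimed preset rule:

    from dataclasses import dataclass

    @dataclass
    class RegionObject:
        # A character or graphic region plus its layout parameters, i.e. the
        # coordinate values used to typeset it.
        kind: str
        x: float
        y: float
        width: float
        height: float

    def typeset(elements, page_width=400.0, line_height=24.0):
        # Toy rule: characters at the left margin, graphics right-aligned,
        # every region advancing a vertical cursor.
        regions, cursor = [], 0.0
        for kind, w, h in elements:
            x = 0.0 if kind == "character" else page_width - w
            regions.append(RegionObject(kind, x, cursor, w, h))
            cursor += max(h, line_height)
        return regions

    print(typeset([("character", 200.0, 20.0), ("graphic", 120.0, 80.0)]))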
11. The method of claim 10, further comprising:
judging whether the character region object and the graphic region object cross each other;
if so, determining the crossed region where the character region object and the graphic region object cross;
segmenting the crossed region, wherein the slope of the dividing line segmenting the crossed region satisfies the following condition: among candidate slopes, the slope is the one for which the straight-line distance from the dividing line to that point of either of the crossing character region object and graphic region object which lies closest to the dividing line is the greatest;
and adjusting the layout parameters of the crossing character region object and graphic region object according to the segmented crossed region.
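A geometric reading of claim 11: among candidate dividing lines, pick the slope that maximizes the distance from the line to the nearest point of either crossing region. The sketch below scans discrete angles for a line through an anchor point; anchoring the line inside the overlap and sampling only the regions' corners are simplifying assumptions, since the claim fixes only the slope criterion:

    import math

    def best_dividing_angle(char_pts, graph_pts, anchor, n_angles=180):
        # Distance from point p to the line through `anchor` with unit
        # direction (cos t, sin t) is |cross(p - anchor, direction)|.
        pts = char_pts + graph_pts
        best_angle, best_margin = 0.0, -1.0
        for k in range(n_angles):
            theta = math.pi * k / n_angles
            dx, dy = math.cos(theta), math.sin(theta)
            margin = min(abs((px - anchor[0]) * dy - (py - anchor[1]) * dx)
                         for px, py in pts)
            if margin > best_margin:
                best_angle, best_margin = theta, margin
        return best_angle, best_margin  # slope of the line is tan(best_angle)

    # Corners of a crossing character box and graphic box (made-up numbers),
    # with the dividing line anchored inside their overlap:
    char_box = [(0, 0), (120, 0), (120, 40), (0, 40)]
    graph_box = [(100, 20), (180, 20), (180, 90), (100, 90)]
    angle, margin = best_dividing_angle(char_box, graph_box, anchor=(110, 30))
    print(math.degrees(angle), margin)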
12. An information processing apparatus comprising:
an acquisition module that acquires an electronic image corresponding to information to be input;
a classification module that classifies the electronic image into a preset image type;
a detection module that performs target detection on the classified electronic image according to the image type, so as to identify a plurality of target image contents contained in the electronic image as information elements corresponding to the target image contents in a computer system;
and a typesetting module that typesets the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input.
13. The apparatus of claim 12, wherein, when the image type is an image type including characters, performing target detection on the classified electronic image according to the image type comprises:
extracting features of the characters contained in the electronic image by adopting a residual neural network;
and extracting the characters in the electronic image by adopting a long short-term memory network according to the features and a preset character feature set, so as to complete target detection on the characters contained in the electronic image.
14. The apparatus of claim 12, wherein performing target detection on the classified electronic image according to the image type comprises:
performing target detection on the classified electronic image, according to the image type, by utilizing a MobileNet-SSD based on a preset data set.
15. The apparatus of claim 14, the apparatus further comprising:
a data set module that constructs the data set according to the VOCdevkit and/or VOC2012 data set format.
16. The apparatus of claim 12, the apparatus further comprising:
a display module that displays the digital information in a digital interface.
17. The apparatus of claim 16, the apparatus further comprising:
an interface module that provides a plurality of processing interfaces through the digital interface, so as to perform, on the digital information, processing operations corresponding to the processing interfaces.
18. The apparatus according to claim 12, wherein typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input comprises:
generating a character region object corresponding to a character element in the information elements, and generating a graphic region object corresponding to a graphic element in the information elements;
determining layout parameters to be occupied by the information elements, wherein the layout parameters comprise coordinate values for typesetting the character region object and coordinate values for typesetting the graphic region object;
and typesetting the character region object and the graphic region object, according to the layout parameters and the preset typesetting rule, to generate the digital information corresponding to the information to be input.
19. The apparatus of claim 18, the apparatus further comprising: a segmentation module;
the segmentation module is configured to:
judging whether the character region object and the graphic region object cross each other;
if so, determining the crossed region where the character region object and the graphic region object cross;
segmenting the crossed region, wherein the slope of the dividing line segmenting the crossed region satisfies the following condition: among candidate slopes, the slope is the one for which the straight-line distance from the dividing line to that point of either of the crossing character region object and graphic region object which lies closest to the dividing line is the greatest;
and adjusting the layout parameters of the crossing character region object and graphic region object according to the segmented crossed region.
20. An electronic device for information processing, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring an electronic image corresponding to information to be input;
classifying the electronic image into a preset image type;
performing target detection on the classified electronic image according to the image type, so as to identify a plurality of target image contents contained in the electronic image as information elements corresponding to the target image contents in a computer system;
and typesetting the information elements according to a preset typesetting rule to generate the digital information corresponding to the information to be input.
CN202010252333.9A 2020-04-01 2020-04-01 Information processing method, device and equipment Active CN111460782B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202311102402.8A CN117113962A (en) 2020-04-01 2020-04-01 Information processing method, device and equipment
CN202010252333.9A CN111460782B (en) 2020-04-01 2020-04-01 Information processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010252333.9A CN111460782B (en) 2020-04-01 2020-04-01 Information processing method, device and equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202311102402.8A Division CN117113962A (en) 2020-04-01 2020-04-01 Information processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN111460782A 2020-07-28
CN111460782B CN111460782B (en) 2023-08-22

Family

ID=71685793

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202311102402.8A Pending CN117113962A (en) 2020-04-01 2020-04-01 Information processing method, device and equipment
CN202010252333.9A Active CN111460782B (en) 2020-04-01 2020-04-01 Information processing method, device and equipment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202311102402.8A Pending CN117113962A (en) 2020-04-01 2020-04-01 Information processing method, device and equipment

Country Status (1)

Country Link
CN (2) CN117113962A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117952563B (en) * 2024-03-21 2024-07-19 武汉市特种设备监督检验所 Quick registration and examination method and system in elevator information system

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04156694A (en) * 1990-10-20 1992-05-29 Fujitsu Ltd Character recognition system
JPH08328537A (en) * 1995-06-02 1996-12-13 Omron Corp Character spacing adjuster and character spacing adjusting method
CN101196876A (en) * 2007-12-29 2008-06-11 北京大学 Method and system for implementing pre-typesetting
US20100013835A1 (en) * 2008-07-21 2010-01-21 Matthew John Kuhns Method and system for typesetting with multiple-color characters using layer fonts
CN101571875A (en) * 2009-05-05 2009-11-04 程治永 Realization method of image searching system based on image recognition
CN101989256A (en) * 2009-07-31 2011-03-23 北京大学 Typesetting method of document file and device
CN102147790A (en) * 2011-04-15 2011-08-10 华为软件技术有限公司 Text type-setting method and type-setting engine
US20130113806A1 (en) * 2011-11-04 2013-05-09 Barak Reuven Naveh Rendering Texts on Electronic Devices
CN104040534A (en) * 2012-01-09 2014-09-10 柳仲夏 Method for editing character image in character image editing apparatus and recording medium having program recorded thereon for executing the method
US20150169975A1 (en) * 2013-12-17 2015-06-18 Microsoft Corporation User interface for overlapping handwritten text input
CN106484669A (en) * 2016-10-14 2017-03-08 大连理工大学 A kind of automatic composing method of Classification Oriented informative advertising newspaper
CN106569755A (en) * 2016-10-26 2017-04-19 深圳盒子支付信息技术有限公司 Printing method and printing device for mobile payment terminal
US20180114059A1 (en) * 2016-10-26 2018-04-26 Myscript System and method for managing digital ink typesetting
CN107273032A (en) * 2017-06-28 2017-10-20 广州视源电子科技股份有限公司 Information typesetting method, device and equipment and computer storage medium
CN108121987A (en) * 2018-01-02 2018-06-05 联想(北京)有限公司 A kind of information processing method and electronic equipment
WO2019165871A1 (en) * 2018-03-01 2019-09-06 阿里巴巴集团控股有限公司 Method and apparatus for displaying digital object identifier
CN110321755A (en) * 2018-03-28 2019-10-11 中移(苏州)软件技术有限公司 A kind of recognition methods and device
CN109214384A (en) * 2018-08-01 2019-01-15 北京数字雷暴信息科技有限公司 A kind of information extracting method and device
CN110084871A (en) * 2019-05-06 2019-08-02 珠海格力电器股份有限公司 image typesetting method and device and electronic terminal
CN110781648A (en) * 2019-10-12 2020-02-11 安徽七天教育科技有限公司 Test paper automatic transcription system and method based on deep learning

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395834A (en) * 2020-11-19 2021-02-23 平安普惠企业管理有限公司 Brain graph generation method, device and equipment based on picture input and storage medium
CN112395834B (en) * 2020-11-19 2024-04-02 广东润兹科技有限公司 Brain graph generation method, device and equipment based on picture input and storage medium
CN112906774A (en) * 2021-02-05 2021-06-04 智慧芽信息科技(苏州)有限公司 File information extraction method and electronic equipment
CN113128604A (en) * 2021-04-23 2021-07-16 北京奇艺世纪科技有限公司 Page element identification method and device, electronic equipment and storage medium
CN113884504A (en) * 2021-08-24 2022-01-04 湖南云眼智能装备有限公司 Capacitor appearance detection control method and device

Also Published As

Publication number Publication date
CN117113962A (en) 2023-11-24
CN111460782B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
Singh Practical machine learning and image processing: for facial recognition, object detection, and pattern recognition using Python
CN111460782B (en) Information processing method, device and equipment
CN107133622B (en) Word segmentation method and device
Marques Practical image and video processing using MATLAB
Krishnan et al. Textstylebrush: transfer of text aesthetics from a single example
US11790675B2 (en) Recognition of handwritten text via neural networks
CN111414906A (en) Data synthesis and text recognition method for paper bill picture
CN111275139B (en) Handwritten content removal method, handwritten content removal device, and storage medium
Joshi et al. Deep learning based Gujarati handwritten character recognition
CN110738203A (en) Method and device for outputting field structuralization and computer readable storage medium
US9082184B2 (en) Note recognition and management using multi-color channel non-marker detection
WO2024041032A1 (en) Method and device for generating editable document based on non-editable graphics-text image
Akinbade et al. An adaptive thresholding algorithm-based optical character recognition system for information extraction in complex images
CN114330234A (en) Layout structure analysis method and device, electronic equipment and storage medium
CN113436222A (en) Image processing method, image processing apparatus, electronic device, and storage medium
Singh et al. CNN based approach for traffic sign recognition system
CN118135584A (en) Automatic handwriting form recognition method and system based on deep learning
Jena et al. Odia characters and numerals recognition using hopfield neural network based on zoning feature
CN117392698A (en) Method, device, equipment and storage medium for identifying hand-drawn circuit diagram
US20240144711A1 (en) Reliable determination of field values in documents with removal of static field elements
CN110414497A (en) Method, device, server and storage medium for electronizing object
JP7365835B2 (en) Structure recognition system, structure recognition device, structure recognition method, and program
Jabir Ali et al. A convolutional neural network based approach for recognizing malayalam handwritten characters
Vinjit et al. Implementation of handwritten digit recognizer using CNN
Chaitra et al. Text Detection and Recognition from the Scene Images Using RCNN and EasyOCR

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant