CN112766263A - Identification method for multi-layer stock control relation share graph - Google Patents


Info

Publication number
CN112766263A
Authority
CN
China
Prior art keywords: arrow, company, coordinates, share, many
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110083415.XA
Other languages
Chinese (zh)
Other versions
CN112766263B (en)
Inventor
张贝贝
仵晨伟
郭仲穗
郑浩然
魏嵬
Current Assignee
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN202110083415.XA priority Critical patent/CN112766263B/en
Publication of CN112766263A publication Critical patent/CN112766263A/en
Application granted granted Critical
Publication of CN112766263B publication Critical patent/CN112766263B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G06V20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition


Abstract



The invention discloses a method for identifying a multi-layer holding-relationship share graph. The steps are: step 1, input the share graph to be identified; step 2, use a Faster R-CNN network to extract the coordinates of the companies (individuals), arrows, arrows with lines and percentages; step 3, following the divide-and-conquer idea, divide the share graph to be identified into multiple single-layer one-to-many or many-to-one share graphs; step 4, for each one-to-many or many-to-one share graph, determine the corner coordinates from the arrow coordinates and the arrow's direction from its corner coordinates, divide the companies (individuals) into pointing objects and pointed objects, and bind each percentage to whichever of the pointing and pointed objects has more members; recognize the text of each company (individual) with an OCR method; step 5, construct a directed weighted graph of the holding process from the pointing relations. The invention solves the problem in the prior art that the original share graph cannot intuitively reflect a company's shareholding.


Description

Identification method for multi-layer stock control relation share graph
Technical Field
The invention belongs to the technical field of image recognition, and relates to a recognition method for a multilayer stock control relationship share graph.
Background
With the development of Internet technology, the field of artificial intelligence has grown vigorously, and related technologies and products occupy an increasing share of people's daily lives. Image recognition is an important field of artificial intelligence and the basis of many practical technologies, such as stereoscopic vision, motion analysis and data fusion, with important applications in navigation, weather forecasting, natural resource analysis, environmental monitoring, physiological lesion research and other fields. Recognition and analysis of complex images is an important branch of artificial intelligence, and target recognition in images is already mature for features such as license plates, faces and pedestrians; researchers therefore hope to recognize and analyze more complex relationship images (such as share graphs), freeing practitioners from the traditional manual method of share analysis so that equity distribution can be grasped efficiently and accurately and work efficiency improved.
However, most existing share graphs come from companies' published annual or quarterly reports and related software (such as Tianyancha). The pictures are complicated, the shareholding architecture of a company is difficult to grasp intuitively, and the analysis is rarely limited to one graph or one company, so the work is time-consuming, labor-intensive and hard to keep straight. In addition, there is currently no research at home or abroad on identifying share graphs with image recognition technology, nor on analyzing share relationship graphs.
Disclosure of Invention
The invention aims to provide a method for identifying a multi-layer stock control relation stock graph, which solves the problem that the original stock graph in the prior art cannot visually reflect the stocks of a company.
The technical scheme adopted by the invention is as follows.
a method for identifying a multi-layer stock control relationship share graph comprises the following specific steps:
step 1, inputting a share graph to be identified of a multilayer stock control relationship;
step 2, extracting the coordinates of the companies (individuals), arrows, arrows with lines and percentages in the picture by a Faster R-CNN network;
step 3, according to the divide-and-conquer idea, dividing the share graph to be identified into a plurality of single-layer one-to-many or many-to-one share graphs by using the coordinates of the arrows with lines;
step 4, for each single-layer one-to-many or many-to-one share graph, determining the corner coordinates according to the arrow coordinates, and determining the direction of the arrow according to the arrow corner coordinates; dividing the companies (individuals) into pointing objects and pointed objects according to the direction of the arrow, and then binding each percentage one-to-one with whichever of the pointing objects and pointed objects has more members; finally, recognizing the characters in the pointing objects and pointed objects by an OCR recognition method;
and step 5, constructing the "object-arrow-percentage-pointed object" holding-flow directed weighted graph according to the pointing relations obtained in step 4.
The invention is also characterized in that:
the step 2 comprises the following steps:
step 2.1, taking a large number of share graphs and manually annotating the companies (individuals), arrows, arrows with lines and percentages in them as a data set; wherein each share graph is manually divided into a plurality of single-layer one-to-many or many-to-one share graphs, and an arrow exceeding a single-layer one-to-many or many-to-one share graph is defined as an arrow with lines;
step 2.2, building a VGG-16 network model, wherein the VGG-16 comprises 13 convolution layers, 3 fully connected layers and 5 pooling layers;
step 2.3, training the VGG-16 network model on the data set;
and step 2.4, detecting the share graph to be identified with the trained VGG-16 network model and outputting the detection result, namely the coordinates of the companies (individuals), arrows and percentages.
In step 2, the convolution kernels of the 13 convolution layers are all 3x3, with stride = 1, padding = same, and a ReLU activation function after each convolution layer; positive anchors and the corresponding bounding-box regression offsets are generated, and proposals are then computed;
the pooling kernels of the pooling layers are all 2x2, with stride = 2 in max-pooling mode; the proposals of the convolution layers are used to extract proposal features from the feature maps, which are sent to the subsequent fully connected and softmax network for classification (i.e., which object each proposal is).
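The 13-convolution / 5-pooling layout cited above is the standard VGG-16 backbone. As a minimal sketch (not the patent's training code), it can be written as the usual configuration list; the fully connected head supplies the remaining 3 layers:

```python
# VGG-16 backbone configuration: integers are conv output channels
# (3x3 kernels, stride 1, same padding, ReLU after each), 'M' is a
# 2x2 max pool with stride 2.
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def layer_counts(cfg):
    """Count convolution and pooling layers in a VGG-style config."""
    convs = sum(1 for v in cfg if isinstance(v, int))
    pools = cfg.count('M')
    return convs, pools

convs, pools = layer_counts(VGG16_CFG)
print(convs, pools)  # prints: 13 5
```

Together with the 3 fully connected layers this matches the 13 + 3 + 5 layer structure stated in step 2.2.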
The step 3 is:
step 3.1, based on the coordinates of a certain arrow with lines obtained in step 2, setting the upper, lower, left and right bounds of the region of that arrow as U, D, L and R, and then searching for and expanding the coordinates of company (individual) names against the four bounds in turn, as follows:
Expanding the upper bound U: when the absolute value of the difference between the upper bound U of the arrow region and the lower bound D' of some company (individual) name region is within the error μ, expand U to the upper bound U' of that name region. Expanding the lower bound D: when the absolute value of the difference between the lower bound D of the arrow region and the upper bound U' of some company (individual) name region is within the error μ, expand D to the lower bound D' of that name region. Expanding L: find the group of company (individual) names whose lower bound differs from U within the error μ, compute the difference between the left boundary of each of these names and L, and expand L to the left boundary L' of the name region with the smallest difference. Expanding R: find the group of company (individual) names whose upper bound differs from D within the error μ, compute the difference between the right boundary of each of these names and R, and expand R to the right boundary R' of the name region with the smallest difference. The region formed by the upper bound U', lower bound D', left bound L' and right bound R' is the final expanded target range of this arrow's coordinates;
and step 3.2, traversing the coordinates of all arrows with lines in the share graph and repeating step 3.1 until the overall coordinates along each arrow's course are fully expanded, finally dividing the share graph to be identified into a plurality of single-layer one-to-many or many-to-one share graphs.
The error μ takes a value in the range of 10 to 30 pixels.
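The bound-expansion rule of step 3.1 can be sketched as follows. This is an illustrative reading, not the patent's code: boxes are assumed to be (left, upper, right, lower) tuples in pixel coordinates with y growing downward, and the helper name `expand_bounds` is invented here:

```python
MU = 15  # pixel tolerance; the patent allows 10 to 30

def expand_bounds(arrow_box, name_boxes, mu=MU):
    """Expand an arrow-with-lines region toward nearby name regions
    (sketch of step 3.1; box format and names are assumptions)."""
    L, U, R, D = arrow_box
    for l2, u2, r2, d2 in name_boxes:
        if abs(U - d2) <= mu:   # name just above the arrow: lift U to its top
            U = u2
        if abs(D - u2) <= mu:   # name just below the arrow: drop D to its bottom
            D = d2
    # L: among names touching the original upper bound, take the left
    # edge with the smallest difference from L
    top = [b for b in name_boxes if abs(b[3] - arrow_box[1]) <= mu]
    if top:
        L = min(top, key=lambda b: abs(b[0] - L))[0]
    # R: among names touching the original lower bound, take the right
    # edge with the smallest difference from R
    bottom = [b for b in name_boxes if abs(b[1] - arrow_box[3]) <= mu]
    if bottom:
        R = min(bottom, key=lambda b: abs(b[2] - R))[2]
    return (L, U, R, D)

# Example: an arrow region between a parent name above and two
# subsidiary names below.
arrow = (100, 50, 200, 120)
names = [(90, 20, 210, 45),    # parent, just above the arrow
         (40, 125, 150, 150),  # subsidiary, lower left
         (160, 125, 230, 150)] # subsidiary, lower right
print(expand_bounds(arrow, names))  # (90, 20, 230, 150)
```

Repeating this for every arrow with lines (step 3.2) carves the full graph into single-layer sub-graphs.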
Step 4 comprises the following operations for each single-layer one-to-many or many-to-one share graph:
step 4.1, for a single-layer one-to-many or many-to-one share graph, determining the corner coordinates according to the arrow coordinates, and determining the direction of the arrow according to the arrow corner coordinates:
Let the three corner points of a certain arrow be A(x1, y1), B(x2, y2), C(x3, y3). If the difference between y1 and y2 is less than a given threshold e1, corners A and B are considered to lie on one horizontal line; then compare y3 with y1: if y3 > y1 the arrow is considered to point downward, and if y3 < y1 it is considered to point upward. Traverse all arrow corner coordinates and judge the directions one by one;
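The direction test above can be sketched directly (assumed image coordinates with y growing downward; the threshold value and the function name are illustrative, not from the patent):

```python
E1 = 5  # horizontal-alignment threshold in pixels (assumed value)

def arrow_direction(a, b, c, e1=E1):
    """Judge an arrowhead's direction from its three corner points
    A, B, C (sketch of step 4.1)."""
    (x1, y1), (x2, y2), (x3, y3) = a, b, c
    if abs(y1 - y2) >= e1:
        return None  # A and B not on one horizontal line: not handled here
    if y3 > y1:
        return "down"  # tip below the A-B base line
    if y3 < y1:
        return "up"    # tip above the A-B base line
    return None

print(arrow_direction((10, 40), (20, 41), (15, 60)))  # down
print(arrow_direction((10, 40), (20, 41), (15, 20)))  # up
```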
step 4.2, dividing the company (individual) names into pointing objects and pointed objects according to the direction of the arrow, and then binding each percentage one-to-one with whichever of the pointing objects and pointed objects has more members. Since the input share graphs are all single-layer, the names can be divided into two groups by the ordinate of the company name. According to the arrow direction obtained in step 4.1: if the arrow points upward, the group with the largest ordinate among the company (individual) coordinates is the pointed object; if the arrow points downward, the group with the smallest ordinate is the pointed object. Then bind each percentage one-to-one with the side having more members: let the minimum and maximum abscissas among the coordinates of the four points of some object on that side be (xmin, xmax), find the percentage whose abscissa lies within (xmin, xmax), and bind the two in a specific data structure (such as a dictionary); traverse the remaining objects of that side and bind them one-to-one with percentages in the same way;
and 4.3, recognizing characters in the coordinates of the pointing object and the pointed object by utilizing an OCR technology.
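The horizontal-overlap binding of step 4.2 can be sketched like this (a simplified reading: names, box format and the one-match-per-object rule are assumptions for illustration):

```python
def bind_percentages(object_boxes, percent_boxes):
    """Bind each percentage label to the object whose horizontal span
    contains its abscissa (sketch of step 4.2's dictionary binding).

    object_boxes:  {name: (xmin, xmax)} for the side with more members
    percent_boxes: list of (percent_text, x_center)
    """
    bound = {}
    for name, (xmin, xmax) in object_boxes.items():
        for text, x in percent_boxes:
            if xmin <= x <= xmax:
                bound[name] = text
                break
    return bound

objects = {"Subsidiary A": (40, 150), "Subsidiary B": (160, 260)}
percents = [("51%", 100), ("30%", 200)]
print(bind_percentages(objects, percents))
# {'Subsidiary A': '51%', 'Subsidiary B': '30%'}
```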
The step 5 comprises the following steps:
step 5.1, establishing an empty directed graph G, and adding the company (individual) names obtained in step 4.3 as nodes in turn, giving a basic directed graph G' that stores only nodes;
step 5.2, on the basis of the directed graph G' of step 5.1, converting each pointing relation of step 4.2 into a triple [u, v, w], wherein u is the starting point and represents the pointing object, v is the end point and represents the pointed object, and w is the weight and represents the percentage of shares held; the converted triples are added to the directed graph G' as parameters, finally forming the holding-flow directed weighted graph G''.
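The embodiment builds this graph with the NetworkX library; as a dependency-free sketch of the same triple-based construction, nodes and weighted edges can be stored as a dict of successor dicts:

```python
def build_holding_graph(triples):
    """Build a directed weighted holding graph from [u, v, w] triples
    (sketch of step 5; the patent's embodiment uses NetworkX instead)."""
    graph = {}
    for u, v, w in triples:
        graph.setdefault(u, {})
        graph.setdefault(v, {})  # ensure every name appears as a node
        graph[u][v] = w          # edge u -> v weighted by share percent
    return graph

# Hypothetical triples as produced by steps 4-5
triples = [["Holding Co", "Subsidiary A", 0.51],
           ["Holding Co", "Subsidiary B", 0.30],
           ["Subsidiary A", "Grandchild Co", 1.00]]
g = build_holding_graph(triples)
print(g["Holding Co"])  # {'Subsidiary A': 0.51, 'Subsidiary B': 0.3}
```

With NetworkX the same triples would be fed to a `DiGraph` via weighted-edge insertion; the dict form above keeps the sketch self-contained.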
The invention has the following advantages:
The share graph is identified and analyzed with the Faster R-CNN technique and the image recognition technology of a deep learning framework, which overcomes the time-consuming, labor-intensive and hard-to-understand character of manual share analysis for individuals or companies, fills the gap in domestic and foreign research on this topic, and provides an efficient and accurate method.
Drawings
FIG. 1 is a schematic diagram of a single-layer one-to-many or many-to-one graph recognition and analysis method according to the present invention;
FIG. 2 is a schematic view of the VGG-16 network structure of Faster R-CNN in the single-layer one-to-many or many-to-one graph identification and analysis method of the present invention;
FIG. 3 is a diagram of the shares input in embodiment 1 of the method for identifying and parsing a single-layer one-to-many or many-to-one share diagram of the present invention;
FIG. 4 is a graph of the results obtained after performing step 3 in embodiment 1 of the method for identifying and parsing a single-layer one-to-many or many-to-one share graph according to the present invention;
fig. 5 is a complex network diagram finally obtained in embodiment 1 of the method for identifying and resolving a single-layer one-to-many or many-to-one share diagram according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, a method for identifying a multi-layer stock control relationship share graph includes the following specific steps:
step 1, inputting a share graph to be identified of a multilayer stock control relationship;
step 2, extracting coordinates of companies (individuals), arrows with lines and percentages in the picture by adopting a Faster R-CNN network;
step 3, dividing the share graph to be identified into a plurality of single-layer one-to-many or many-to-one share graphs by using the coordinates of the arrowheads with lines according to the dividing and treating thought;
step 4, determining the corner coordinates of each single-layer one-to-many or many-to-one stock image according to the arrow coordinates, and determining the direction of the arrow according to the arrow corner coordinates; dividing a company (person) into a pointing object and a pointed object according to the direction of an arrow, and then binding the pointed object and the pointed object with more parties and percentages in a one-to-one mode; finally, recognizing characters in the pointing object and the pointed object by using an OCR recognition method;
and 5, constructing an object-arrow-percentage-pointed object-oriented stock control flow directional weighting graph according to the pointing relation obtained in the step 3.
In the step 1, for a share graph to be identified of a multilayer stock control relation, the share graph needs to be scaled to a fixed size;
the step 2 comprises the following steps:
step 2.1, a large number of share graphs are adopted, and companies (individuals), arrows with lines and percentages in the graphs are manually marked to serve as a data set; wherein the share graph is manually divided into a plurality of single-layer one-to-many or many-to-one share graphs, and an arrow exceeding the single-layer one-to-many or many-to-one share graphs is defined as a strip line arrow;
step 2.2, building a VGG-16 network model, wherein the VGG-16 comprises 13 convolution layers, 3 full connection layers and 5 pooling layers;
step 2.3, training a data set by the VGG-16 network model;
and 2.4, detecting the share graph to be recognized by adopting the trained VGG-16 network model, and outputting a detection result, wherein the detection result is coordinates of a company (individual), an arrow and a percentage.
In step 2, the sizes of convolution kernels adopted by the 13 convolution layers are 3x3 convolution, stride 1 is adopted, padding is same as same, and each convolution layer uses a relu activation function; respectively generating positive anchors and corresponding bounding box regression offsets, and then calculating prosages;
the adopted pooling nuclear parameters of the pooling layer are all 2 multiplied by 2, stride is 2, max pooling mode; the pro-usals of the convolutional layer are used to extract the pro-visual feature from the feature maps and send it to the subsequent full-connection and softmax network for classification (i.e. what object the pro-visual is).
The step 3 is:
step 3.1, setting the upper bound, the lower bound, the left bound and the right bound of the area with the line arrow as U, D, L, R based on the coordinate of the certain line arrow obtained in step 2, and further sequentially searching and expanding the coordinate of the company (personal) name according to the four bounds, wherein the specific operation is as follows:
and expanding the upper bound U: when the absolute value of the difference between the upper bound U of the arrowed area and the lower bound D 'of the company (individual) name area is within the error mu, the upper bound U of the arrowed area is expanded to the upper bound U' of the company (individual) name area; expanding the lower bound D: when the absolute value of the difference between the lower bound D of the arrowed area and the upper bound U 'of the company (individual) name area is within the error mu, the lower bound D of the arrowed area is expanded to the lower bound D' of the company (individual) name area; and (3) expanding L: finding a group of company (personal) names under the condition that the difference between the lower boundary of the company (personal) name area and U is within the error mu range, then finding the difference between the left boundary of the group of company (personal) names and L, and expanding L into the left boundary L' of the company (personal) name area with the minimum difference; expanding R: finding a group of company (personal) names under the condition that the difference between the upper bound of the company (personal) name area and D is within the error mu range, then finding the difference between the right boundary of the group of company (personal) names and R, and expanding R into the right boundary R' of the company (personal) name area with the minimum difference; the area formed by the upper boundary U ', the lower boundary D', the left boundary L 'and the right boundary R' is the final expanded target range of the arrow coordinate with the line;
and 3.2, traversing the coordinates of the arrowheads with lines of all the stock images, repeatedly executing the step 3.1 until the whole coordinates of the trend of each arrowhead are completely expanded, and finally dividing the stock image to be identified into a plurality of single-layer one-to-many or many-to-one stock images.
The error mu is within a range of 10-30 pixels.
Step 4 comprises the following operations for each single-layer one-to-many or many-to-one strand graph:
step 4.1, determining corner point coordinates according to arrow coordinates for a single-layer one-to-many or many-to-one stock picture, and determining the direction of an arrow according to the arrow corner point coordinates:
three corner points of a certain arrow are set as (A (x)1,y1),B(x1,y1),C(x3,y3)): suppose y1,y2Is less than a given threshold e1Then, the two points of the angular points A and B are considered to be on a horizontal line, and then y is judged3And y1If y is3>y1The arrow is considered to be downward if y3<y1The arrow is considered to be up; traversing all the coordinates of the arrow corner points, and judging the directions one by one;
step 4.2, dividing the company (personal) name into a pointed object and a pointed object according to the pointing direction of an arrow, binding the pointed object and the pointed object with a larger number of the pointed objects and the pointed objects one by one with a percentage, dividing the stock drawings into two groups according to the size of the ordinate of the company name because the input stock drawings are all single-layer, and if the arrow points upwards, dividing the company (personal) name into one group with the largest ordinate in the company (personal) coordinatesThe group is the pointed object, if the arrow points downwards, the group with the smallest ordinate in the company (personal) coordinates is the pointed object; and then one-to-one binding the pointed object and the more part of the pointed object with the percentage is carried out: the minimum abscissa and the maximum abscissa of the coordinates of four points of one of the pointed object and the pointed object are set to (x)min,xmax) Then find the abscissa of the percentage in (x)min,xmax) Then binding the two in a specific data structure (such as a dictionary), traversing the remaining objects of one party with a large number, and carrying out one-to-one binding with the percentage;
and 4.3, recognizing characters in the coordinates of the pointing object and the pointed object by utilizing an OCR technology.
The step 5 comprises the following steps:
step 5.1, establishing an empty directed graph G, and using the company (individual) names obtained in the step 3.3 as nodes to be sequentially added into the directed graph G to obtain a basic directed graph G' only storing the nodes;
step 5.2, on the basis of the directed graph G' in the step 5.1, converting the pointing relationship in the step 3.2 into a triple [ u, v, w ], wherein u is a starting point and represents a pointing object; v is an end point and represents a pointed object, w is a weight and represents the percentage of stock occupation, the converted triple is used as a parameter and is added into a directed graph G ', and finally, a stock control flow directed weighted graph G' is formed.
Example 1
Step 1 is executed, and the share graph to be identified is input as shown in FIG. 3;
Step 2 is executed. The data sets mainly come from the China Bidding website and the Juchao (cninfo) information website and total more than 100 GB. Because a single equity image contains features of multiple target classes, the 3200 original images were augmented, for example by flipping with the OpenCV library, to 11000 images, with more than 60000 target instances per category. The OCR step calls an existing mature OCR interface (such as the Baidu OCR API) to improve the recognition rate;
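The flip-based augmentation mentioned above can be sketched without OpenCV (`cv2.flip` would be the embodiment's route); here numpy stands in, treating an image as an H x W array. This is an assumed illustration, not the patent's pipeline:

```python
import numpy as np

def flip_augment(image):
    """Return horizontal and vertical flips of an image array, the kind
    of augmentation the embodiment performs with OpenCV (sketch only)."""
    return [np.fliplr(image), np.flipud(image)]

img = np.array([[1, 2],
                [3, 4]])
h, v = flip_augment(img)
print(h.tolist())  # [[2, 1], [4, 3]]
print(v.tolist())  # [[3, 4], [1, 2]]
```

Each flip yields a new labeled sample whose bounding boxes can be mirrored accordingly, which is how 3200 originals grow toward 11000 images.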
Step 3 is executed with μ = 15; the output is shown in FIG. 4. As can be seen, each frame in the figure is a single-layer one-to-many or many-to-one share graph;
Steps 4 and 5 are executed. The complex network expressing the pointing relations is a visual network built on graph theory with the complex-network modeling tool NetworkX; the finally obtained holding-flow directed weighted graph is shown in FIG. 5.

Claims (7)

1. An identification method for a multi-layer holding-relationship share graph, characterized in that the specific steps are:
step 1, inputting the share graph of the multi-layer holding relationship to be identified;
step 2, extracting the coordinates of the companies (individuals), arrows, arrows with lines and percentages in the picture by a Faster R-CNN network;
step 3, according to the divide-and-conquer idea, dividing the share graph to be identified into a plurality of single-layer one-to-many or many-to-one share graphs by using the coordinates of the arrows with lines;
step 4, for each single-layer one-to-many or many-to-one share graph, determining the corner coordinates according to the arrow coordinates, and the direction of the arrow according to the arrow corner coordinates; dividing the companies (individuals) into pointing objects and pointed objects according to the direction of the arrow, and then binding each percentage one-to-one with whichever of the pointing objects and pointed objects has more members; finally, recognizing the characters in the pointing objects and pointed objects by an OCR recognition method;
step 5, constructing the "object-arrow-percentage-pointed object" holding-flow directed weighted graph according to the pointing relations obtained in step 4.

2. The identification method for a multi-layer holding-relationship share graph according to claim 1, characterized in that step 2 comprises:
step 2.1, taking a large number of share graphs and manually annotating the companies (individuals), arrows, arrows with lines and percentages in them as a data set, wherein each share graph is manually divided into a plurality of single-layer one-to-many or many-to-one share graphs, and an arrow exceeding a single-layer one-to-many or many-to-one share graph is defined as an arrow with lines;
step 2.2, building a VGG-16 network model, the VGG-16 comprising 13 convolution layers, 3 fully connected layers and 5 pooling layers;
step 2.3, training the VGG-16 network model on the data set;
step 2.4, detecting the share graph to be identified with the trained VGG-16 network model and outputting the detection result, the detection result being the coordinates of the companies (individuals), arrows and percentages.

3. The identification method for a multi-layer holding-relationship share graph according to claim 1, characterized in that in step 2 the convolution kernels of the 13 convolution layers are all 3x3, with stride = 1, padding = same, and a ReLU activation function after each convolution layer; positive anchors and the corresponding bounding-box regression offsets are generated, and proposals are then computed; the pooling kernels of the pooling layers are all 2x2, with stride = 2 in max-pooling mode; the proposals of the convolution layers are used to extract proposal features from the feature maps, which are sent to the subsequent fully connected and softmax network for classification (i.e., which object each proposal is).

4. The identification method for a multi-layer holding-relationship share graph according to claim 1, characterized in that step 3 is:
step 3.1, based on the coordinates of a certain arrow with lines obtained in step 2, setting the upper, lower, left and right bounds of the region of that arrow as U, D, L and R, and then searching for and expanding the coordinates of company (individual) names against the four bounds in turn, as follows: expanding the upper bound U: when the absolute value of the difference between the upper bound U of the arrow region and the lower bound D' of some company (individual) name region is within the error μ, expanding U to the upper bound U' of that name region; expanding the lower bound D: when the absolute value of the difference between the lower bound D of the arrow region and the upper bound U' of some company (individual) name region is within the error μ, expanding D to the lower bound D' of that name region; expanding L: finding the group of company (individual) names whose lower bound differs from U within the error μ, computing the difference between the left boundary of each of these names and L, and expanding L to the left boundary L' of the name region with the smallest difference; expanding R: finding the group of company (individual) names whose upper bound differs from D within the error μ, computing the difference between the right boundary of each of these names and R, and expanding R to the right boundary R' of the name region with the smallest difference; the region formed by the upper bound U', lower bound D', left bound L' and right bound R' being the final expanded target range of this arrow's coordinates;
step 3.2, traversing the coordinates of all arrows with lines in the share graph and repeating step 3.1 until the overall coordinates along each arrow's course are fully expanded, finally dividing the share graph to be identified into a plurality of single-layer one-to-many or many-to-one share graphs.

5. The identification method for a multi-layer holding-relationship share graph according to claim 4, characterized in that the error μ takes a value in the range of 10 to 30 pixels.

6. The identification method for a multi-layer holding-relationship share graph according to claim 1, characterized in that step 4 comprises performing the following operations for each single-layer one-to-many or many-to-one share graph:
step 4.1, for a certain single-layer one-to-many or many-to-one share graph, determining the corner coordinates according to the arrow coordinates, and the direction of the arrow according to the arrow corner coordinates: let the three corner points of a certain arrow be A(x1, y1), B(x2, y2), C(x3, y3); if the difference between y1 and y2 is less than a given threshold e1, corners A and B are considered to lie on one horizontal line; then compare y3 with y1: if y3 > y1 the arrow is considered to point downward, and if y3 < y1 to point upward; traverse all arrow corner coordinates and judge the directions one by one;
step 4.2, dividing the company (individual) names into pointing objects and pointed objects according to the direction of the arrow, and then binding each percentage one-to-one with whichever of the pointing objects and pointed objects has more members; since the input share graphs are all single-layer, the names can be divided into two groups by the ordinate of the company name; according to the arrow direction obtained in step 4.1, if the arrow points upward, the group with the largest ordinate among the company (individual) coordinates is the pointed object, and if the arrow points downward, the group with the smallest ordinate is the pointed object; then binding each percentage one-to-one with the side having more members: letting the minimum and maximum abscissas among the coordinates of the four points of some object on the side with more members be (xmin, xmax), finding the percentage whose abscissa lies within (xmin, xmax) and binding the two in a specific data structure (such as a dictionary), and traversing the remaining objects of the side with more members and binding them one-to-one with percentages.
According to the direction of the arrow obtained in step 3.1, if it points upward, the group with the largest ordinate in the company (personal) coordinates is pointed. Object, if the arrow points downward, the group with the smallest vertical coordinate in the company (personal) coordinates is the pointed object; then bind the pointed object and the one with the largest number of pointed objects to the percentage one-to-one: Assuming that the pointed object and the one with the largest number of pointed objects, the smallest abscissa and the largest abscissa among the coordinates of the four points of a certain object are (x min , x max ), then find the abscissa of the percentage in (x min , x max ), then bind the two in a specific data structure (such as a dictionary), traverse the remaining objects of the party with the largest number, and bind one-to-one with the percentage; 步骤4.3,利用OCR技术对指向对象和被指向对象的坐标中文字进行识别。Step 4.3, using OCR technology to identify the characters in the coordinates of the pointing object and the pointed object. 7.如权利要求1所述的一种针对多层控股关系股份图的识别方法,其特征在于,所述步骤5包括:7. 
a kind of identification method for multi-layer holding relationship share chart as claimed in claim 1, is characterized in that, described step 5 comprises: 步骤5.1,建立一个空的有向图G,利用步骤3.3中得到的公司(个人)名称,使其作为节点依次添加进有向图G中,得到基础的仅存节点的有向图G′;Step 5.1, create an empty directed graph G, use the company (person) name obtained in step 3.3 to add it as a node into the directed graph G in turn, and obtain the basic directed graph G' with only existing nodes; 步骤5.2,在步骤5.1中有向图G′的基础上,将步骤3.2中的指向关系转化为三元组[u,v,w],其中u为起点,代表指向对象;v为终点,代表被指向对象,w为权重代表占股百分比,利用转化成的三元组作为参数,添加进有向图G′中,最终形成控股流程有向加权图G″。Step 5.2, on the basis of the directed graph G' in step 5.1, convert the pointing relationship in step 3.2 into a triple [u, v, w], where u is the starting point, representing the pointing object; v is the end point, representing the The pointed object, w is the weight representing the percentage of shares, and the converted triplet is used as a parameter to add it to the directed graph G', and finally form a directed weighted graph G" of the holding process.
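Claim 2 specifies a VGG-16 backbone with 13 convolutional layers, 3 fully connected layers, and 5 pooling layers. The claims do not give the per-layer channel widths; the list below is the standard VGG-16 configuration (configuration "D" in the original VGG paper), shown only to make the claimed layer counts concrete:

```python
# Standard VGG-16 configuration (an assumption, not taken from the patent):
# integers are the output channels of 3x3 conv layers (stride=1, padding=same,
# each followed by ReLU); "M" marks a 2x2 max-pooling layer with stride=2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
FC_LAYERS = 3  # the three fully connected layers at the end

def layer_counts(cfg):
    """Count convolutional and pooling layers in a VGG-style config list."""
    convs = sum(1 for c in cfg if isinstance(c, int))
    pools = sum(1 for c in cfg if c == "M")
    return convs, pools

convs, pools = layer_counts(VGG16_CFG)
print(convs, pools, FC_LAYERS)  # prints: 13 5 3
```

The counts match the claim exactly: 13 convolutional, 5 pooling, 3 fully connected layers.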
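The bound expansion of step 3.1 can be sketched as follows. This is a minimal illustration under the assumption that boxes are (left, upper, right, lower) tuples in pixel coordinates; the function name and data layout are hypothetical, not the patent's implementation:

```python
def expand_arrow_region(arrow_box, name_boxes, mu=20):
    """Sketch of step 3.1: grow an arrow-with-line region toward the
    company/individual name boxes at its two ends.

    Boxes are (left, upper, right, lower) in pixels; mu is the matching
    tolerance of claim 5 (10-30 px). Illustrative only.
    """
    L, U, R, D = arrow_box
    for nl, nu, nr, nd in name_boxes:
        if abs(U - nd) <= mu:   # a name box sits just above the arrow
            U = nu              # expand upper bound to that box's upper bound
        if abs(D - nu) <= mu:   # a name box sits just below the arrow
            D = nd              # expand lower bound to that box's lower bound
    # widen left/right toward the closest matching name box at each end
    top = [b for b in name_boxes if abs(b[3] - arrow_box[1]) <= mu]
    if top:
        L = min(top, key=lambda b: abs(b[0] - L))[0]
    bottom = [b for b in name_boxes if abs(b[1] - arrow_box[3]) <= mu]
    if bottom:
        R = min(bottom, key=lambda b: abs(b[2] - R))[2]
    return (L, U, R, D)
```

For an arrow box (100, 50, 120, 150) with one name box just above, (80, 20, 160, 45), and one just below, (90, 155, 140, 180), the region expands to (80, 20, 140, 180), i.e. it now covers both endpoints of the shareholding link.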
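The direction test of step 4.1 is a simple comparison of corner ordinates. A minimal sketch, assuming image coordinates where y grows downward (so y3 > y1 places the tip below the base, matching the claim's "downward" case); the function and threshold names are illustrative:

```python
def arrow_direction(corners, e1=5):
    """Sketch of step 4.1: infer an arrow's direction from its three
    corner points A, B (base) and C (tip).

    Returns "down" if y3 > y1, "up" if y3 < y1, and None when the base
    corners are not on one horizontal line (|y1 - y2| >= e1).
    """
    (x1, y1), (x2, y2), (x3, y3) = corners
    if abs(y1 - y2) >= e1:
        return None  # base corners A, B not horizontally aligned
    return "down" if y3 > y1 else "up"
```

For example, corners [(0, 10), (20, 11), (10, 40)] give "down", while [(0, 40), (20, 41), (10, 5)] give "up".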
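The one-to-one binding of step 4.2 matches each box in the larger group to the percentage label whose abscissa falls inside the box's (xmin, xmax) interval. A hedged sketch with hypothetical data shapes (the patent only says "a specific data structure such as a dictionary"):

```python
def bind_percentages(objects, percentages):
    """Sketch of the binding in step 4.2.

    `objects` maps a company/individual name to its (xmin, xmax) abscissa
    interval; `percentages` is a list of (x_centre, text) pairs. Each
    percentage is consumed at most once, giving a one-to-one binding.
    """
    bound = {}
    remaining = list(percentages)
    for name, (xmin, xmax) in objects.items():
        for i, (px, text) in enumerate(remaining):
            if xmin < px < xmax:
                bound[name] = text
                remaining.pop(i)  # one-to-one: each percentage used once
                break
    return bound
```

For example, with objects {"A": (0, 50), "B": (60, 120)} and percentages [(80, "30%"), (20, "70%")], the binding is {"A": "70%", "B": "30%"}: each label is attached to the box whose horizontal span contains it.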
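Step 5 builds the final weighted digraph from [u, v, w] triples. The patent does not prescribe a library (networkx's `DiGraph.add_weighted_edges_from` would serve equally well); this sketch uses a plain adjacency dict so it is dependency-free, with hypothetical company names:

```python
def build_holding_graph(names, relations):
    """Sketch of step 5: nodes first (G'), then weighted edges (G'').

    `relations` holds [u, v, w] triples meaning: u holds w percent of v.
    """
    graph = {name: {} for name in names}  # G': nodes only
    for u, v, w in relations:             # G'': add weighted directed edges
        graph[u][v] = w
    return graph

g = build_holding_graph(
    ["Holding Co", "Subsidiary A", "Subsidiary B"],
    [["Holding Co", "Subsidiary A", 60.0],
     ["Subsidiary A", "Subsidiary B", 51.0]],
)
print(g["Holding Co"]["Subsidiary A"])  # prints: 60.0
```

Multi-layer control then falls out of the graph structure: "Holding Co" controls "Subsidiary B" indirectly through the weighted path via "Subsidiary A".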
CN202110083415.XA 2021-01-21 2021-01-21 Identification method for multi-layer control stock relationship share graphs Active CN112766263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110083415.XA CN112766263B (en) 2021-01-21 2021-01-21 Identification method for multi-layer control stock relationship share graphs

Publications (2)

Publication Number Publication Date
CN112766263A true CN112766263A (en) 2021-05-07
CN112766263B CN112766263B (en) 2024-02-02

Family

ID=75703627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110083415.XA Active CN112766263B (en) 2021-01-21 2021-01-21 Identification method for multi-layer control stock relationship share graphs

Country Status (1)

Country Link
CN (1) CN112766263B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219329A (en) * 2021-12-20 2022-03-22 中国建设银行股份有限公司 A method and device for determining enterprise level

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009548A (en) * 2018-01-09 2018-05-08 贵州大学 A kind of Intelligent road sign recognition methods and system
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN111626292A (en) * 2020-05-09 2020-09-04 北京邮电大学 Character recognition method of building indication mark based on deep learning technology
CN111782772A (en) * 2020-07-24 2020-10-16 平安银行股份有限公司 Text automatic generation method, device, equipment and medium based on OCR technology

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
REN Ming; XU Guang; WANG Wenxiang: "Research on Entity Relation Extraction from Genealogy Texts", Journal of Chinese Information Processing, no. 06 *
ZHANG Xinfeng, SHEN Lansun: "Pattern Recognition and Its Applications in Image Processing", Measurement & Control Technology, no. 05 *
DU Enyu; ZHANG Ning; LI Yandi: "Multi-Classification Method for Lane Guide Arrows Based on Adaptive Block-Coded SVM", Acta Optica Sinica, no. 10 *
MEI Jixia; LI Wei: "Game Analysis of the Relationship between Controlling Shareholders and Minority Shareholders", Journal of Shihezi University (Philosophy and Social Sciences), no. 04 *

Also Published As

Publication number Publication date
CN112766263B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
Zhao et al. Reconstructing BIM from 2D structural drawings for existing buildings
CN104966104B (en) A kind of video classification methods based on three-dimensional convolutional neural network
CN108427738B (en) Rapid image retrieval method based on deep learning
CN110084131A (en) A kind of semi-supervised pedestrian detection method based on depth convolutional network
CN110516539A (en) Method, system, storage medium and equipment for extracting buildings from remote sensing images based on confrontation network
CN110827398B (en) Automatic semantic segmentation method for indoor three-dimensional point cloud based on deep neural network
CN114092697B (en) Building facade semantic segmentation method with attention fused with global and local depth features
CN113312501A (en) Construction method and device of safety knowledge self-service query system based on knowledge graph
CN114694038A (en) High-resolution remote sensing image classification method and system based on deep learning
Qiao et al. A weakly supervised semantic segmentation approach for damaged building extraction from postearthquake high-resolution remote-sensing images
CN111191664A (en) Training method of label identification network, label identification device/method and equipment
CN112163447B (en) Multi-task real-time gesture detection and recognition method based on Attention and Squeezenet
CN109360179A (en) Image fusion method, device and readable storage medium
CN111783543B (en) Facial activity unit detection method based on multitask learning
CN113220878A (en) Knowledge graph-based OCR recognition result classification method
CN107392463B (en) A method, module, device and storage device for identifying urban functional area
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN107341440A (en) Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning
CN110334719A (en) A method and system for extracting building images from remote sensing images
CN113516379B (en) Work order scoring method for intelligent quality inspection
CN113807347A (en) Kitchen waste impurity identification method based on target detection technology
CN117710975A (en) Indoor building structure point cloud semantic segmentation method and system based on deep learning
CN111027634B (en) Regularization method and system based on class activation mapping image guidance
CN117975090A (en) A method for detecting human interaction based on intelligent perception
CN111339967B (en) Pedestrian detection method based on multi-view graph convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant