CN113591829B - Character recognition method, device, equipment and storage medium - Google Patents

Character recognition method, device, equipment and storage medium

Info

Publication number
CN113591829B
Authority
CN
China
Prior art keywords
character
live
characters
region
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110711927.6A
Other languages
Chinese (zh)
Other versions
CN113591829A
Inventor
高祥路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yitan Network Technology Co., Ltd.
Original Assignee
Shanghai Yitan Network Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yitan Network Technology Co., Ltd.
Publication of CN113591829A
Application granted
Publication of CN113591829B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a character recognition method, device, equipment and storage medium, addressing the problems in existing game live broadcast that recognizing the characters in the live picture in real time and feeding them back to users or displaying them on the live-room screen in time imposes stringent real-time requirements that are technically difficult to meet. The method acquires the room identifier corresponding to a room in which live broadcast has started, starts a live recognition task according to the room identifier, and recognizes the characters in the live picture; taking the character areas in the live picture as targets, it identifies the areas where characters are located with a target detection algorithm and extracts the identified character areas; it then recognizes the characters in those areas with a character recognition model, thereby obtaining real-time game information in the live game. The real-time state of the anchor's character is obtained, such as whether it has died, so that corresponding reward prompts and text are popped up in the live room, increasing interactive rewarding by users in the live room and improving the interest and stickiness of users watching the broadcast.

Description

Character recognition method, device, equipment and storage medium
Technical Field
The invention belongs to the technical field of live game broadcasting, and particularly relates to a character recognition method, a character recognition device, character recognition equipment and a storage medium.
Background
The population of game anchors keeps growing, and more and more players watch live broadcasts of various games through game live-streaming platforms. To increase interaction and interest between the anchor and users during a broadcast, the dynamic game content in the broadcast can be recognized in real time, opening possibilities for innovative applications, improving user experience and stickiness, and increasing anchor and platform revenue.
Content elements in a game typically include the character used by the user, the character's current battle data, and so on. Taking Honor of Kings as an example, important in-game content elements include the hero used by the user, the skills available to that hero, the hero's current level, and battle data such as kill counts and death counts, as well as whether the hero is currently dead, or at a highlight moment: a triple kill, quadruple kill, penta kill and so on.
A live platform typically has multiple live rooms in the same time slot. This presents the following technical difficulties for the platform:
1. The anchor interacts with users during the broadcast. However, the continuously changing specific object targets in the dynamic live content must be recognized in real time and fed back to users or shown on the live-room screen promptly. Dozens of image frames captured from the video stream each second must be distinguished, the specific object targets tracked, and, after each change, the targets recalculated before display or feedback. The real-time requirements are therefore very high, and current processing technology struggles to meet them.
2. Many streams broadcast simultaneously, so the volume of real-time image frames to process is large. The streams must be processed in parallel, which requires a dedicated multi-stream processing strategy that achieves real-time performance while accounting for the parallelism of simultaneous processing.
3. When tracking changes of a specific object target, frames may be missed, and how to handle such missed frames must be addressed.
Disclosure of Invention
The invention aims to provide a character recognition method, device, equipment and storage medium that can recognize live game data in real time and obtain the real-time state of the anchor's character, such as whether it has died or achieved a triple kill, quadruple kill or penta kill, so that corresponding reward prompts and text records are popped up in the live room, increasing interactive rewarding by users in the live room and improving the interest and stickiness of users watching the broadcast.
In order to solve the above problems, the technical scheme of the invention is as follows:
A character recognition method for live Internet game broadcasting, the character recognition method comprising:
acquiring a room identifier corresponding to a room in which live broadcast has started, starting a live recognition task according to the room identifier, and recognizing the characters in the live picture;
taking the character areas in the live picture as targets, identifying the areas where characters are located with a target detection algorithm, and extracting the identified character areas;
and recognizing the characters in the character areas with a character recognition model, so as to obtain real-time game information in the live game broadcast.
According to an embodiment of the present invention, the acquiring the room identifier corresponding to the room with the live broadcast turned on further includes:
and receiving a live broadcast request sent by a user, acquiring a user identifier according to the live broadcast request, and assigning the user identifier a live room number, wherein the live room number is the room identifier.
According to an embodiment of the present invention, the identifying the area where the character is located by using the object detection algorithm further includes:
the character area is identified by using an object detection model based on the YOLOv5 algorithm.
According to an embodiment of the present invention, the identifying the character area using the object detection model based on YOLOv5 algorithm further includes:
sequentially creating the Input, Backbone, Neck and Prediction network structures to form a YOLOv5 detection model; creating a Conv2d 3x3 layer and a plurality of Ghost BottleNeck modules to form a GhostNet;
removing a preset part of the Backbone and Neck network structures from the YOLOv5 detection model, and replacing the removed parts with the Conv2d 3x3 layer and the plurality of Ghost BottleNeck modules from the GhostNet to obtain a target detection model;
and carrying out recognition training on live-game character areas with the target detection model to obtain a trained target detection model.
According to an embodiment of the present invention, the identifying the area where the character is located by using the object detection algorithm further includes:
performing character region detection on the live picture with an MSER algorithm to obtain candidate regions containing characters, and projecting the candidate regions vertically to obtain at least one character region;
for each of the obtained character regions, determining, with a clustering algorithm, the line region to which the character corresponding to that region in the live picture belongs;
when a line region contains a plurality of characters, merging the character regions corresponding to those characters based on the distance between adjacent characters to obtain a target character region;
when a line region contains a single character, taking the character region corresponding to that character as the target character region.
According to an embodiment of the present invention, the identifying the characters in the character area by the character identification model further includes:
characters in the character region are recognized using a cnocr model or a PaddleOCR model, the characters including numeric and english characters.
According to an embodiment of the present invention, after the character recognition model recognizes the characters in the character area, the method further includes:
storing the recognition results of consecutive frames as a time series, and comparing the values of the recognition results in the series against the character areas, thereby removing abnormal recognition data and ensuring that the recognized numeric results are consistent over the time series;
when the network is congested, extracting the latest frame in the live stream for recognition and discarding the backlog of frames caused by the congestion, so as to provide real-time character recognition results.
A character recognition apparatus for live internet game, the character recognition apparatus comprising:
the picture acquisition module is used for acquiring the room identifier corresponding to a room in which live broadcast has started, starting a live recognition task according to the room identifier, and recognizing the characters in the live picture;
the character area recognition module is used for taking the character areas in the live picture as targets, identifying the areas where characters are located with a target detection algorithm, and extracting the identified character areas;
and the character recognition module is used for recognizing the characters in the character areas with a character recognition model, so as to obtain real-time game information in the live game broadcast.
A character recognition apparatus comprising:
the device comprises a memory and a processor, wherein instructions are stored in the memory, and the memory and the processor are interconnected through a line;
the processor calls the instructions in the memory to realize the character recognition method in one embodiment of the invention.
A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a character recognition method in an embodiment of the invention.
By adopting the technical scheme, the invention has the following advantages and positive effects compared with the prior art:
1) Addressing the problems in existing game live broadcast that recognizing live characters in real time and feeding them back to users or displaying them on the live-room screen in time imposes high real-time requirements that are technically difficult to meet, the character recognition method in an embodiment of the invention acquires the room identifier corresponding to a room in which live broadcast has started, starts a live recognition task according to the room identifier, and recognizes the characters in the live picture; taking the character areas in the live picture as targets, it identifies the areas where characters are located with a target detection algorithm and extracts the identified character areas; it recognizes the characters in those areas with a character recognition model, thereby obtaining real-time game information in the live game. The real-time state of the anchor's character is obtained, such as whether it has died or achieved a triple kill, quadruple kill or penta kill, so that corresponding reward prompts and text are popped up in the live room, increasing interactive rewarding by users in the live room and improving the interest and stickiness of users watching the broadcast.
2) In the character recognition method of an embodiment of the invention, the backbone network of the YOLOv5-based target detection model is replaced with GhostNet. At small cost, GhostNet generates a number of 'phantom' feature maps (Ghost feature maps) that can extract the needed information from the original features; stacking Ghost modules yields Ghost BottleNecks, from which the lightweight neural network GhostNet is built. With GhostNet, more salient image features can be obtained, improving target recognition accuracy.
3) In the character recognition method of an embodiment of the invention, to cope with the limited clarity of the live picture, the inherent error rate of character recognition, and service congestion caused by missed frames or slow processing, the recognition results of consecutive frames in the live stream are stored as a time series, and the values of the recognition results in the series are compared against the recognized character areas, removing abnormal recognition data and ensuring that recognized numeric results are consistent over the series; the optimized recognition result is used as the final result. In the case of network congestion, the latest frame in the live stream is extracted for recognition each time and the backlog of frames caused by the congestion is discarded, so that each recognition result is a real-time result.
Drawings
FIG. 1 is a flow chart of a character recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram showing the composition of a YOLOv5 detection model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a convolutional layer in a GhostNet in accordance with an embodiment of the invention;
FIG. 4 is a schematic diagram of a Ghost module in an embodiment of the invention;
FIG. 5 is a block diagram of a character recognition device according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a character recognition apparatus according to an embodiment of the present invention.
Detailed Description
The following describes in further detail a character recognition method, device, apparatus and storage medium according to the present invention with reference to the accompanying drawings and specific embodiments. Advantages and features of the invention will become more apparent from the following description and from the claims.
Embodiment 1
Addressing the problems in existing game live broadcast that recognizing live characters in real time and feeding them back to users or displaying them on the live-room screen in time imposes high real-time requirements that are technically difficult to meet, this embodiment provides a character recognition method that can obtain real-time game information in the live game: the real-time state of the anchor's character, such as whether it has died or achieved a triple kill, quadruple kill or penta kill, so that corresponding reward prompts and text are popped up in the live room, increasing interactive rewarding by users in the live room and improving the interest and stickiness of users watching the broadcast.
Referring to fig. 1, the character recognition method includes the steps of:
S1: acquiring the room identifier corresponding to a room in which live broadcast has started, starting a live recognition task according to the room identifier, and recognizing the characters in the live picture;
S2: taking the character areas in the live picture as targets, identifying the areas where characters are located with a target detection algorithm, and extracting the identified character areas;
S3: recognizing the characters in the character areas with a character recognition model, so as to obtain real-time game information in the live game broadcast.
Specifically, in the application scenario of the invention, a live client is installed on the user equipment. When a user opens the live client and enters a live room to watch a broadcast, the client transmits the live video stream to the server while loading it. After receiving the stream, the server processes its video frames, identifies the character object targets in real time, and returns the computed result to the user equipment. If the broadcast is a game, the resulting information can be loaded in an area other than the anchor's picture area, improving the viewing experience of users watching the broadcast.
The user terminal and the server communicate over a network, which may be a local area network, a wide area network, or the like. The user terminal may be a portable device (e.g., a mobile phone, tablet or notebook) or a personal computer (PC); the server may be any device capable of providing Internet services; the live client in the user terminal may be any live-streaming application.
The character recognition method is specifically described below. In step S1, obtaining a room identifier corresponding to a room that is turned on live further includes: and receiving a live broadcast request sent by a user, acquiring a user identifier according to the live broadcast request, and endowing the user identifier with a direct broadcast room number, wherein the direct broadcast room number is the room identifier.
In general, a live game platform, which may also be called a live game system, includes servers (e.g., battle servers and lobby servers) and user terminals (e.g., battle terminals and spectator terminals). The battle user corresponding to a battle terminal and the spectator users corresponding to spectator terminals belong to the same live room, such as live room A. The servers connect to the user terminals through a network comprising network entities such as routers and gateways.
When the user terminal needs to broadcast, it sends a live broadcast request to the server over the network; the request must contain the user identifier of the user terminal (such as a user name or user number). After receiving the request, the server extracts the user identifier, assigns it a live room number, and returns that number to the user terminal; the live room number is the room identifier.
When the user terminal starts broadcasting, the user identifier is sent to the lobby server; the lobby server then finds the live room to which the user identifier belongs and starts a live recognition thread to recognize the characters in the live picture in real time.
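As an illustration of this flow, the following is a minimal sketch of how a lobby server might assign a live room number and start a per-room recognition thread; the names (handle_live_request, recognition_task) and the in-memory room table are hypothetical, not part of the patent.

```python
import itertools
import threading

_room_counter = itertools.count(1)
_user_rooms = {}  # user identifier -> live room number

def handle_live_request(user_id):
    """Assign a live room number to the requesting user and start recognition."""
    room_id = _user_rooms.setdefault(user_id, next(_room_counter))
    # One live recognition thread per room, as described above.
    threading.Thread(target=recognition_task, args=(room_id,), daemon=True).start()
    return room_id  # the room identifier returned to the user terminal

def recognition_task(room_id):
    # Placeholder: pull the room's live stream and run character recognition.
    print(f"recognition task started for room {room_id}")
```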
It can be appreciated that when a live room starts broadcasting, the backend initiates a recognition request, and in the live-stream recognition task the background starts the corresponding live recognition thread to identify the character object targets in the live picture in real time.
When a broadcast requiring character object target recognition starts, the backend is triggered to initiate a recognition request, and a live recognition thread is preset in the background. The live client sends the live video stream to the server: specifically, when a user enters a live room, the video stream is loaded to the live client, and the client also transmits the stream to the server.
In step S2, the character area in the live broadcast picture is taken as a target, the area where the character is located is identified by adopting a target detection algorithm, and the identified character area is extracted.
The main recognition objects in this embodiment are characters in the live picture, such as the user identifier of the broadcast character, the skills used by the character, its current level, battle data such as kill counts and death counts, and whether the character is in a dead state or at a highlight moment: a triple kill, quadruple kill, penta kill and so on. To acquire these data, the characters representing them in the live picture must be recognized in real time.
Currently, the following difficulties exist in the recognition of characters in video frames:
difficulty 1: in the video stream, immediately starting to acquire a video frame containing characters after the characters appear;
difficulty 2: positioning a character area in a video frame;
difficulty 3: and the characters are recognized quickly and effectively.
There are several methods for locating character areas in a video frame:
(1) Gradient-based methods:
The most distinctive feature of characters is their large gradient; the character area is located using this gradient feature.
The specific method: exploiting the frequent gradient transitions within character regions, high-pass filtering retains the regions where the gradient changes frequently, and adjacent regions are merged after morphological processing to determine the candidate regions (a technique originally applied to license plate localization).
(2) Character-texture-based methods:
A wavelet transform is applied to the character area, and the area is extracted using the texture information of the characters.
(3) Methods based on the gray-level histogram.
However, these methods cannot meet the speed requirement of live broadcast; for this reason, this embodiment uses a target detection algorithm to identify the region where characters are located in the current predicted video frame.
In this embodiment, the character appearance area is taken as a target, a target detection model is set, and the model building includes the following steps:
Using the YOLO algorithm, the image is first divided into an S × S grid; a grid cell is responsible for detecting a target if the target's center falls into it. Each grid cell predicts B bounding boxes together with confidence values that reflect how confident the model is that a box contains a target object and how accurate it believes the predicted box to be. The confidence is defined as:
Confidence = Pr(Object) × IOU(pred, truth)
If no target is present, the confidence should be zero; otherwise the confidence score equals the intersection over union (IOU) between the predicted box and the ground truth.
Each bounding box consists of 5 values: x, y, w, h and confidence, where the (x, y) coordinates represent the center of the box relative to its grid cell, the width w and height h are predicted relative to the whole image, and confidence represents the IOU between the predicted box and any actual bounding box. Each grid cell also predicts C conditional class probabilities:
Pr(Class_i | Object)
These are one set of class probabilities per grid cell, conditioned on the cell containing a target, regardless of the number of boxes B.
Each image is thus divided into an S × S grid, and each cell's B bounding boxes, their confidences, and the C class probabilities are encoded as an S × S × (B × 5 + C) tensor (for example, S = 7, B = 2, C = 20 gives 7 × 7 × 30).
At test time, multiplying the conditional class probabilities by the individual box confidence predictions yields:
Pr(Class_i | Object) × Pr(Object) × IOU(pred, truth) = Pr(Class_i) × IOU(pred, truth)
These scores encode both the probability that the class appears in the box and how well the predicted box fits the target.
Then, a convolutional neural network implements the YOLO algorithm (evaluated on the Pascal VOC detection dataset): the network's initial convolutional layers extract features from the image, and fully connected layers predict the output probabilities and the corresponding coordinates of the character occurrence regions.
A corresponding training set is prepared from the predicted character occurrence areas, and the labeled training sets are fed into the target detection model for training.
The current predicted video frame is input into the target detection model, which predicts the character occurrence areas and the corresponding bounding-box information in the frame.
This approach is very fast and can meet the recognition-speed requirement of live broadcast.
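For illustration only, a hedged sketch of running a YOLO-family detector on a video frame and collecting candidate character-region boxes; it assumes network access to torch.hub and a generic pretrained YOLOv5s model, whereas the patent's model would be fine-tuned on annotated character regions.

```python
import torch

# Generic YOLOv5s via torch.hub; a model trained on character regions
# would be loaded instead in a real deployment (an assumption here).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def detect_character_regions(frame):
    """Return (x1, y1, x2, y2, confidence) boxes for a single RGB frame."""
    results = model(frame)            # frame: HxWx3 numpy array
    boxes = results.xyxy[0].tolist()  # rows: [x1, y1, x2, y2, conf, class]
    return [(int(x1), int(y1), int(x2), int(y2), conf)
            for x1, y1, x2, y2, conf, _cls in boxes]
```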
Further, this embodiment also provides for identifying character regions with a target detection model based on the YOLOv5 algorithm. Identifying character areas with the YOLOv5-based target detection model further comprises:
sequentially creating the Input, Backbone, Neck and Prediction network structures to form a YOLOv5 detection model; creating a Conv2d 3x3 layer and a plurality of Ghost BottleNeck modules to form a GhostNet;
removing a preset part of the Backbone and Neck network structures from the YOLOv5 detection model, and replacing the removed parts with the Conv2d 3x3 layer and the plurality of Ghost BottleNeck modules from the GhostNet to obtain a target detection model;
and carrying out recognition training on live-game character areas with the target detection model to obtain a trained target detection model.
Referring to fig. 2, the YOLOv5 detection model comprises four parts: Input, Backbone, Neck and Prediction (the output end). The Input end preprocesses the input image, for example computing initial anchor boxes; the Backbone extracts image features; the Neck performs multi-scale feature fusion; and the output end (Prediction) performs target detection and outputs the positions of target boxes.
Fig. 2 is the general framework of the YOLOv5 detection model. YOLOv5 offers four variants for different deployment requirements, namely YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x; this embodiment selects YOLOv5s as the base model of the character-region detection model. YOLOv5s requires 608 × 608 × 3 images at the Input network. Since frames taken from video differ in size, the differently sized input images must be unified in size. This embodiment first uses a Gaussian filter to remove noise (irrelevant information) from the input image.
A Gaussian filter is a linear smoothing filter whose weights are chosen according to the shape of the Gaussian function. Gaussian smoothing filters are very effective at suppressing noise that follows a normal distribution. For image processing, a two-dimensional zero-mean discrete Gaussian function is commonly used as the smoothing filter:
G(x, y) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))
where σ is the smoothing parameter; the larger σ is, the wider the Gaussian filter's band and the greater the degree of smoothing.
Then, Mosaic data enhancement, adaptive anchor box calculation and adaptive image scaling are applied to the denoised image.
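A minimal sketch of the denoising and size-unification steps described above; the 5x5 kernel and the plain resize are illustrative assumptions, not values fixed by the patent.

```python
import cv2

def preprocess_frame(frame, size=608):
    """Denoise a frame with a Gaussian filter and resize it for the Input stage."""
    # sigma=0 lets OpenCV derive sigma from the (assumed) 5x5 kernel size.
    denoised = cv2.GaussianBlur(frame, (5, 5), 0)
    return cv2.resize(denoised, (size, size))
```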
The average precision (AP) for small targets is generally much lower than for medium and large targets, and the dataset contains a large number of small objects whose distribution, inconveniently, is not uniform. The paper Augmentation for small object detection, published in 2019, defines:

                 Min rectangle area    Max rectangle area
Small object     0 × 0                 32 × 32
Medium object    32 × 32               96 × 96
Large object     96 × 96               ∞ × ∞

A small object is thus an object whose length and width lie between 0 × 0 and 32 × 32. The dataset used for YOLOv5 model training here has the following proportions of small, medium and large targets:

                                Small    Mid     Large
Ratio of total boxes (%)        41.4     34.3    24.3
Ratio of images included (%)    52.3     70.7    83.0

The proportions of small, medium and large targets are unbalanced across the dataset: as shown above, small targets account for 41.4% of all boxes, more than either medium or large targets.
A GhostNet model is created, typically comprising convolutional layers, pooling layers and a fully connected layer, where the convolutional part comprises a plurality of Ghost BottleNeck modules. In more detail, the GhostNet model specifically includes: Conv2d 3x3, multiple Ghost BottleNeck modules, Conv2d 1x1, AvgPool 7x7, Conv2d 1x1 and a fully connected layer. This embodiment mainly uses the convolutional layers of the GhostNet model to extract image features. Referring to fig. 3, a schematic diagram of the Ghost BottleNeck module is shown. The Ghost BottleNeck module with stride=2 performs both feature learning and downsampling. Its structure is very similar to the ResNet bottleneck, except that the channel dimension is first expanded and then reduced.
A preset part of the Backbone and Neck network structures is removed from the YOLOv5 detection model, and the removed parts are replaced with the Conv2d 3x3 layer and the plurality of Ghost BottleNeck modules from the GhostNet to obtain the target detection model.
The GhostNet model incorporates DepthWise convolution operations, i.e., the operations performed within the Ghost BottleNeck module. Specifically, there is redundancy between feature maps in the YOLOv5 detection network: connected layers produce feature maps that are similar to one another. Similar feature maps therefore need not all be produced by conventional convolution and can instead be obtained with a DepthWise convolution. As shown in fig. 4, the module first obtains half of the feature maps using conventional convolution, then applies a DepthWise convolution to that half to obtain the remaining feature maps. This greatly reduces the redundancy between feature maps and saves computation in the target detection model.
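As a hedged sketch of the idea just described, a Ghost module in PyTorch: half the output channels come from an ordinary convolution and the other half from a cheap DepthWise convolution applied to that half. The layer sizes are illustrative; the patent's exact GhostNet configuration is not reproduced here.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Conventional conv for half the feature maps, DepthWise conv for the rest."""
    def __init__(self, in_ch, out_ch, kernel=1, dw_kernel=3):
        super().__init__()
        init_ch = out_ch // 2
        self.primary = nn.Sequential(  # conventional convolution
            nn.Conv2d(in_ch, init_ch, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(    # DepthWise: groups == channels
            nn.Conv2d(init_ch, init_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        primary = self.primary(x)
        # Concatenate the conventional maps with their 'phantom' counterparts.
        return torch.cat([primary, self.cheap(primary)], dim=1)
```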
Carrying out recognition training of live-game character areas on the target detection model to obtain the trained target detection model further comprises the following steps:
initializing the parameters of the target detection model, including the input image size, the initial learning rate, the GhostNet layer class count and the depth of the convolution kernels, and decaying the initial learning rate with a cosine annealing strategy, computed as:
L_n = L_min^i + (1/2) · (L_max^i - L_min^i) · (1 + cos(nπ / N_i))
where L is the learning rate, i denotes the i-th training run, L_max^i and L_min^i are respectively the maximum and minimum learning rates of the i-th run, N_i is the total number of iterations in the i-th run, and n = 1, …, N_i is the current iteration.
In practical application, according to training requirements and graphics-card constraints, the input image size can be fixed at 512 × 512 and the initial learning rate set to 0.0025; with the cosine annealing strategy, the learning rate decays along a cosine curve as iterations proceed, falling from its maximum to its minimum within one period. The learning rate determines how fast the weights update: set too high, the result may overshoot the optimal solution; set too low, the loss descends too slowly. The GhostNet layer class count is modified to 1, and the depth of the GhostNet convolution kernel is modified accordingly to 18.
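A minimal sketch of this schedule using PyTorch's built-in cosine annealing; the optimizer choice, T_max and eta_min are assumptions for illustration (only the 0.0025 initial rate comes from the text above).

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(10, 1)  # stand-in for the detection model
optimizer = torch.optim.SGD(model.parameters(), lr=0.0025)
# Decay the rate from its maximum toward eta_min over T_max steps,
# following the cosine formula given above.
scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-5)

for step in range(100):
    # ... forward pass, loss.backward(), etc. would go here ...
    optimizer.step()
    scheduler.step()
```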
In addition to the above-mentioned object detection model, the present embodiment also provides another object detection algorithm, namely:
performing character region detection on the live picture with an MSER algorithm to obtain candidate regions containing characters, and projecting the candidate regions vertically to obtain at least one character region;
for each of the obtained character regions, determining, with a clustering algorithm, the line region to which the character corresponding to that region in the live picture belongs;
when a line region contains a plurality of characters, merging the character regions corresponding to those characters based on the distance between adjacent characters to obtain a target character region;
when a line region contains a single character, taking the character region corresponding to that character as the target character region.
Before the character area in the live broadcast picture is detected, the live broadcast sample image can be subjected to labeling processing to obtain corresponding sample image labeling data for subsequent comparison with a recognition result obtained through a character recognition model. Meanwhile, the labeling data can be used for filtering the sample image, so that an effective sample image can be obtained. And then, performing character region detection on the filtered sample image by using an MSER algorithm to obtain a character region in the sample image.
Here, the rules for labeling a sample image cover: the content of the labels, the form of the labels, and the storage format of the label data. In practice, the labeled content may include the character recognition results of all text lines in the sample image and the position information of their character areas. The labels may take the form of the character meaning of the text line plus the four position coordinates of its character area in the sample image (e.g., (character meaning, lower-left x1, lower-left y1, upper-right x2, upper-right y2)). The annotation data may be stored as a box file.
The basic idea of the MSER algorithm is to binarize the image (binarization converts a gray image into a binary image: pixels with gray level above a critical value, the binarization threshold, are set to the maximum gray level, and pixels below it to the minimum), with the binarization threshold sweeping over [0, 255] (increasing from 0 to 255). The binarized image thus undergoes a progression from fully black to fully white. During this process, some connected regions change little in area as the threshold increases; such a region is called a maximally stable extremal region. The mathematical formulation is as follows:
for an image P(x), x ∈ Q, where Q is the finite set of pixel indices, define Q = [1, 2, …, N] (N being the total number of pixels in P(x)). S(x) is a level set in the image P(x): the set of pixels whose gray value is less than or equal to P(x), expressed as
S(x) = {y ∈ Q | P(y) ≤ P(x)}
where S(x) is the level set in image P(x), P(x) is the binarization threshold, and P(y) is a pixel gray value in P(x) satisfying P(y) ≤ P(x). A sequence (x1, x2, …) is a connected sequence of pixels; a connected component R of Q is a subset of Q in which each pair of pixels (x1, x2) ∈ R is connected by a path within R. If every connected component R′ containing R equals R, then R is called a maximal connected component. An extremal region C is defined as a maximal connected component of the level set S(x). The set of all extremal regions of the image P(x) is denoted C(i). Among all extremal regions in C(i), the MSER algorithm is interested only in regions that meet a certain stability criterion (i.e., the maximally stable extremal regions). Whether the criterion is met can be determined with the stability index q(i):
q(i) = |Q_{i+Δ} - Q_{i-Δ}| / Q_i
where q(i) is the stability index of the i-th connected region (when q(i) is below a stability threshold, the region is considered a maximally stable extremal region), Q_i is the area of the i-th connected region, and Δ is a small threshold increment.
In this embodiment, the candidate regions containing characters in the sample image are obtained with the MSER algorithm. The MSER algorithm has the characteristics of saliency, affine invariance and stability, and can accurately extract target regions from a complex background.
In actual application, character region detection is performed on the sample image with the MSER algorithm to obtain candidate regions containing characters; the candidate regions are projected vertically to obtain at least one character region, and for each obtained character region, the line region to which the corresponding character in the sample image belongs is determined with a clustering algorithm. When a line region contains multiple characters, the character regions corresponding to those characters are merged based on the distance between adjacent characters to obtain a character area; when a line region contains a single character, its character region is used as the character area.
The candidate regions containing characters are first projected vertically to segment them. An information-entropy threshold for the projection histogram (e.g., 0.25) is set from empirical values, and candidate regions whose information entropy falls below the threshold are discarded, yielding a number of character regions, each containing one character. Since a character's aspect ratio satisfies certain morphological constraints, after vertical projection the aspect ratio of each segmented character region can be computed and the regions that violate the constraints removed.
The character areas that pass morphological screening are locally consistent in the distribution of their row coordinates; by collecting the coordinates of all character areas satisfying the morphological constraints, the row coordinate of each text line in the sample image can be determined with a local clustering algorithm (single-character areas in the same row share the same vertical coordinate). In practice, when a text line contains multiple characters, the character areas in that line are merged by judging the distance between characters (i.e., connecting the single-character areas in the same line) to obtain the merged text line area; when a text line contains only one character, its character area is used as the text line area. The text line area is what is referred to here as the character area.
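For concreteness, a hedged sketch of MSER-based candidate extraction with OpenCV, including a loose aspect-ratio (morphological) filter; the 0.1 to 10 ratio bounds are assumed values, and the entropy filtering and line clustering described above are omitted.

```python
import cv2

def mser_character_candidates(frame):
    """Return bounding boxes of MSER candidate regions in a BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)
    boxes = []
    for pts in regions:
        x, y, w, h = cv2.boundingRect(pts)
        if 0.1 < w / float(h) < 10:   # assumed morphological constraint
            boxes.append((x, y, w, h))
    return boxes
```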
In practical applications, detecting characters in a video stream requires extracting frames from the stream and predicting which video frames need character object target recognition. The frame extraction proceeds as follows:
different tag characters are configured in advance according to business logic, with frame extraction at different frequencies;
after the current video stream is obtained from the broadcast, frames are extracted at the frequency corresponding to the current business logic.
Since the number of simultaneous live streams on the platform can reach hundreds to thousands, and each stream can run at 20 to 30 frames per second, real-time recognition is demanding. According to business logic, frames for different tag characters are therefore extracted at different frequencies for recognition, as sketched below. For example, the character in a broadcast can be identified from one frame every few seconds, while specific trigger events (such as kills) take one frame per second. This greatly reduces the computation, saves GPU machine resources, and preserves real-time recognition.
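A minimal sketch of per-tag frame-sampling frequencies; the tag names and interval values are illustrative assumptions.

```python
import time

# Assumed per-tag sampling intervals in seconds.
SAMPLE_INTERVAL = {"hero": 5.0, "kill_event": 1.0}
_last_sampled = {}

def should_sample(tag):
    """Return True when enough time has passed to extract another frame for this tag."""
    now = time.monotonic()
    if now - _last_sampled.get(tag, float("-inf")) >= SAMPLE_INTERVAL.get(tag, 1.0):
        _last_sampled[tag] = now
        return True
    return False
```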
Many anchors add overlay frames outside the live picture during shopping, game and similar broadcasts, with some content hung inside those frames, so the character positions in a broadcast are not fixed. To recognize these characters, their positions must be located. Taking the character area as the target object and using a target detection algorithm, the area where characters are located can be identified more accurately. The approximate region of the characters to be recognized is cropped according to business logic, reducing the resolution of the image to be processed. The special graphic elements present at the character positions are annotated and the target detection model trained on them, so that these symbols are recognized and located to the character areas to be identified.
In step S3, the characters in the character area are recognized by the character recognition model, so as to obtain real-time game information in the live game.
Characters are recognized from the identified character areas. Recognition can likewise use neural network algorithms such as the YOLO algorithm, for example by building a character recognition model. Above, the areas where characters appear were set as detection targets; in this step, the different elements of characters can instead be treated as targets and recognized, for example classified as numeric elements, text elements, English elements, special-symbol elements or special graphic elements, with a corresponding target detection model set up. A target detection model built with the YOLO algorithm can predict the positions of multiple boxes of different element content.
First, a universal character recognition model is used to build a recognition service for digits and English letters on a backend GPU machine. When a live room starts, recognition and localization of the real-time character areas in the corresponding broadcast begins: digit characters are cropped from the original live image, and the digits, English characters and the like they contain are recognized by requesting the character recognition service over HTTP, obtaining real-time business information in the broadcast.
The above algorithm is merely an example. For instance, a sample may be divided into 8 × 4 = 32 blocks, and the ratio of black pixels to all pixels in each block counted as a feature, so that each sample to be recognized has a 1 × 32 array of feature values, as sketched below. After feature selection, preprocessing can use PCA (principal component analysis) to extract the main features for training; the corresponding region is then cropped from the original live image and predicted with the model, and the corresponding element content and position information can also be obtained on overlay frames.
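A hedged sketch of the 8 × 4 block-ratio features and the PCA step mentioned above; the black-pixel convention (value 0) and the component count are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def block_features(binary_img, rows=4, cols=8):
    """1 x 32 feature vector: black-pixel ratio of each block in an 8 x 4 grid."""
    h, w = binary_img.shape
    feats = []
    for r in range(rows):
        for c in range(cols):
            block = binary_img[r*h//rows:(r+1)*h//rows, c*w//cols:(c+1)*w//cols]
            feats.append((block == 0).mean())  # assumes black pixels are 0
    return np.array(feats)

pca = PCA(n_components=8)  # assumed number of principal components
# pca.fit(np.stack([block_features(img) for img in training_samples]))
```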
Wherein identifying characters in the character area using the universal character recognition model further comprises:
characters in the character region are recognized using a cnocr model or a PaddleOCR model, the characters including numeric and english characters.
Since the character region generally contains English, digits and other characters, in this embodiment the set of OCR tokens in the live picture is obtained through the open-source Chinese optical character recognition suite CNOCR (the cnocr model), where the set contains the semantic information of the characters. In practice, recognition of the characters in a character area can also be achieved through the open-source character recognition suite PaddleOCR (the PaddleOCR model).
The cnocr model or the PaddleOCR model can recognize words, digits and letters well. In practice, a recognition service for digits and English letters can be built on a backend GPU machine: for example, an HTTP service built with the Python-based Flask receives recognition requests, after which the character recognition model performs the recognition. When a live room starts, real-time recognition and localization begins for the corresponding room; once the character areas in the broadcast are located, the characters are extracted from the original live image and recognized by requesting the character recognition service over HTTP, obtaining real-time game information in the broadcast, such as kill counts and death counts.
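As one hedged example of the recognition step, calling the open-source cnocr suite on a cropped character region; the exact return format varies with the cnocr version, so the unpacking below is an assumption.

```python
from cnocr import CnOcr

ocr = CnOcr()  # downloads a default recognition model on first use

def recognize_characters(region_img):
    """Recognize the text in a cropped character-region image."""
    results = ocr.ocr(region_img)
    # cnocr v2+ returns dicts with a 'text' field; older versions return
    # character lists -- handle both (version behavior is an assumption).
    return [r["text"] if isinstance(r, dict) else "".join(r) for r in results]
```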
Since the number of simultaneous live streams on the platform can reach hundreds to thousands, and each stream can run at 20 to 30 frames per second, real-time recognition is demanding. Therefore, the ffmpeg library can be used for encoding and decoding according to business logic, so that frames for different tag characters are extracted at different frequencies for recognition. For example, heroes in Honor of Kings can be identified from one frame every few seconds, while the anchor's death count in the game is identified from one frame per second. This greatly reduces the computation, saves GPU machine resources, and preserves real-time recognition.
Because the clarity of the live picture is limited, character recognition carries a certain error rate, and missed frames or slow processing may congest the service during a broadcast, the output of the character recognition service cannot be used directly as the recognition result and must be optimized. In this embodiment, the recognition results of consecutive frames are stored as a time series, and the values of the character recognition results in the series are compared against the character localization, removing abnormal recognition data and ensuring that the recognized numeric results are consistent over the series. The optimized result is taken as the final character recognition result.
In the case of network congestion, the latest frame in the live stream is extracted for recognition each time, and the backlog of frames caused by the congestion is discarded, so that each recognition result is a real-time result. If the number of unprocessed predicted video frames after extraction reaches a preset value, the unprocessed frames are dropped directly and the current video frame is used to identify the character object target. A sketch of both safeguards follows.
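A minimal sketch of the two safeguards just described: a time-series check that rejects implausible jumps in numeric results, and backlog dropping that keeps only the newest frame. The window size and maximum jump are assumed parameters.

```python
from collections import deque

class RecognitionSmoother:
    """Keep recent numeric results and reject values that jump implausibly."""
    def __init__(self, window=5, max_jump=2):
        self.history = deque(maxlen=window)
        self.max_jump = max_jump  # assumed bound on per-frame change

    def accept(self, value):
        if self.history and abs(value - self.history[-1]) > self.max_jump:
            return self.history[-1]  # treat as a misread; keep the last value
        self.history.append(value)
        return value

def latest_frame(frame_queue):
    """Drain the queue and return only the newest frame, discarding the backlog."""
    frame = None
    while not frame_queue.empty():
        frame = frame_queue.get_nowait()
    return frame
```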
Real-time state discrimination in the broadcast is then computed from the optimized recognition result, further comprising: different processing procedures are preset for the corresponding live areas according to the categories of recognition results; after a recognition result is obtained, its category is determined and processing completes according to the procedure for that category. Once the real-time game data are identified, the real-time state of the anchor's character can be judged: whether it has died, or achieved a triple kill, quadruple kill, penta kill and so on, so that corresponding reward prompts and text are popped up in the live room, increasing interactive rewarding by users in the live room and improving the interest and stickiness of users watching the broadcast.
Embodiment 2
This embodiment provides a character recognition device for live Internet game broadcasting. Referring to fig. 5, the character recognition device includes:
the picture acquisition module 1, used for acquiring the room identifier corresponding to a room in which live broadcast has started, starting a live recognition task according to the room identifier, and recognizing the characters in the live picture;
the character area recognition module 2, used for taking the character areas in the live picture as targets, identifying the areas where characters are located with a target detection algorithm, and extracting the identified character areas;
and the character recognition module 3, used for recognizing the characters in the character areas with a character recognition model, so as to obtain real-time game information in the live game broadcast.
The functions and implementation of the picture acquisition module 1, the character area recognition module 2 and the character recognition module 3 are described in Embodiment 1 above and are not repeated here.
Embodiment 3
This embodiment provides a character recognition apparatus. Referring to fig. 6, the character recognition device 500 may vary considerably in configuration or performance and may include one or more processors (CPUs) 510 and memory 520, plus one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. The memory 520 and storage medium 530 may be transitory or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations on the character recognition device 500.
Further, the processor 510 may be arranged to communicate with a storage medium 530, and to execute a series of instruction operations in the storage medium 530 on the character recognition device 500.
The character recognition device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Vista, and the like.
It will be appreciated by those skilled in the art that the character recognition device configuration shown in fig. 6 is not limiting of the character recognition device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and which may also be a volatile computer readable storage medium. The computer-readable storage medium has stored therein instructions which, when executed on a computer, cause the computer to perform the steps of the character recognition method of the first embodiment.
The modules in Embodiment 2 may be stored in a computer-readable storage medium if implemented as software functional modules and sold or used as separate products. On this understanding, the essence of the technical solution of the invention, the part contributing to the prior art, or all or part of the solution may be embodied as software: the computer software is stored in a storage medium and includes several instructions that cause a computer device (a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the invention. The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and device described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
The embodiments of the present invention have been described in detail with reference to the drawings, but the invention is not limited to the above embodiments; various changes made to the invention still fall within its scope provided they fall within the scope of the appended claims and their equivalents.

Claims (9)

1. A character recognition method for live internet game broadcasting, comprising:
acquiring a room identifier corresponding to a room for starting live broadcasting, starting a live broadcasting recognition task according to the room identifier, and recognizing picture characters in live broadcasting;
taking a character area in the live broadcast picture as a target, identifying the area where the character is located by adopting a target detection algorithm, and extracting the character area obtained by the identification;
recognizing the characters in the character area through a character recognition model, storing the recognition results of consecutive frames as a time sequence, and comparing the values of the recognition results within the time sequence for the same character area, thereby eliminating abnormal recognition data and ensuring that the recognized numeric results are consistent over the time sequence; and, when the network is congested, extracting the latest frame in the live stream for recognition and discarding the frames backlogged by the congestion, so as to obtain real-time game information in the game live broadcast (a minimal sketch of this step follows the claims).
2. The character recognition method according to claim 1, wherein the acquiring the room identifier corresponding to the room with live broadcast turned on further comprises:
and receiving a live broadcast request sent by a user, acquiring a user identifier according to the live broadcast request, and assigning a live broadcast room number to the user identifier, wherein the live broadcast room number is the room identifier.
3. The method of claim 1, wherein the identifying the region in which the character is located using the object detection algorithm further comprises:
the character area is identified by using an object detection model based on the YOLOv5 algorithm.
4. The character recognition method according to claim 3, wherein the recognizing the character region using the YOLOv5 algorithm-based object detection model further comprises:
sequentially creating Input, Backbone, Neck, and Prediction network structures to form a YOLOv5 detection model; creating a Conv2d 3x3 layer and a plurality of Ghost BottleNeck modules to form a GhostNet;
removing a preset part of the Backbone and Neck network structures of the YOLOv5 detection model, and replacing the removed preset part with the Conv2d 3x3 layer and the plurality of Ghost BottleNeck modules of the GhostNet, to obtain a target detection model;
and carrying out recognition training of the target detection model on character areas relevant to the live game, to obtain a trained target detection model (a Ghost module sketch follows the claims).
5. The method of claim 1, wherein the identifying the region in which the character is located using the object detection algorithm further comprises:
performing character region detection on the live broadcast picture by using an MSER algorithm to obtain a candidate region containing characters in the live broadcast picture, and performing vertical projection on the candidate region to obtain at least one character region;
for each character region of the obtained at least one character region, determining, according to the corresponding character region and in combination with a clustering algorithm, the row region to which the character corresponding to that character region in the live broadcast picture belongs;
when the row region contains a plurality of characters, merging the plurality of character regions corresponding to the plurality of characters in the row region based on the distance between adjacent characters, to obtain a target character region;
when the row region contains one character, taking the character region corresponding to that one character as the target character region (an MSER-based sketch follows the claims).
6. The character recognition method according to claim 1, wherein the recognizing characters in the character area by the character recognition model further comprises:
characters in the character region are recognized using a cnocr model or a PaddleOCR model, the characters including numeric and English characters (an OCR sketch follows the claims).
7. A character recognition apparatus for live internet game broadcasting, the character recognition apparatus comprising:
the picture acquisition module is used for acquiring a room identifier corresponding to a room for starting live broadcasting, starting a live broadcasting identification task according to the room identifier, and identifying picture characters in live broadcasting;
the character area recognition module is used for taking a character area in the live broadcast picture as a target, recognizing the area where the character is located by adopting a target detection algorithm, and extracting the recognized character area;
the character recognition module is used for recognizing characters in the character area through the character recognition model, storing the recognition results of consecutive frames as a time sequence, and comparing the values of the recognition results within the time sequence for the same character area, thereby eliminating abnormal recognition data and ensuring that the recognized numeric results are consistent over the time sequence; and, when the network is congested, extracting the latest frame in the live stream for recognition and discarding the frames backlogged by the congestion, so as to obtain real-time game information in the game live broadcast.
8. A character recognition apparatus, characterized by comprising:
the device comprises a memory and a processor, wherein instructions are stored in the memory, and the memory and the processor are interconnected through a line;
the processor invokes the instructions in the memory to implement the character recognition method of any one of claims 1-6.
9. A computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the character recognition method according to any one of claims 1-6.
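A minimal Python sketch of the time-series consistency check and congestion handling recited in claims 1 and 7, assuming an OpenCV-readable live stream; the stream URL, the recognize_digits helper, and the WINDOW and MAX_JUMP thresholds are illustrative assumptions rather than values from the patent:

```python
# Minimal sketch of claim 1: keep only the newest frame when the network
# backlogs, and reject recognition results that are inconsistent with the
# recent time sequence. LIVE_URL, recognize_digits, WINDOW and MAX_JUMP
# are illustrative assumptions, not names taken from the patent.
from collections import deque
from statistics import median

import cv2  # pip install opencv-python

LIVE_URL = "rtmp://example.com/live/room_12345"  # hypothetical stream URL
WINDOW = 5     # recent readings kept for the consistency comparison
MAX_JUMP = 50  # a reading deviating more than this is treated as abnormal

history = deque(maxlen=WINDOW)


def latest_frame(cap, skip=10):
    """Skip backlogged frames cheaply, then decode only the newest one."""
    for _ in range(skip):
        if not cap.grab():  # grab() advances the stream without decoding,
            break           # so the congestion backlog is discarded unread
    ok, frame = cap.retrieve()
    return frame if ok else None


def accept_reading(value):
    """Keep a recognized number only if it fits the recent time sequence."""
    if len(history) >= 3 and abs(value - median(history)) > MAX_JUMP:
        return False  # outlier: likely abnormal recognition data, discard
    history.append(value)
    return True


if __name__ == "__main__":
    cap = cv2.VideoCapture(LIVE_URL)
    while cap.isOpened():
        frame = latest_frame(cap)
        if frame is None:
            break
        # recognize_digits (assumed) would crop the character area found by
        # the target detection model and run the character recognition model:
        # value = recognize_digits(frame)
        # if accept_reading(value):
        #     ...  # pass the consistent value to the live-room logic
```

The median comparison is one plausible reading of "comparing the values of the recognition results within the time sequence"; a majority vote over the window would serve equally well.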
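For claim 4, the Ghost modules that replace part of the YOLOv5 Backbone and Neck can be sketched in PyTorch. The blocks below follow the general GhostNet idea (a small "primary" convolution plus cheap depthwise copies); the channel counts, activation choices, and the exact splice point into the YOLOv5 network are assumptions, not the patent's configuration:

```python
# Sketch of the Ghost building blocks named in claim 4. Layer sizes and
# activation choices are assumptions for illustration only.
import torch
import torch.nn as nn


class GhostModule(nn.Module):
    """Ghost convolution: a few 'primary' channels plus cheap depthwise copies."""

    def __init__(self, in_ch, out_ch, kernel=1, dw_kernel=3, stride=1):
        super().__init__()
        primary = out_ch // 2  # out_ch should be even for the grouping below
        cheap = out_ch - primary
        self.primary_conv = nn.Sequential(
            nn.Conv2d(in_ch, primary, kernel, stride, kernel // 2, bias=False),
            nn.BatchNorm2d(primary),
            nn.ReLU(inplace=True),
        )
        # A depthwise convolution generates the remaining "ghost" channels
        # at a fraction of the cost of a full convolution.
        self.cheap_op = nn.Sequential(
            nn.Conv2d(primary, cheap, dw_kernel, 1, dw_kernel // 2,
                      groups=primary, bias=False),
            nn.BatchNorm2d(cheap),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y = self.primary_conv(x)
        return torch.cat([y, self.cheap_op(y)], dim=1)


class GhostBottleneck(nn.Module):
    """Two stacked GhostModules with a residual connection (stride-1 case)."""

    def __init__(self, ch, hidden):
        super().__init__()
        self.body = nn.Sequential(GhostModule(ch, hidden),
                                  GhostModule(hidden, ch))

    def forward(self, x):
        return x + self.body(x)


# Example: a stack that could stand in for a removed part of the Backbone.
ghost_stage = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),  # the Conv2d 3x3
    GhostBottleneck(64, 128),
    GhostBottleneck(64, 128),
)
out = ghost_stage(torch.randn(1, 3, 256, 256))  # -> (1, 64, 128, 128)
```

YOLOv5 assembles its network from a YAML module list, so one plausible way to perform the replacement of claim 4 is to edit that list rather than to splice modules in Python; the spliced-in blocks themselves would look like the above.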
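Claim 5's MSER-based alternative can likewise be sketched with OpenCV. Here the raw MSER bounding boxes stand in for per-character regions (the vertical-projection split is elided), a nearest-centre row grouping stands in for the clustering algorithm, and the row_tol and gap_tol pixel tolerances are assumptions:

```python
# Sketch of claim 5: MSER candidate boxes -> row grouping -> merging of
# adjacent characters into one target character region. The tolerances
# and the nearest-centre grouping are illustrative assumptions.
import cv2  # pip install opencv-python


def character_regions(frame_bgr, row_tol=10, gap_tol=15):
    """Return merged (x, y, w, h) text boxes found in a live-broadcast frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    mser = cv2.MSER_create()
    _, boxes = mser.detectRegions(gray)  # bounding boxes as (x, y, w, h)

    # Group candidate boxes into rows by vertical centre (a crude stand-in
    # for the clustering step that assigns each character to a row region).
    rows = {}
    for (x, y, w, h) in boxes:
        cy = y + h // 2
        key = min(rows, key=lambda k: abs(k - cy), default=None)
        if key is None or abs(key - cy) > row_tol:
            key = cy
        rows.setdefault(key, []).append((x, y, w, h))

    # Within each row, merge boxes whose horizontal gap is small enough to
    # be adjacent characters of the same word or number.
    merged = []
    for row in rows.values():
        row.sort(key=lambda b: b[0])  # left to right
        cur = list(row[0])
        for (x, y, w, h) in row[1:]:
            if x - (cur[0] + cur[2]) <= gap_tol:
                right = max(cur[0] + cur[2], x + w)
                bottom = max(cur[1] + cur[3], y + h)
                cur[1] = min(cur[1], y)
                cur[2] = right - cur[0]
                cur[3] = bottom - cur[1]
            else:
                merged.append(tuple(cur))
                cur = [x, y, w, h]
        merged.append(tuple(cur))
    return merged
```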
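Finally, claim 6's recognition step could be served by an off-the-shelf model. The fragment below uses PaddleOCR's commonly documented entry point, but the result layout has varied between releases, so the unpacking should be checked against the installed version; the crop coordinates come from the detection step sketched above:

```python
# Hedged sketch of claim 6: OCR on one cropped character region, keeping
# only numeric and English characters. Verify the result layout against
# your installed paddleocr release; it has changed between versions.
import re

from paddleocr import PaddleOCR  # pip install paddleocr

ocr = PaddleOCR(lang="en")  # the English model covers digits and letters


def read_region(frame_bgr, box):
    """OCR one (x, y, w, h) character region from a live-broadcast frame."""
    x, y, w, h = box
    crop = frame_bgr[y:y + h, x:x + w]
    result = ocr.ocr(crop)
    texts = []
    for page in result or []:    # one entry per input image
        for line in page or []:  # classic layout: [box, (text, score)]
            text = line[1][0]
            texts.append(re.sub(r"[^0-9A-Za-z]", "", text))
    return "".join(texts)
```

cnocr's CnOcr class offers a comparable one-call interface and could be swapped in; either way, the numeric output would feed the time-series filter sketched first.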
CN202110711927.6A 2021-05-25 2021-06-25 Character recognition method, device, equipment and storage medium Active CN113591829B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021105819387 2021-05-25
CN202110581938 2021-05-25

Publications (2)

Publication Number Publication Date
CN113591829A CN113591829A (en) 2021-11-02
CN113591829B true CN113591829B (en) 2024-02-13

Family

ID=78244782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110711927.6A Active CN113591829B (en) 2021-05-25 2021-06-25 Character recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113591829B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993040A (en) * 2018-01-03 2019-07-09 北京世纪好未来教育科技有限公司 Text recognition method and device
CN110135248A (en) * 2019-04-03 2019-08-16 华南理工大学 A kind of natural scene Method for text detection based on deep learning
CN111314714A (en) * 2020-02-27 2020-06-19 腾讯科技(深圳)有限公司 Game live broadcast method and device
CN111353497A (en) * 2018-12-21 2020-06-30 顺丰科技有限公司 Identification method and device for identity card information
CN111918082A (en) * 2020-07-31 2020-11-10 北京视博云信息技术有限公司 Cloud game live broadcasting method and device, storage medium and equipment
CN112163577A (en) * 2020-09-22 2021-01-01 广州博冠信息科技有限公司 Character recognition method and device in game picture, electronic equipment and storage medium
CN112418208A (en) * 2020-12-11 2021-02-26 华中科技大学 Tiny-YOLO v 3-based weld film character recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Improved lightweight YOLOv3 network model for traffic night-vision scenes; Guo Fei; Electronic Technology & Software Engineering; Section 2 *

Also Published As

Publication number Publication date
CN113591829A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
Marra et al. A full-image full-resolution end-to-end-trainable CNN framework for image forgery detection
CN109145840B (en) Video scene classification method, device, equipment and storage medium
CN110572362B (en) Network attack detection method and device for multiple types of unbalanced abnormal traffic
CN100550038C (en) Image content recognizing method and recognition system
Zhou et al. Edge-based structural features for content-based image retrieval
CN109409377B (en) Method and device for detecting characters in image
CN112633159B (en) Human-object interaction relation identification method, model training method and corresponding device
CN109308456B (en) Target object information determination method, device, equipment and storage medium
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN111178146A (en) Method and device for identifying anchor based on face features
CN106845513A (en) Staff detector and method based on condition random forest
CN109145906B (en) Target object image determination method, device, equipment and storage medium
He et al. Aggregating local context for accurate scene text detection
CN111950507B (en) Data processing and model training method, device, equipment and medium
CN113822134A (en) Instance tracking method, device, equipment and storage medium based on video
CN112257628A (en) Method, device and equipment for identifying identities of outdoor competition athletes
CN113591829B (en) Character recognition method, device, equipment and storage medium
CN110348386B (en) Face image recognition method, device and equipment based on fuzzy theory
CN115294162A (en) Target identification method, device, equipment and storage medium
CN116246298A (en) Space occupation people counting method, terminal equipment and storage medium
CN111027560B (en) Text detection method and related device
CN112749599A (en) Image enhancement method and device and server
CN112306243A (en) Data processing method, device, equipment and storage medium
CN112749704A (en) Text region detection method and device and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant