CN110652726B - Game auxiliary system based on image recognition and audio recognition


Publication number
CN110652726B
Authority
CN
China
Prior art keywords
game
image
module
recognition
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910926107.1A
Other languages
Chinese (zh)
Other versions
CN110652726A (en
Inventor
范科
邬鑫宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HANGZHOU SHUNWANG TECHNOLOGY CO LTD
Original Assignee
HANGZHOU SHUNWANG TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HANGZHOU SHUNWANG TECHNOLOGY CO LTD
Priority to CN201910926107.1A
Publication of CN110652726A
Application granted
Publication of CN110652726B
Legal status: Active (current)
Anticipated expiration

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50 Controlling the output signals based on the game progress
    • A63F13/53 Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game
    • A63F13/537 Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game using indicators, e.g. showing the condition of a game character on screen
    • A63F13/5375 Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game using indicators, e.g. showing the condition of a game character on screen for graphically or textually suggesting an action, e.g. by displaying an arrow indicating a turn in a driving game
    • A63F13/533 Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game for prompting the player, e.g. by displaying a game menu
    • A63F13/5378 Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game using indicators, e.g. showing the condition of a game character on screen for displaying an additional top view, e.g. radar screens or maps
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/131 Protocols for games, networked simulations or virtual reality
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/30 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by output arrangements for receiving control signals generated by the game device
    • A63F2300/303 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by output arrangements for receiving control signals generated by the game device for displaying additional data, e.g. simulating a Head Up Display
    • A63F2300/305 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by output arrangements for receiving control signals generated by the game device for displaying additional data, e.g. simulating a Head Up Display for providing a graphical or textual hint to the player
    • A63F2300/307 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by output arrangements for receiving control signals generated by the game device for displaying additional data, e.g. simulating a Head Up Display for displaying an additional window with a view from the top of the game field, e.g. radar screen
    • A63F2300/308 Details of the user interface

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Optics & Photonics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a game auxiliary system based on image recognition and audio recognition, comprising a game image acquisition preprocessing module, a game sound acquisition preprocessing module, a digital image recognition module, a digital audio recognition module and a recognition result prompting module. The five modules are divided according to the logical function of their processing tasks and together form a single application program, so data and signal transmission among the modules is completed through in-process communication, which is efficient. The system can accurately identify enemies and vehicles hidden at a distance as well as the direction of gunshots, and display prompt icons in a conspicuous color on the game screen in real time. In addition, by recognizing the electronic compass and the minimap, it can give the player travel instructions for entering the safe area. These auxiliary prompts can prolong the survival time of ordinary novice players in the game and thereby improve the game experience.

Description

Game auxiliary system based on image recognition and audio recognition
Technical Field
The invention belongs to the technical field of electronic game assistance, and particularly relates to a game assistance system based on image recognition and audio recognition.
Background
Electronic games, also called video games, are interactive games that run on electronic device platforms. Mature electronic games appeared at the end of the 20th century; they changed how people play and what the word "game" means, and are a cultural activity born of scientific and technological development. An electronic game, which may also be called "electronic game software", is in effect an art form that integrates fine art, music, film, AI, computer technology and more. It has strong cultural carrying capacity and appeal, yet a relatively low barrier to entry: even people who find it hard to appreciate paintings, music and books, or who lack imagination, can usually find ample fun in a game. Games and film or television achieve similar effects by different means and are both well loved, but games offer a stronger sense of immersion. The birth of electronic games has enriched both the mental and the material life of human beings, promoted the progress of society and made life happier.
However, in some very popular games, such as "strictly ceremonial", users who only play occasionally to relax are often eliminated because they cannot find enemies quickly enough. This prevents a large number of ordinary novice players, and players with weaker visual or auditory discrimination, from enjoying such games. Game auxiliary tools can help these disadvantaged players experience the fun of different types of games.
In recent years, thanks to rapid improvements in computer hardware performance, machine learning algorithms represented by deep learning have achieved great success in fields such as machine vision and speech recognition, and recognition accuracy has improved greatly, so that artificial intelligence has once again attracted wide attention from academia and industry. A series of developments such as AlphaGo, video recognition, fingerprint unlocking, picture recognition and speech-to-text conversion have let people experience artificial intelligence first-hand and have changed how they work and think. Over a hundred years ago, electricity transformed industries such as manufacturing, transportation and agriculture; today artificial intelligence, like electricity, is set to transform traditional industries.
Face recognition and picture recognition are two popular applications of artificial intelligence in vision and imaging. Object recognition is another common application of image classification. Take a simple mobile phone recognition model: a model is first defined for the computer and then trained on a large number of pictures of mobile phones, so that when a picture is input the computer can tell whether or not it shows a mobile phone. Under normal conditions such a model recognizes accurately, but when the input pictures involve occlusion, varying shapes or angles, or difficult lighting, the previously built model may fail; this is a hard problem for computer vision in practice. The essence of machine learning is to find a function, which plays different roles in different fields: in speech recognition the function maps a segment of speech to text; in image recognition it maps an image to a class.
The development and maturity of machine learning technology provides a good basis for recognizing game images and sounds, and makes corresponding auxiliary tool software feasible.
Disclosure of Invention
In view of the above, the present invention provides a game auxiliary system based on image recognition and audio recognition, namely auxiliary tool software that performs machine-learning-based image and sound recognition for first-person adventure shooting games. It can accurately recognize distant hidden enemies and vehicles and give the direction of gunshots, displaying prompt icons in a conspicuous color on the game screen in real time; by recognizing the electronic compass and the minimap it can also give the player travel instructions for entering the safe area. These auxiliary prompts can increase the survival time of ordinary novice players and thereby improve the game experience.
A game auxiliary system based on image recognition and audio recognition comprises a game image acquisition preprocessing module, a game sound acquisition preprocessing module, a digital image recognition module, a digital audio recognition module and a recognition result prompting module; wherein:
the game image acquisition preprocessing module is used for capturing and preprocessing game video images in real time;
the game sound acquisition preprocessing module is used for capturing and preprocessing the multi-channel sound of a game (for example stereo or 5.1-channel sound) in real time;
the digital image recognition module is used for recognizing the electronic compass angle in a game video image, the player coordinates in the minimap, and the positions of characters and vehicles;
the digital audio recognition module uses the spectral characteristics of the target signal supplied by the offline audio analysis system to analyze and compare the audio data of each sound channel in the frequency domain and to judge which sound channels contain a gunshot;
the recognition result prompting module is used for highlighting on the game picture the electronic compass angle, the player coordinates in the minimap, the positions of characters and vehicles and the position corresponding to the source of a gunshot, and for giving a voice alarm, so as to alert the player in time.
To avoid affecting the fluency of the game, the computation of the auxiliary tool must minimize its occupation and consumption of the machine's graphics card (GPU) resources. Preferably, the game image acquisition preprocessing module converts the continuous game video into discrete digital images. For content displayed at fixed positions (such as the electronic compass and the minimap), it directly crops the image blocks of the corresponding areas and passes them to the digital image recognition module for recognition. For objects that do not appear at fixed positions (such as people and vehicles), it crops the captured image in a sliding-window manner, keeping only the upper part of the image, then scales the cropped image proportionally and records the order of the frames, for the digital image recognition module to call and analyze. This cropping pattern covers as much as possible of the area where players appear while consuming as few resources as possible.
Furthermore, the game sound acquisition preprocessing module converts the continuous sound of each channel into digital audio packets of fixed duration, marks each packet with the channel it belongs to, and then passes the packets to the digital audio recognition module for further processing.
Furthermore, the digital image recognition module recognizes the electronic compass angle using two convolutional neural networks assisted by some image preprocessing; the player coordinates in the minimap are recognized directly with image processing methods; characters and vehicles are recognized with a model obtained through offline machine-learning training. The model performs both target detection and target recognition: for a game video image supplied by the game image acquisition preprocessing module, it first detects the position of each object of interest (such as a character or vehicle) and then recognizes the object to determine which type of target it is.
Further, the model is a fully convolutional neural network based on the deep-learning vision algorithm YOLOv2 (You Only Look Once, version 2). The network has an upsampling layer and a pyramid structure. The upsampling layer improves recognition accuracy and recall for small targets; the pyramid structure feeds shallow-layer information into the deeper layers, and combining shallow spatial information with deep semantic information, together with the upsampling layer, further improves target recognition accuracy.
Furthermore, the digital image recognition module uses correlation information between consecutive frames of the image sequence (such as the movement speed of people or the driving speed of vehicles) to assist target detection and recognition in a single image, further improving recognition accuracy. Because the game image acquisition preprocessing module crops and scales the input image, the digital image recognition module must convert coordinate systems after recognizing a target: it computes the target's position in the original game image, sends that position to the recognition result prompting module for further processing, and also feeds it back to the game image acquisition preprocessing module to support internal performance optimization and efficiency.
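The coordinate conversion described above can be illustrated with a minimal sketch. It assumes the sub-image was produced by one rectangular crop followed by a uniform scale; the `Crop` type and the function name `to_original` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    """Hypothetical record of how a sub-image was cut from the full frame."""
    left: int     # x of the crop's top-left corner in the original frame
    top: int      # y of the crop's top-left corner in the original frame
    scale: float  # uniform factor applied after cropping (0.5 halves each side)


def to_original(box, crop):
    """Map a box (x, y, w, h) given in scaled-crop pixels to original-frame pixels."""
    x, y, w, h = box
    # Undo the scaling first, then undo the crop offset.
    return (
        crop.left + x / crop.scale,
        crop.top + y / crop.scale,
        w / crop.scale,
        h / crop.scale,
    )
```

For example, a box detected at (100, 50) with size 20 x 40 in a crop taken at (320, 180) and scaled by 0.5 maps back to (520, 280) with size 40 x 80 in the original frame.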
Further, when a gunshot is detected in several sound channels at the same time, the digital audio recognition module analyzes the volume amplitude of each channel in the time domain, treats the time-domain amplitude of each channel in which the gunshot was detected as a separate gunshot component, computes the source direction of the gunshot in three-dimensional space by vector composition, and passes the direction to the recognition result prompting module for further processing.
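One way to read the vector composition above is as an amplitude-weighted sum of unit vectors pointing at the loudspeakers. The sketch below is restricted to the horizontal plane for brevity; the channel names and speaker angles in `SPEAKER_AZIMUTH` are assumptions chosen for illustration, and the patent does not specify them.

```python
import math

# Assumed speaker azimuths in degrees (0 = straight ahead, positive = to the right).
SPEAKER_AZIMUTH = {"FL": -30.0, "FR": 30.0, "C": 0.0, "SL": -110.0, "SR": 110.0}


def gunshot_azimuth(amplitudes):
    """Return the azimuth (degrees) of the amplitude-weighted vector sum.

    `amplitudes` maps channel name -> time-domain amplitude for the channels
    in which a gunshot was detected. Returns None if the sum is (near) zero.
    """
    x = y = 0.0
    for channel, amp in amplitudes.items():
        theta = math.radians(SPEAKER_AZIMUTH[channel])
        x += amp * math.sin(theta)  # rightward component
        y += amp * math.cos(theta)  # forward component
    if math.hypot(x, y) < 1e-9:
        return None
    return math.degrees(math.atan2(x, y))
```

With equal amplitude in the two front channels the estimate is straight ahead; with sound only in the front-right channel it equals that speaker's assumed angle.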
Furthermore, the AI algorithm model that the digital image recognition module uses to detect and recognize targets is obtained from an offline image training system. That system is based on deep learning, a machine learning technique, and trains on a large number of samples: game images are first collected and strict labeling rules are formulated; during training, the backpropagation algorithm optimizes the model parameters according to the learned data distribution, approaching a global or local optimum under that distribution. For target detection, anchors of given sizes and aspect ratios are specified in advance, and the whole image is traversed using the one-to-one correspondence between the outputs of the fully convolutional structure and image regions, recording the positions of the labeled targets. Then, with intersection over union (IoU) as the metric, the labeled boxes are clustered with the k-means++ algorithm to obtain the most representative boxes among all labeled boxes, which are used as anchors. Finally, the AI algorithm model is trained iteratively on a GPU until it converges.
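The anchor clustering step can be sketched as follows. This is a minimal illustration, not the patent's training code: it assumes boxes are compared by width and height only (as if they shared a center), uses 1 - IoU as the distance, seeds with k-means++ and updates each center as the mean of its cluster. The function names are hypothetical.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given as (width, height), assuming a shared center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster labeled (w, h) boxes into k anchors using 1 - IoU as distance."""
    rng = random.Random(seed)
    # k-means++ seeding: later seeds are drawn with probability proportional
    # to the squared distance from the nearest seed chosen so far.
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(1.0 - iou_wh(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    for _ in range(iters):
        groups = [[] for _ in centers]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            groups[best].append(b)
        # An empty cluster keeps its previous center.
        new_centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which is the usual motivation for this metric in anchor selection.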
Further, the offline audio analysis system is based on frequency-domain analysis. It performs spectrum analysis on several kinds of gunshot audio samples collected in advance to obtain the frequency-domain characteristics of the target samples, and provides these characteristics to the digital audio recognition module for spectrum comparison. The digital audio recognition module converts audio signals from the time domain to the frequency domain with the fast Fourier transform. That is, the pre-collected gunshot signals are Fourier-transformed and their frequency-domain characteristics observed; if the various gunshots all show relatively large amplitudes in certain fixed frequency bands, the digital audio recognition module takes those bands as reference intervals and checks whether the amplitudes of the incoming audio in them exceed a preset threshold; if so, the sound is judged to be a gunshot.
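The band comparison can be sketched with a small pure-Python radix-2 FFT. The reference bands and the threshold are placeholders, and requiring every band to exceed the threshold is an assumption; the patent gives no concrete values or combination rule.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_peak(spectrum, sample_rate, f_lo, f_hi):
    """Largest single-sided amplitude among bins whose frequency lies in [f_lo, f_hi]."""
    n = len(spectrum)
    peaks = [abs(spectrum[k]) * 2 / n
             for k in range(n // 2) if f_lo <= k * sample_rate / n <= f_hi]
    return max(peaks, default=0.0)


def is_gunshot(samples, sample_rate, bands, threshold):
    """True if every reference band (assumed rule) exceeds the preset threshold."""
    n = len(samples)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("sample count must be a power of two")
    spectrum = fft(samples)
    return all(band_peak(spectrum, sample_rate, lo, hi) > threshold for lo, hi in bands)
```

In practice the bands would come from the offline analysis of recorded gunshot samples, and a 0.5-second packet would be windowed or zero-padded to a power-of-two length before the transform.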
Based on the technical scheme, the invention has the following beneficial technical effects:
1. The invention recognizes inconspicuous objects such as characters and vehicles in the game through image recognition, and can quickly help players find enemies and take countermeasures in time.
2. The invention recognizes the electronic compass angle and the position in the minimap through image recognition, and can help the player advance efficiently to the safe area.
3. The invention recognizes the direction of gunshots in the game's multi-channel audio through sound recognition, and can quickly help the player determine the position of the enemy and take countermeasures in time.
4. The invention provides its game auxiliary functions through image recognition and sound recognition, without injecting programs or code into the game program, and therefore without impairing game stability or tampering with game data.
5. Because its functions do not depend on code injection, the invention avoids being intercepted by antivirus software and similar tools.
Drawings
FIG. 1 is a schematic diagram of the game assistance system of the present invention.
FIG. 2 is a schematic diagram of an image cropping scheme in the game assistance system of the present invention.
FIG. 3 is a schematic view of the operation of the game support system of the present invention.
Detailed Description
In order to describe the present invention more specifically, the technical solution is described in detail below with reference to the accompanying drawings and specific embodiments.
The game auxiliary tool is auxiliary tool software that performs machine-learning-based image and sound recognition for first-person adventure shooting games. It is stand-alone software for the Windows platform and must be installed on the same machine as the game client software. Its structure is shown in FIG. 1: it comprises a game image acquisition preprocessing module, a game sound acquisition preprocessing module, a digital image recognition module, a digital audio recognition module and a recognition result prompting module. The five modules are divided according to the logical function of their processing tasks and together form a single application program, so data and signal transmission among the modules is completed through in-process communication, which is efficient. Their joint workflow is shown in FIG. 3.
In addition, during development and debugging, the auxiliary tool software needs an offline image training system to provide the algorithm model for image target recognition, and an offline sound analysis system to provide the target sound spectrum characteristics used for sound recognition and comparison. Once development is complete, the image recognition algorithm model and the sound spectrum characteristics are integrated into the digital image recognition module and the digital audio recognition module respectively. The offline image training system and the offline sound analysis system are therefore two independent software programs, both running on Linux, and need not be distributed to game players with the auxiliary tool software. The released auxiliary tool software must be installed on the game player's computer and started before the player starts the game. Because the mainstream popular games all run on Windows, the auxiliary tool software is also developed for the Windows platform and can only run on Windows systems released after Windows 7.
The game image acquisition preprocessing module acquires game screen images in real time through the DirectX system API of Windows; the image format is RGB24. Because current mainstream game resolutions are large, recognizing the whole image would consume considerable computing resources and time. Therefore, to improve the efficiency of later target recognition, reduce the occupation of system CPU and GPU resources and keep the game running smoothly, the captured game image is cropped and only specific image areas are recognized. The electronic compass and the minimap have relatively fixed positions, so the image blocks where they are located can be cropped directly and used as sub-images to be recognized. For objects such as people and vehicles, whose displayed size and position are not fixed, the sub-images to be recognized are cropped on the following basis: the top of the screen shows distant areas, where objects are so small that they appear only as dots; neither the human eye nor a program can tell what they are, and even an enemy at that distance poses no threat to the current player. The bottom of the screen shows the area immediately around the current player, where an enemy can hardly be missed by the player, so image recognition is not needed there. The resulting cropping areas and positions are shown in FIG. 2. Assuming that frame A and frame B are two consecutive game screenshots, the areas marked in frame A and frame B are the effective areas actually cropped out. The effective area images are reduced proportionally by 40% to 50% and then passed to the image recognition module for target detection and recognition. While no target is detected in the image, the cropping of the effective area alternates between the approximate positions shown in frame A and frame B of FIG. 2. Once a detected target position is fed back by the image recognition module, the selection rule changes: the image is cropped around the target object, using the size of the previous effective area. The core logic is that the effective area tracks the movement of the target object over time.
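The selection rule for the effective area can be sketched as below. The exact positions of the areas in frame A and frame B are only given in FIG. 2, so the sketch assumes a left-aligned and a right-aligned window in a vertically centered band; those positions, and the names `select_region` and `scaled_size`, are hypothetical.

```python
def select_region(frame_w, frame_h, frame_index, region_w, region_h, target=None):
    """Return (left, top, right, bottom) of the effective area for one frame.

    target: optional (x, y) of a detected object in original-frame pixels.
    """
    if target is None:
        # No target yet: alternate between two assumed window positions.
        left = 0 if frame_index % 2 == 0 else frame_w - region_w
        top = (frame_h - region_h) // 2
    else:
        # Target known: keep the previous size and center the window on it.
        cx, cy = target
        left = cx - region_w // 2
        top = cy - region_h // 2
    # Clamp so the window stays inside the frame.
    left = max(0, min(left, frame_w - region_w))
    top = max(0, min(top, frame_h - region_h))
    return (left, top, left + region_w, top + region_h)


def scaled_size(region_w, region_h, reduction=0.5):
    """Size after shrinking each side by `reduction` (0.4 to 0.5 in the text)."""
    factor = 1.0 - reduction
    return (round(region_w * factor), round(region_h * factor))
```

Clamping keeps the window valid when a tracked target approaches the edge of the screen, at the cost of the target no longer being exactly centered.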
The game sound acquisition preprocessing module acquires PCM data for the game's multiple sound channels in real time through the DirectX system API of Windows, splits the continuous PCM data of each channel into segments of 0.5 second, and passes the segmented PCM packets one by one, in time order, to the digital audio recognition module for gunshot detection.
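The segmentation step can be sketched as follows, assuming the PCM data is already available as one sample list per channel. The `AudioPacket` type and `segment_channels` function are hypothetical names, and the capture itself (through the Windows API) is outside the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class AudioPacket:
    channel: str        # channel the packet belongs to
    index: int          # position of the packet in time order
    samples: List[int]  # PCM samples covering one fixed-length segment


def segment_channels(pcm: Dict[str, Sequence[int]], sample_rate: int,
                     seconds: float = 0.5) -> List[AudioPacket]:
    """Split each channel into fixed-duration packets, emitted in time order."""
    size = int(sample_rate * seconds)
    # Only complete segments are emitted; a streaming caller would keep the
    # trailing samples and prepend them to the next batch.
    n_full = min(len(s) for s in pcm.values()) // size
    packets = []
    for i in range(n_full):
        for channel, samples in pcm.items():
            packets.append(AudioPacket(channel, i, list(samples[i * size:(i + 1) * size])))
    return packets
```

Tagging each packet with its channel and time index lets the recognition module compare the same time slice across channels when it estimates the gunshot direction.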
The digital image recognition module consists mainly of three sub-modules, which recognize the electronic compass angle, the player coordinates in the minimap, and the characters and vehicles in the game, respectively. Recognizing the compass angle means recognizing digits. Because single-digit recognition is more accurate than sequence recognition and consumes less computing power, single-digit recognition is used. Before recognition, the digits must be cropped out of the electronic compass. The compass scale in the game has a 5-degree step, so the displayed value has one, two, or three digits, which gives three cases. For a known digit count the cropping positions can be set in advance, so the digit count must be determined before cropping; a second convolutional neural network is used to identify it. Electronic compass recognition is therefore divided into three steps: 1. identify the digit count of the compass value; 2. crop the individual digits according to the digit count; 3. recognize each single digit. Recognizing the player coordinates in the minimap means recognizing the marking lines in the minimap. First-level marking lines divide the whole map into an 8 x 8 grid, and second-level marking lines divide each grid cell into 10 x 10 smaller cells. Since the player is always at the center of the minimap, the player's position can be obtained by identifying only the marking lines closest to the player.
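The three-step compass reading can be sketched as a small pipeline. Everything concrete here is an assumption for illustration: the crop boxes in `DIGIT_BOXES` are made-up positions, and the two classifiers are passed in as plain callables standing in for the two convolutional neural networks.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # x, y, width, height

# Hypothetical preset crop boxes for each possible digit count.
DIGIT_BOXES: Dict[int, List[Box]] = {
    1: [(20, 0, 10, 16)],
    2: [(15, 0, 10, 16), (25, 0, 10, 16)],
    3: [(10, 0, 10, 16), (20, 0, 10, 16), (30, 0, 10, 16)],
}


def crop(image: Image, box: Box) -> Image:
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def read_compass(image: Image,
                 count_digits: Callable[[Image], int],
                 classify_digit: Callable[[Image], int]) -> int:
    """Return the compass angle shown in the compass sub-image."""
    n = count_digits(image)                      # step 1: digit count
    if n not in DIGIT_BOXES:
        raise ValueError(f"unexpected digit count: {n}")
    patches = [crop(image, box) for box in DIGIT_BOXES[n]]   # step 2: crop
    digits = [classify_digit(patch) for patch in patches]    # step 3: classify
    return int("".join(str(d) for d in digits))
```

Keeping the classifiers as injected callables makes the pipeline testable with stubs and leaves the choice of network open.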
First, the image is regularized in the horizontal and vertical directions separately and then binarized. Any run whose continuous length exceeds 2/3 of the minimap width is identified as a marking line. After a marking line is identified, its label is cropped according to the line position. Each label always has two characters, a letter followed by a digit; the two characters are cropped separately and recognized with a convolutional neural network. Minimap player coordinate recognition is therefore divided into three steps: 1. identify the marking line positions; 2. crop the marking line labels; 3. recognize the individual letters and digits. Adventure shooting games such as "Desperate Survival", which have been popular worldwide in the last two years, add a survival rule: the game scene contains a safe activity area that gradually shrinks as the game progresses, players must stay inside it to survive, and the game uses dotted lines in the minimap to direct players toward it. After the game assistance tool identifies the safe-zone indicator line in the minimap and combines it with the player's current compass angle, it can give the player movement prompts through the recognition result prompting module, such as go straight, turn left, or turn right. This helps players with a weak sense of direction and players who are unfamiliar with the map.
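The run-length rule for finding marking lines can be sketched as below. The sketch assumes the image has already been regularized and binarized into rows of 0/1 values, and it applies the 2/3 threshold to the height for vertical lines, which is an assumption (the text only mentions the minimap width; for a square minimap the two coincide).

```python
from typing import Iterable, List, Sequence, Tuple


def longest_run(values: Iterable[int]) -> int:
    """Length of the longest run of consecutive non-zero values."""
    best = current = 0
    for v in values:
        current = current + 1 if v else 0
        best = max(best, current)
    return best


def find_marking_lines(binary: Sequence[Sequence[int]],
                       min_fraction: float = 2 / 3) -> Tuple[List[int], List[int]]:
    """Return (row indices, column indices) that contain a marking line."""
    height, width = len(binary), len(binary[0])
    rows = [y for y in range(height)
            if longest_run(binary[y]) > min_fraction * width]
    cols = [x for x in range(width)
            if longest_run(binary[y][x] for y in range(height)) > min_fraction * height]
    return rows, cols
```

The returned indices give the line positions from which the label crop positions would then be derived.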
For the recognition of characters and vehicles in the game, the target positions are not fixed and the targets vary widely in appearance, so the invention uses an algorithm model obtained by offline machine-learning training to process and recognize the target images. The model has two main functions: target detection and target recognition. It first detects the positions of objects of interest (characters, vehicles, and so on) in the game image supplied by the image acquisition and preprocessing module, and then recognizes each object to determine which type of target it is. The model is built on the fully convolutional neural network structure of the deep learning vision algorithm YOLOv2 (You Only Look Once v2). The main characteristic of this network structure is that it uses very few computing resources and can process images in real time in most cases, but its accuracy and recall need improvement. Therefore, an upsampling layer and a pyramid structure are added to the network. The upsampling layer improves the recognition accuracy and recall for small targets. The pyramid structure feeds shallow-layer information into the deep network; combining shallow spatial information with deep semantic information, together with the upsampling layer, further improves recognition accuracy. With the upsampling layer and pyramid structure added to the YOLOv2 fully convolutional network, the improved model recognizes small objects better while still consuming few computing resources: accuracy on small objects improves by 10 percentage points and recall by 20 percentage points.
The above describes how the system recognizes targets in a single image. In practice, what is captured from the game is a sequence of images that are continuous in time, and consecutive frames are related (for example, through the movement speed of a character or the driving speed of a vehicle). Making full use of the position and size relationship of a recognized target between consecutive frames can eliminate individual misrecognitions by the fully convolutional neural network and improve overall accuracy. For example, suppose a small player is recognized at time t while our own player stands still. If that player suddenly shifts a long way at time t+1, clearly exceeding a character's running speed, the network is considered to have misidentified a vehicle as an enemy player. Similarly, the image recognition module uses other characteristics of the game itself, such as the aspect ratio of player characters, as auxiliary conditions to further improve the recognition accuracy for characters and vehicles. In addition, because the input image was cropped and scaled beforehand, a coordinate system conversion is needed after a target is recognized: the target's position in the original game image is calculated and sent to the recognition result prompting module for the next step. The recognized target coordinates are also fed back to the game image acquisition and preprocessing module, which helps internal performance optimization and efficiency.
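Two of the steps above lend themselves to a short sketch: mapping a detection from the cropped, scaled sub-image back to original screen coordinates, and rejecting a "player" detection whose frame-to-frame motion or shape is implausible. The function names, the speed limit, and the aspect-ratio range are hypothetical values chosen for illustration, not figures from the patent.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height


def to_original(box: Box, roi_origin: Tuple[float, float], scale: float) -> Box:
    """Undo the scaling, then the crop offset, for a box found in the sub-image."""
    x, y, w, h = box
    ox, oy = roi_origin
    return (ox + x / scale, oy + y / scale, w / scale, h / scale)


def is_plausible_player(prev_center: Tuple[float, float],
                        cur_center: Tuple[float, float],
                        dt: float, max_speed_px: float, box: Box,
                        aspect_range: Tuple[float, float] = (1.2, 4.0)) -> bool:
    """Check on-screen speed and height/width ratio against assumed limits.

    A detection that moves faster than a running character could, or whose
    shape is unlike a standing or crouching figure, is treated as suspect.
    """
    dx = cur_center[0] - prev_center[0]
    dy = cur_center[1] - prev_center[1]
    speed = math.hypot(dx, dy) / dt
    _, _, w, h = box
    low, high = aspect_range
    return speed <= max_speed_px and low <= h / w <= high
```

In use, a detection that fails the check could be relabelled or dropped before its converted coordinates are sent on to the prompting module.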
The digital audio recognition module performs recognition using the sound spectrum feature comparison scheme provided by the audio offline analysis system. The processing involves little computation, so it consumes little computing power, and because the variety of sounds in the game is not especially rich, the scheme recognizes gunshots with high accuracy. The key point therefore becomes the real-time performance of the whole process, from a gunshot being fired in the game to the assistance tool recognizing it and prompting the player; if the recognition result lags too far behind, it does not help the player. Digital audio recognition must process sound data covering a period of time, but the frequency-domain conversion and spectrum comparison are very fast, so the delay arises mainly in the sound sample acquisition stage. As mentioned above, the sound acquisition module packages the collected sound data into segments of fixed duration and passes them to the audio recognition module, so the duration of each sound packet becomes the lag of the gunshot prompt. Earlier statistical analysis showed that a segment must contain at least half of the gunshot waveform for the gunshot to be recognized with high accuracy, and sample statistics show that a gunshot waveform generally lasts about 1 second. A segment length of 0.5 seconds is therefore chosen for sound acquisition, which preserves recognition accuracy while still assisting the player in real time.
The recognition result prompting module uses the transparent window technology of the Windows system to create a transparent window on top of the game screen. In this window it displays, with conspicuous flashing icons, the enemy position coordinates and similar information received from the digital image recognition module and the gunshot direction received from the digital audio recognition module at the corresponding positions, and it plays pre-recorded voice prompts at the same time, helping ordinary players find enemies and their direction as quickly as possible.
The image offline training system provides the specific recognition algorithms and models for electronic compass recognition, minimap position recognition, and character and vehicle recognition in the game. Based on deep learning, it learns from a large number of training samples to obtain an AI algorithm model that can detect and recognize target objects. Current AI algorithms essentially learn a data distribution: during training, the backpropagation algorithm adjusts the model parameters according to the learned data distribution so that they approach a global or local optimum under that distribution. How well the model works in practice therefore depends on whether the data distribution learned in training matches the real application scenario. For this reason, game images must first be collected and strict labeling rules defined, which helps the model converge and perform better when applied. The backpropagation algorithm mentioned above is based on gradient descent; to converge faster and avoid some local optima, this embodiment uses the Adam optimizer, which combines momentum with per-parameter adaptive updates.
In addition, target detection is in a sense a selective traversal: anchors of given sizes and aspect ratios must be specified in advance, and the whole image is traversed by exploiting the one-to-one correspondence between output positions and image regions in a fully convolutional structure. The sizes and aspect ratios of the anchors therefore strongly affect detection performance. The positions of the labeled targets are recorded, and the labeled boxes are then clustered with the k-means++ algorithm, using intersection over union (IoU) as the metric, to obtain the boxes that best represent all labeled boxes; these are used as the anchors. The model is then trained iteratively on a GPU until it converges. For electronic compass recognition, a number of compass angle image samples with different digit counts are collected and divided into three categories; the digits are then cropped out of the compass and divided into ten categories, 0 to 9. For minimap position recognition, the letters and digits of the line labels are cropped from the collected minimap samples and divided into 26 categories, comprising 16 letters and 10 digits. All three recognition tasks use a convolutional neural network with a Softmax classifier and cross entropy as the loss function; training runs on a GPU, and once the loss function converges the network weights are saved to be loaded at recognition time. For the detection and recognition model for characters and vehicles, the weight data are obtained by feeding labeled images into the neural network and training with gradient descent; at recognition time, the target objects in an image can be detected and recognized by loading the weight data directly.
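Anchor selection by clustering labeled boxes with an IoU-based distance can be sketched in a few lines. This is a minimal illustration, assuming boxes are given as (width, height) pairs, that distance is 1 - IoU with boxes aligned at a common corner, that cluster centers are updated by the mean, and that there are at least `k` distinct boxes; the function names and the fixed seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

WH = Tuple[float, float]  # box width and height


def iou_wh(a: WH, b: WH) -> float:
    """IoU of two boxes that share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeanspp_init(boxes: Sequence[WH], k: int, rng: random.Random) -> List[WH]:
    """k-means++ seeding: favor boxes far (in 1 - IoU) from chosen centers."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        weights = [min(1 - iou_wh(b, c) for c in centers) ** 2 for b in boxes]
        centers.append(rng.choices(boxes, weights=weights, k=1)[0])
    return centers


def cluster_anchors(boxes: Sequence[WH], k: int,
                    iters: int = 50, seed: int = 0) -> List[WH]:
    """Cluster labeled box sizes and return k representative anchors."""
    rng = random.Random(seed)
    centers = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        groups: List[List[WH]] = [[] for _ in range(k)]
        for b in boxes:
            nearest = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            groups[nearest].append(b)
        updated = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[i]          # keep an empty cluster's center
            for i, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering, since the metric depends on overlap rather than absolute size.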
Because the computing power and video memory of a single GPU are limited, multiple GPUs are used for parallel training. This speeds up training and allows a larger batch size, which greatly reduces oscillation during training. When the model has roughly converged, the model parameters are fine-tuned manually and training continues in order to achieve a better result.
The audio offline analysis system is based on frequency-domain analysis. It performs spectrum analysis on gunshot audio samples of multiple kinds collected in advance to obtain the frequency-domain features of the target samples, and provides these to the assistance tool software for spectrum comparison. As expectations for game quality rise, most shooting games use recordings of real gunshots to create atmosphere. Unlike the human voice, a gunshot has a high frequency, which makes it hard for the human ear to perceive clearly, and to simulate real scenes further the game also produces the sound of bullets hitting the ground. The gunshot and the bullet impact sound follow one another closely, with a very short interval, and the direction of the sound produced when a bullet hits a wall or the ground may differ from the shooting direction, which makes it even harder for the player to judge where the shot came from. When the waveforms of collected gunshot samples are observed and analyzed, it is hard to see how their shape differs from other sounds, because the characteristic features of a sound signal lie in the frequency domain rather than the time domain. The digital audio recognition module therefore uses the Fast Fourier Transform (FFT) to convert the sound signal from the time domain to the frequency domain. The FFT is a fast algorithm for computing the Fourier transform; it runs very quickly, with a low time complexity of O(n log n), and uses few computing resources, so it fully meets the requirement for real-time feedback in games.
The frequency of a gunshot is very high while that of other sounds is much lower, which is a clear distinguishing factor. Applying the Fourier transform to gunshot signals collected in advance and observing their frequency-domain features shows that all kinds of gunshots have very large amplitudes in certain fixed frequency bands. The digital audio recognition module therefore takes these frequency bands as reference intervals and checks whether the amplitudes in them exceed a preset threshold to decide whether a sound is a gunshot.
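The band-threshold comparison can be sketched with a small self-contained FFT. This is an illustration only: the reference bands and threshold are hypothetical inputs, the sketch requires every reference band to exceed the threshold (the text does not say whether all or any band must), and the simple radix-2 FFT needs a power-of-two length, so a real 0.5-second segment would have to be truncated or zero-padded first.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_amplitude(samples: Sequence[float], sample_rate: int,
                   band: Tuple[float, float]) -> float:
    """Peak single-sided amplitude among the FFT bins inside [low, high] Hz."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("segment length must be a power of two")
    spectrum = fft(samples)
    k_low = math.ceil(band[0] * n / sample_rate)
    k_high = min(math.floor(band[1] * n / sample_rate), n // 2)
    return max((abs(spectrum[k]) * 2 / n for k in range(k_low, k_high + 1)),
               default=0.0)


def is_gunshot(samples: Sequence[float], sample_rate: int,
               bands: Sequence[Tuple[float, float]], threshold: float) -> bool:
    """True when every reference band's amplitude exceeds the threshold."""
    return all(band_amplitude(samples, sample_rate, b) > threshold for b in bands)
```

In practice the reference bands and the threshold would come from the offline analysis of recorded gunshot samples rather than being hard-coded.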
The embodiments described above are presented to enable a person having ordinary skill in the art to make and use the invention. It will be readily apparent to those skilled in the art that various modifications may be made to these embodiments and that the generic principles described herein may be applied to other embodiments without inventive effort. The present invention is therefore not limited to the above embodiments, and improvements and modifications made by those skilled in the art based on the disclosure of the present invention shall fall within the protection scope of the present invention.

Claims (2)

1. A game assistance system based on image recognition and audio recognition, characterized by: the system comprises a game image acquisition and preprocessing module, a game sound acquisition and preprocessing module, a digital image recognition module, a digital audio recognition module and a recognition result prompting module; wherein:
the game image acquisition preprocessing module is used for capturing and preprocessing game video images in real time; the module converts continuous game video into discrete digital images, directly intercepts image blocks in corresponding areas in the images for contents displayed at fixed positions and transmits the image blocks to the digital image identification module for identification; for targets which do not appear at fixed positions, cutting the acquired image in a sliding window mode, only intercepting the upper part of the image, then scaling the intercepted image in an equal ratio, recording the sequence of each frame, and calling and analyzing by a digital image recognition module;
the game sound acquisition preprocessing module is used for capturing and preprocessing multi-channel sound of a game in real time; the module converts continuous sound into digital audio packets with segment fixed time length according to respective sound channels, marks the sound channels to which the audio packets belong, and then transmits the digital audio packets to the digital audio recognition module for further processing;
the digital image identification module is used for identifying an electronic compass angle in a game video image, player coordinates in a small map and positions of a person and a vehicle; the module identifies the angle of the electronic compass by adopting two convolutional neural networks assisted by some image preprocessing methods; for the identification of the player coordinates in the small map, an image processing method is directly adopted for realizing the identification; for the recognition of characters and vehicles, a model obtained through machine learning off-line training is adopted for processing and recognition, the model has two functions of target detection and target recognition, namely, the position of an object of interest is detected firstly for a game video image input by a game image acquisition preprocessing module, then the object is recognized, and the type of the object is judged; the digital image identification module utilizes the correlation information between the front frame and the rear frame of the continuous image sequence to effectively assist the target detection and identification process of a single image, thereby further improving the identification accuracy; meanwhile, because the game image acquisition preprocessing module cuts and scales the input image, the digital image recognition module needs to convert a coordinate system after recognizing the target object, calculates the position coordinate of the target object in the game original image, then sends the position coordinate to the recognition result prompting module for further processing, and feeds the position coordinate of the target object back to the game image acquisition preprocessing module, thereby facilitating the optimization of internal performance and the improvement of efficiency;
the model is a full convolution neural network model based on a deep learning visual algorithm YOLOv2, the network model is provided with an upsampling layer and a pyramid structure, and the upsampling layer is used for improving the identification accuracy and the recall rate of small-size targets; the pyramid structure is used for inputting the shallow information into a deep network, combining the shallow spatial information and the deep semantic information, and further improving the accuracy of target identification by matching with an upper sampling layer;
when detecting and recognizing a target object, the digital image recognition module obtains, by means of an image offline training system, an AI algorithm model capable of detecting and recognizing the target object; the system is based on deep learning in machine learning and learns from a large number of training samples, namely, images of the game are first collected and strict labeling rules are formulated, and during training the parameters of the AI algorithm model are optimized by the backpropagation algorithm according to the learned data distribution so as to approach a global optimum or local optimum under that data distribution; for target detection, anchors of given sizes and aspect ratios are specified in advance, the whole image is traversed by exploiting the one-to-one correspondence between output positions and image regions in the fully convolutional structure, the positions of the labeled targets are recorded, the labeled boxes are then clustered with the k-means++ algorithm using intersection over union (IoU) as the metric to obtain the boxes that best represent all labeled boxes as the anchors, and finally a GPU is used to iteratively train the AI algorithm model until the model converges;
the digital audio recognition module adopts the frequency spectrum characteristics of the target signal given by the audio off-line analysis system to perform frequency domain analysis and comparison on the audio data of each sound channel and judge which sound channel contains the gunshot information; when the gunshot is detected in a plurality of sound channels simultaneously, the module analyzes the sound volume amplitude of each sound channel in a time domain space, the time domain amplitude of each sound channel in which the gunshot is detected is used as different gunshot components, the source direction of the gunshot in a three-dimensional space is finally calculated according to the characteristics of the vector, and then the source direction is transmitted to an identification result prompting module for next processing;
the recognition result prompting module is used for marking the electronic compass angle, the player coordinate in the small map, the positions of the character and the vehicle and the position corresponding to the position of the gunshot on the game picture, and giving a voice alarm, so that the aim of timely reminding the player of paying attention is fulfilled.
2. The game assistance system according to claim 1, wherein: the audio offline analysis system is based on frequency-domain analysis; it performs spectrum analysis on gunshot audio samples of multiple kinds collected in advance to obtain the frequency-domain features of the target samples and provides them to the digital audio recognition module for spectrum comparison; the digital audio recognition module converts the audio signal from the time domain to the frequency domain using the fast Fourier transform, namely, the Fourier transform is applied to gunshot signals collected in advance and their frequency-domain features are observed; if the amplitudes of the various gunshots are relatively large in certain fixed frequency bands, the digital audio recognition module takes those frequency bands as reference intervals and checks whether the amplitude of the sound signal in them exceeds a preset threshold, and if so, the sound is judged to be a gunshot.
CN201910926107.1A 2019-09-27 2019-09-27 Game auxiliary system based on image recognition and audio recognition Active CN110652726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910926107.1A CN110652726B (en) 2019-09-27 2019-09-27 Game auxiliary system based on image recognition and audio recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910926107.1A CN110652726B (en) 2019-09-27 2019-09-27 Game auxiliary system based on image recognition and audio recognition

Publications (2)

Publication Number Publication Date
CN110652726A CN110652726A (en) 2020-01-07
CN110652726B true CN110652726B (en) 2022-10-25

Family

ID=69039625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910926107.1A Active CN110652726B (en) 2019-09-27 2019-09-27 Game auxiliary system based on image recognition and audio recognition

Country Status (1)

Country Link
CN (1) CN110652726B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113198172A (en) * 2020-02-03 2021-08-03 宏碁股份有限公司 Game key mode adjusting method and electronic device
CN111603776B (en) * 2020-05-21 2023-09-05 上海艾为电子技术股份有限公司 Method for identifying gunshot in audio data, motor driving method and related device
CN113289327A (en) * 2021-06-18 2021-08-24 Oppo广东移动通信有限公司 Display control method and device of mobile terminal, storage medium and electronic equipment
CN114904267A (en) * 2022-05-10 2022-08-16 网易(杭州)网络有限公司 In-game display control method and device, storage medium, and electronic device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012143509A (en) * 2011-01-14 2012-08-02 Furyu Kk Video game device, method of processing video game, video game processing program, and computer readable recording medium
CN106408001A (en) * 2016-08-26 2017-02-15 西安电子科技大学 Rapid area-of-interest detection method based on depth kernelized hashing
CN108126342A (en) * 2017-12-28 2018-06-08 珠海市君天电子科技有限公司 A kind of information processing method, device and terminal
CN108176049A (en) * 2017-12-28 2018-06-19 珠海市君天电子科技有限公司 A kind of information cuing method, device, terminal and computer readable storage medium
CN109276882A (en) * 2018-09-20 2019-01-29 珠海市君天电子科技有限公司 A kind of game householder method, device, electronic equipment and storage medium
CN109685060A (en) * 2018-11-09 2019-04-26 科大讯飞股份有限公司 Image processing method and device
CN110009004A (en) * 2019-03-14 2019-07-12 努比亚技术有限公司 Image processing method, computer equipment and storage medium
CN110052025A (en) * 2019-05-24 2019-07-26 网易(杭州)网络有限公司 Method for sending information, display methods, device, equipment, medium in game
CN110170170A (en) * 2019-05-30 2019-08-27 维沃移动通信有限公司 A kind of information display method and terminal device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5563613B2 (en) * 2012-03-30 2014-07-30 株式会社コナミデジタルエンタテインメント GAME DEVICE, GAME DEVICE CONTROL METHOD, AND PROGRAM
CN103023872B (en) * 2012-11-16 2016-01-06 杭州顺网科技股份有限公司 A kind of cloud game service platform
JP6561339B2 (en) * 2017-03-23 2019-08-21 株式会社コナミアミューズメント GAME SYSTEM AND COMPUTER PROGRAM USED FOR THE SAME

Also Published As

Publication number Publication date
CN110652726A (en) 2020-01-07

Similar Documents

Publication Publication Date Title
CN110652726B (en) Game auxiliary system based on image recognition and audio recognition
CN111091824B (en) Voice matching method and related equipment
CN109543606B (en) Human face recognition method with attention mechanism
CN110909630B (en) Abnormal game video detection method and device
CN107766842B (en) Gesture recognition method and application thereof
EP4184927A1 (en) Sound effect adjusting method and apparatus, device, storage medium, and computer program product
CN109376603A (en) A kind of video frequency identifying method, device, computer equipment and storage medium
CN107203953A (en) It is a kind of based on internet, Expression Recognition and the tutoring system of speech recognition and its implementation
CN106778506A (en) A kind of expression recognition method for merging depth image and multi-channel feature
CN112132197A (en) Model training method, image processing method, device, computer equipment and storage medium
CN107423398A (en) Exchange method, device, storage medium and computer equipment
CN110009057A (en) A kind of graphical verification code recognition methods based on deep learning
Li et al. Sign language recognition based on computer vision
CN103198330B (en) Real-time human face attitude estimation method based on deep video stream
KR20190108378A (en) Method and System for Automatic Image Caption Generation
CN110465089B (en) Map exploration method, map exploration device, map exploration medium and electronic equipment based on image recognition
CN112562742B (en) Voice processing method and device
CN110175646A (en) Multichannel confrontation sample testing method and device based on image transformation
CN113537056A (en) Avatar driving method, apparatus, device, and medium
Zhao et al. The 3rd anti-uav workshop & challenge: Methods and results
CN111881776B (en) Dynamic expression acquisition method and device, storage medium and electronic equipment
CN112487981A (en) MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation
CN116561533B (en) Emotion evolution method and terminal for virtual avatar in educational element universe
CN111564064A (en) Intelligent education system and method based on game interaction
US11379725B2 (en) Projectile extrapolation and sequence synthesis from video using convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant