WO2021147473A1 - Model training method, content generation method, and related apparatus - Google Patents

Model training method, content generation method, and related apparatus

Info

Publication number
WO2021147473A1
WO2021147473A1 (PCT/CN2020/128245, CN2020128245W)
Authority
WO
WIPO (PCT)
Prior art keywords
model
content
training
card
feature
Prior art date
Application number
PCT/CN2020/128245
Other languages
English (en)
French (fr)
Inventor
黄超
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Priority to EP20915388.1A priority Critical patent/EP4005652A4/en
Publication of WO2021147473A1 publication Critical patent/WO2021147473A1/zh
Priority to US17/585,677 priority patent/US20220148295A1/en

Classifications

    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/40Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment
    • A63F13/42Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/20Input arrangements for video game devices
    • A63F13/21Input arrangements for video game devices characterised by their sensors, purposes or types
    • A63F13/214Input arrangements for video game devices characterised by their sensors, purposes or types for locating contacts on a surface, e.g. floor mats or touch pads
    • A63F13/2145Input arrangements for video game devices characterised by their sensors, purposes or types for locating contacts on a surface, e.g. floor mats or touch pads the surface being also a display device, e.g. touch screens
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/90Constructional details or arrangements of video game devices not provided for in groups A63F13/20 or A63F13/25, e.g. housing, wiring, connections or cabinets
    • A63F13/92Video game devices specially adapted to be hand-held while playing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7747Organisation of the process, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60Methods for processing data by generating or executing the game program
    • A63F2300/6027Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment

Definitions

  • This application relates to the field of computer technology, in particular to model training and content generation.
  • an image-based AI imitation learning training scheme can be used, that is, the content image is used as the input of a deep network, and deep features of the image are extracted through convolutional layers and fully connected layers to train the relevant model; the trained model finally outputs targeted content.
  • the present application provides a method for model training, which can effectively avoid training interference caused by the background of training content, and improve the efficiency and accuracy of the model training process.
  • the first aspect of the present application provides a model training method, which can be applied to a system or program containing a model training function in a terminal device, including: acquiring a training set based on an interaction process of multiple pieces of card content, the training set including multiple video frames, and the video frames including a trigger area of the card content;
  • determining a feature area in the video frame according to the trigger area of the card content, the feature area being provided with an action label for indicating training content, the action label being determined based on the card content, and the feature area being smaller than the interface area of the video frame;
  • counting the trigger situation of the feature area to generate a feature vector;
  • inputting the feature vector into a first model for training to obtain a second model, the first model being used to associate the feature vector with the action label, and the second model being used to indicate the correspondence between target card content and the action label.
  • the second aspect of the present application provides a model training apparatus, which is deployed in a terminal device and includes: an acquisition unit, configured to acquire a training set based on an interaction process of multiple pieces of card content, the training set including multiple video frames, and the video frames including a trigger area of the card content;
  • a determining unit, configured to determine a feature area in the video frame according to the trigger area of the card content, the feature area being provided with an action label for indicating training content, the action label being determined based on the card content, and the feature area being smaller than the interface area of the video frame;
  • a statistics unit, configured to count the trigger situation of the feature area to generate a feature vector;
  • a training unit, configured to input the feature vector into a first model for training to obtain a second model, the first model being used to associate the feature vector with the action label, and the second model being used to indicate the correspondence between target card content and the action label.
  • the determining unit is specifically configured to determine the position information of the training content in the video frame according to the trigger area of the card content
  • the determining unit is specifically configured to train a third model according to the correspondence between the location information and the training content to obtain a fourth model, and the third model is used to associate the location information with the training Content, the fourth model is used to indicate the correspondence between the video frame and the characteristic region;
  • the determining unit is specifically configured to input the video frame into the fourth model to obtain the characteristic region.
  • the determining unit is further configured to determine the shape information corresponding to the training content
  • the determining unit is also used to filter out training content with similar shape information, so as to update the training content.
  • the determining unit is specifically configured to classify the training content according to the shape information to obtain at least two categories of training content;
  • the determining unit is specifically configured to determine the first gradient information according to the training content of the at least two categories;
  • the determining unit is specifically configured to determine second gradient information according to the position information
  • the determining unit is specifically configured to train the parameters of the third model by minimizing the first gradient information and the second gradient information to obtain the fourth model.
  • the statistics unit is specifically configured to separately count the triggering conditions of the characteristic regions in each video frame to generate a characteristic sequence, and the characteristic sequence is a multi-dimensional binary sequence;
  • the statistical unit is specifically configured to generate the feature vector according to the feature sequence.
  • the statistical unit is specifically configured to obtain the center point of the characteristic region
  • the statistical unit is specifically configured to determine the trigger situation according to the distance between the center point and the trigger operation point, so as to generate the characteristic sequence.
  • the training unit is specifically used to determine the time sequence feature of the feature vector
  • the training unit is specifically configured to input the feature vector into the first model for parameter adjustment
  • the training unit is specifically configured to train the first model after the time sequence feature input parameters are adjusted to obtain the second model.
  • the training unit is specifically configured to segment the feature vector according to feature extraction intervals to obtain at least two feature vector sets;
  • the training unit is specifically configured to extract the operation sequences corresponding to the same feature region in the at least two feature vector sets to obtain the time sequence feature.
  • the acquiring unit is specifically configured to acquire level information of an interaction process of multiple card content, and the level information is used to indicate the complexity of the interaction process;
  • the acquiring unit is specifically configured to extract a first card game and a second card game according to the level information, where the level information of the first card game and the second card game are different;
  • the acquiring unit is specifically configured to determine the training set according to the first card game and the second card game.
  • a third aspect of the present application provides a content generation method, which is executed by a terminal device and includes: obtaining first content output by a target object;
  • inputting the first content into a second model to obtain an action label, the second model being trained by the model training method described in the first aspect;
  • generating second content according to the action label.
  • a fourth aspect of the present application provides an apparatus for generating content.
  • the apparatus is deployed on a terminal device and includes: an obtaining unit configured to obtain first content output by a target object;
  • an input unit, configured to input the first content into a second model to obtain an action label, the second model being trained by the model training method described in the first aspect or any implementation of the first aspect;
  • the generating unit is configured to generate second content according to the action tag.
  • a fifth aspect of the present application provides a computer device, including a memory, a processor, and a bus system; the memory is used to store program code, and the processor is used to execute, according to instructions in the program code, the model training method described in the first aspect or any implementation of the first aspect, or the content generation method described in the third aspect.
  • a sixth aspect of the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the model training method described in the first aspect or any implementation of the first aspect, or the content generation method described in the third aspect.
  • a seventh aspect of the present application provides a computer program product which, when executed, is used to perform the model training method described in the first aspect or any implementation of the first aspect, or the content generation method described in the third aspect.
  • by acquiring a training set including multiple video frames and determining a feature area in the video frame, where the feature area is provided with an action label for indicating the training content and the feature area is smaller than the interface area of the video frame; then counting the trigger situation of the feature area to generate a feature vector; and then inputting the feature vector into the first model for training, a second model is obtained for indicating the correspondence between the target card content and the action label.
  • This realizes an imitation learning training process based on the feature area. Because the feature area is a part of the card image corresponding to the video frame and does not contain the background or other interfering areas, the model training process is targeted, the amount of data processing during model training is reduced, and the efficiency and accuracy of model training are improved.
  • Figure 1 is a network architecture diagram of the model training system running
  • FIG. 2 is a process architecture diagram of a model training provided by an embodiment of the application
  • FIG. 3 is a flowchart of a method for model training provided by an embodiment of the application
  • FIG. 4 is a schematic diagram of a model training scenario provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a scenario for triggering an operation provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of another scenario for triggering an operation provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of a scenario for generating a feature sequence according to an embodiment of the application.
  • FIG. 8 is a model architecture diagram of a model training provided by an embodiment of the application.
  • FIG. 9 is a flowchart of another model training method provided by an embodiment of the application.
  • FIG. 10 is a flowchart of a method for generating content according to an embodiment of the application.
  • FIG. 11 is a schematic structural diagram of a model training device provided by an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of a content generation apparatus provided by an embodiment of this application.
  • FIG. 13 is a schematic structural diagram of a server provided by an embodiment of this application.
  • FIG. 14 is a schematic structural diagram of a terminal device provided by an embodiment of this application.
  • the embodiments of the application provide a model training method and related apparatus, which can be applied to a system or program containing a model training function in a terminal device: a training set including multiple video frames is acquired, and a feature area in the video frame is determined,
  • where the feature area is provided with an action label for indicating the training content and the feature area is smaller than the interface area of the video frame; the trigger situation of the feature area is then counted to generate a feature vector; and the feature vector is input into the first model for training to obtain a second model for indicating the correspondence between the target card content and the action label.
  • This realizes an imitation learning training process based on the feature area.
  • Because the feature area is a part of the card image corresponding to the video frame, it does not contain the background or other interfering areas, which makes the model training process targeted and reduces the amount of data processing during model training,
  • thereby improving the efficiency and accuracy of model training.
  • Action label: the manually labeled action corresponding to a game screen, which corresponds to the game's running strategy.
  • Convolutional Neural Network (CNN): a network with representation learning capability that can classify input information according to its hierarchical structure.
  • Lightweight deep network: a small deep network with few parameters, suitable for CPUs and embedded devices. In this embodiment, the first model is the preset lightweight deep network model, and the second model is the lightweight deep network model trained on the card content training set; that is, the second model is obtained by adjusting the parameters of the first model.
  • Deep feature: the feature finally extracted from an image by a deep network, which contains abstract information about the image.
  • Long Short-Term Memory (LSTM): a recurrent neural network for time series, mainly used to solve the problems of gradient vanishing and gradient explosion when training on long sequences, so as to generate feature vectors based on time series.
  • YOLO (You Only Look Once): a target detection framework used to obtain the features of a target area.
  • In this embodiment, the third model is the preset YOLO model, and the fourth model is the YOLO model trained on the correspondence between card content and position; that is, the fourth model is obtained by adjusting the parameters of the third model.
  • Darknet53: a deep network with 53 convolutional layers.
  • DQN algorithm: abbreviation of Deep Q-Learning, a deep value learning algorithm.
  • the model training method provided in this application can be applied to a system or program that includes a model training function in a terminal device, such as a card game.
  • the model training system can run in the network architecture shown in FIG. 1, which is a network architecture diagram in which the model training system runs.
  • the model training system can provide model training with multiple information sources.
  • the terminal device establishes a connection with the server through the network, then receives multiple pieces of content sent by the server, and displays the corresponding content according to the terminal device's own strategy.
  • the server trains the relevant model by collecting the training sets uploaded by terminal devices, so that the generated content is suitable for the target terminal device.
  • it can be understood that although a variety of terminal devices are shown in FIG. 1, in an actual scenario more or fewer types of terminal devices can participate in the model training process;
  • the specific number and types depend on the actual scenario and are not limited here.
  • in addition, one server is shown in FIG. 1, but in an actual scenario multiple servers may also participate, especially in a multi-content application interaction scenario;
  • the specific number of servers depends on the actual scenario.
  • the model training method provided in this embodiment can also be performed offline, that is, without the participation of the server; in this case the terminal device is connected locally with other terminal devices to carry out the model training process between terminal devices,
  • for example, the process of single-player game content simulation.
  • the above model training system can run on a personal mobile terminal, that is, the terminal device can be a mobile terminal, for example as a card game application; it can also run on a server or on a third-party device.
  • the specific model training system can run in the above-mentioned devices in the form of a program, can run as a system component in the above-mentioned devices, and can also be used as a type of cloud service program;
  • the specific operation mode depends on the actual scenario and is not limited here.
  • image-based AI imitation learning training schemes can be used, that is, content images such as card content are used as the input of a deep network, and deep features of the image are extracted through convolutional layers and fully connected layers to train the relevant model; the trained model finally outputs targeted content.
  • this application proposes a model training method, which is applied to the model training process framework shown in FIG. 2.
  • FIG. 2 is a process architecture diagram of model training provided in this embodiment of the application: the card content related to the user, or the card content of the server database, is first collected as training content and input into the detection model to detect feature areas; the triggering of the feature areas is counted to obtain feature vectors; imitation learning training is then carried out based on the feature vectors; and intelligent content is finally generated based on user input.
  • the method provided in this application can be implemented as a program, as processing logic in a hardware system, or as a model training apparatus that implements the aforementioned processing logic in an integrated or external manner.
  • the model training apparatus obtains a training set including multiple video frames and determines a feature area in the video frame, where the feature area is provided with an action label for indicating the training content and the feature area is smaller than the interface area of the video frame; it then counts the trigger situation of the feature area to generate a feature vector; and then inputs the feature vector into the first model for training to obtain a second model for indicating the correspondence between the target card content and the action label.
  • This realizes an imitation learning training process based on the feature area. Because the feature area is a part of the card image corresponding to the video frame and does not contain the background or other interfering areas, the model training process is targeted, the amount of data processing during model training is reduced, and the efficiency and accuracy of model training are improved.
  • Figure 3 is a flowchart of a method of model training provided by an embodiment of this application, which can be applied to the generation of card game content.
  • the embodiment of the present application at least includes the following steps:
  • the interaction process based on the content of multiple cards can be displayed in the form of video, that is, the training set includes multiple video frames.
  • the video frames can come from a continuous video selected based on the target card content, for example, a consecutive card game video within 1 hour.
  • the video frames can be acquired frame by frame or at a certain sampling interval, considering that there may be a time interval during content switching; that is, adjacent video frames are combined at the training sample interval to obtain the training set, for example, the video frames within 1 second are combined as one sample in the training set, as sketched below.
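  • As an illustration only (not part of the patent text), a minimal Python sketch of this sampling-and-grouping step might look as follows; the frame list, frame rate, and 1-second window are assumptions:

```python
from typing import List, Sequence


def build_training_samples(frames: Sequence, fps: int = 30,
                           sample_interval_s: float = 1.0) -> List[list]:
    """Group the consecutive frames inside each sampling window into one training sample."""
    window = max(1, int(fps * sample_interval_s))    # e.g. all frames within 1 second
    samples = []
    for start in range(0, len(frames), window):
        group = list(frames[start:start + window])   # adjacent video frames combined
        if group:
            samples.append(group)
    return samples
```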
  • the characteristic area is an area determined based on the trigger area of the card content, and the trigger area means that the user can select the card by triggering the area.
  • the feature area can be the same as the trigger area, or it can be reduced appropriately, for example, by removing the border of the card; in addition, the feature area is provided with an action label for indicating the training content, and the feature area is smaller than the interface area of the video frame.
  • the action label is the logical action corresponding to each card or other button, for example "kill" or "flash", where there is a logical connection between "flash" and "kill"; and the training content is the sequence of cards in the card game, that is, the card-playing logic.
  • FIG. 4 is a schematic diagram of a model training scenario provided by an embodiment of the present application.
  • the figure shows a confrontation scenario of a card game, in which the confirm button A1, function card A2, character's blood volume A3, equipment card A4, and skill card A5 are different feature areas, and each feature area corresponds to the logical process it represents.
  • the action label indicated by the confirm button A1 is confirming the card;
  • the action label indicated by function card A2 is the interaction logic indicated by the content of the card;
  • the action label of the character's blood volume A3 is the end-judgment indicator of the card game process, that is, the card game ends when the blood volume is cleared;
  • the action label of equipment card A4 is the interaction parameter between the battling users;
  • the action label of skill card A5 is the additional card interaction logic.
  • the feature area can be determined based on a preset setting, that is, the relevant feature area is marked in the video frame, and the marks are directly counted during training; the feature area can also be determined based on image characteristics, for example, by recognizing an image feature of size 10*10 in the video frame and determining the corresponding feature area, or by recognizing characteristic characters in the image, such as "kill" or "flash", and determining a certain range around the characters as the feature area. The specific determination method depends on the actual scene.
  • the trigger situation of a feature area is the card-playing situation of the card corresponding to that feature area.
  • the trigger can be realized in different ways, for example by a sliding operation, by a click operation, or by a voice-activated operation;
  • the trigger mode depends on the actual scene and is not limited here.
  • FIG. 5 is a schematic diagram of a trigger operation scenario provided by an embodiment of the present application.
  • the figure shows the distance B1 between the trigger operation point and the center point of the characteristic area.
  • FIG. 6 is a schematic diagram of another trigger operation scenario provided by an embodiment of the present application.
  • the distance C1 between the trigger operation point and the center point of the first card, and the distance C2 between the trigger operation point and the center point of the second card, can be calculated; by comparing C1 and C2 it can be determined which card the trigger operation corresponds to, and this is recorded. For example, if C1 is less than C2, the trigger operation corresponds to the first card being triggered, as in the sketch below.
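  • A minimal illustrative sketch (with hypothetical names, not from the patent text) of attributing a trigger operation to the feature area whose center point is nearest:

```python
import math
from typing import Dict, Tuple


def attribute_trigger(trigger_point: Tuple[float, float],
                      region_centers: Dict[str, Tuple[float, float]]) -> str:
    """Return the identifier of the feature area whose center is closest to the trigger point."""
    tx, ty = trigger_point
    return min(region_centers,
               key=lambda name: math.hypot(region_centers[name][0] - tx,
                                           region_centers[name][1] - ty))


# Example: C1 < C2, so the first card is recorded as the triggered one.
triggered = attribute_trigger((120.0, 480.0),
                              {"card_1": (100.0, 500.0), "card_2": (300.0, 500.0)})
```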
  • the feature vector can be generated as follows: since the card-playing situation corresponding to the trigger situation has a certain time order, the trigger situation of the feature areas in each video frame can be counted separately to generate a feature sequence, which is a multi-dimensional binary sequence; the feature vector is then generated from the feature sequence.
  • Figure 7 is a schematic diagram of a scenario for generating a feature sequence provided by an embodiment of the present application. The figure shows a digitization process of the card-playing sequence, where each digit of the feature sequence is assigned a different meaning: for example, the card "flash" occupies the first position in the sequence and the card "invulnerable" occupies the third position, and all cards that may appear are arranged and counted in turn to obtain the feature sequence.
  • the card-playing situation can be described in a binary manner, that is, 1 means the card is played and 0 means the card is not played; for example, if the trigger situation in the feature area is that a "kill" is played, no "flash" is played, but an "invulnerable" is played, the feature sequence is "101".
  • a positional digital representation can also be used, that is, several positions are taken as a group and the value is represented by the position of the 1 in order of appearance; for example, if our side's blood volume is 4 and 3 currently remain, the feature sequence is expressed as "0010". The specific feature representation depends on the actual scene and is not limited here; a binary encoding of a single frame is sketched below.
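  • As an illustration only, a minimal sketch of encoding one frame's card-playing situation as a binary feature sequence; the three-card vocabulary is an assumption and stands in for the full card list:

```python
# Each position of the sequence is reserved for one possible card.
CARD_VOCABULARY = ["kill", "flash", "invulnerable"]


def encode_frame(played_cards: set) -> list:
    """Binary feature sequence for one frame, e.g. {'kill', 'invulnerable'} -> [1, 0, 1]."""
    return [1 if card in played_cards else 0 for card in CARD_VOCABULARY]
```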
  • the first model is used to associate feature vectors with action tags
  • the second model is used to indicate the correspondence between the content of the target card and the action tags.
  • the first model and the second model may be deep learning network models used for imitation learning.
  • the model architecture of the first model can refer to FIG. 8, which is a model architecture diagram of model training provided by this embodiment of the application: the time series feature of the feature vector is first determined; the feature vector is then input into the first model for parameter adjustment; and the first model with adjusted parameters is trained on the time series feature to obtain the second model.
  • since the time series feature corresponds to the order of the cards, a single feature vector may leave the time series feature too scattered;
  • therefore the feature vectors can be divided according to a feature extraction interval to obtain at least two feature vector sets, and the operation sequences corresponding to the same feature area in the at least two feature vector sets are extracted to obtain the time series feature; for example, feature learning is performed on the feature vectors of 5 consecutive frames.
  • the first model consists of a fully connected layer and an LSTM layer.
  • the purpose of the fully connected layer is to learn deep features from the data features and to learn relationships between features in different dimensions, while the LSTM learns the timing characteristics of the card game.
  • the input of the network is the feature vector corresponding to 5 consecutive frames.
  • the feature vector first passes through the fully connected layer to extract deep features, which then enter the LSTM; the output is an action label that represents the game behavior.
  • the goal of the first model is to output game behaviors that are as consistent as possible with the player, that is, in the same game state, the game target clicked by the AI is consistent with the player's.
  • accordingly, the output dimension of the model is 21, that is, a "no action" behavior category is added; the parameters are then adjusted to obtain the second model. A sketch of this architecture follows.
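  • For illustration only, a minimal PyTorch sketch of such a fully-connected-plus-LSTM classifier over 5-frame windows; the feature dimension, hidden size, and other details are assumptions rather than values taken from the patent:

```python
import torch
import torch.nn as nn


class CardImitationModel(nn.Module):
    def __init__(self, feature_dim: int = 64, hidden_dim: int = 128, num_actions: int = 21):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU())
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_actions)   # 21 labels, including "no action"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 5 frames, feature_dim); the FC layer extracts per-frame deep features
        deep = self.fc(x)
        # the LSTM captures the card-playing order across the 5 consecutive frames
        out, _ = self.lstm(deep)
        # classify the action label from the last time step
        return self.head(out[:, -1, :])


model = CardImitationModel()
logits = model(torch.randn(8, 5, 64))   # 8 samples of 5 consecutive frames each
```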
  • because the feature area is a part of the card image corresponding to the video frame, it does not contain the background or other interfering areas, which makes the model training process targeted and reduces the amount of data processing during model training,
  • thereby improving the efficiency and accuracy of model training.
  • the feature area can also be determined based on training a detection model.
  • this scenario will be introduced below in conjunction with a card game application; please refer to FIG. 9, which is a flowchart of another model training method provided by this application.
  • the embodiment of this application includes at least the following steps:
  • step 901 is similar to step 301 in the embodiment shown in FIG. 3, and related features can be referred to, and will not be repeated here.
  • the training content is the set of card content in the current video frame.
  • the position information of the training content in the video frame can be the coordinates of the center point of the card corresponding to the training content, the coordinates of a corner point of that card, or the coordinate positions of both the center point and the corner point.
  • the shape information corresponding to the training content can be determined first, and training content with similar shape information is then filtered out to update the training content.
  • 28 categories can be defined, including 18 types of cards, 4 types of equipment, 4 types of HP, confirm, and cancel.
  • re-sampling can be used to increase the number of occurrences of categories with fewer samples in each round of iterations, ensuring that the number of samples of each category exceeds 50, as sketched below.
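  • A minimal illustrative sketch of this balancing-by-oversampling idea; the data structure and category names are assumptions:

```python
import random
from typing import Dict, List


def balance_samples(samples_by_category: Dict[str, List], min_count: int = 50) -> List:
    """Oversample rare categories so each one appears at least `min_count` times per round."""
    balanced = []
    for category, samples in samples_by_category.items():
        if not samples:
            continue
        picked = list(samples)
        while len(picked) < min_count:            # repeat samples of under-represented categories
            picked.append(random.choice(samples))
        balanced.extend(picked)
    random.shuffle(balanced)
    return balanced
```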
  • the third model and the fourth model can adopt the yoloV3 model, and the yoloV3 network can be divided into a feature extraction part and a target detection part.
  • the feature extraction part adopts the darknet53 network pre-trained on the large data set ImageNet.
  • the target detection part predicts the target position based on three scales of convolution features.
  • the yoloV3 network divides the image into 13*13, 26*26, and 52*52 grids, which detect large-scale, medium-scale, and small-scale targets respectively.
  • large-scale targets correspond to the card targets in the game;
  • medium-scale targets correspond to the confirm and cancel buttons in the game;
  • small-scale targets correspond to the blood volume and equipment in the game.
  • the category cross entropy and the target position loss can be merged as the loss function of the network, and the model parameters can be optimized by gradient back-propagation.
  • yoloV3 first extracts features through the feature extraction part, then extracts convolution features based on the 13*13 grid size to predict the category and location of a card.
  • the network outputs the category probability and the location, namely the x and y coordinates of the upper-left corner and the width and height; the loss function combines the category cross entropy loss and the target position loss.
  • the corresponding gradient is calculated, and the model parameters are updated by gradient back-propagation to obtain the fourth model.
  • the fourth model therefore has the feature area recognition function, which ensures the accuracy of feature area detection, that is, the accuracy of detecting cards or related function buttons. A sketch of such a combined loss follows.
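  • For illustration only, a minimal PyTorch sketch of a combined classification-plus-position loss of this kind; it is a generic stand-in rather than a faithful re-implementation of the yoloV3 loss, and the tensor shapes and weighting factor are assumptions:

```python
import torch
import torch.nn.functional as F


def detection_loss(class_logits: torch.Tensor,   # (N, num_classes) category predictions
                   class_targets: torch.Tensor,  # (N,) integer category labels
                   box_preds: torch.Tensor,      # (N, 4) upper-left x, y, width, height
                   box_targets: torch.Tensor,    # (N, 4) ground-truth boxes
                   box_weight: float = 1.0) -> torch.Tensor:
    """Merge the category cross entropy with a regression loss on the target position."""
    cls_loss = F.cross_entropy(class_logits, class_targets)
    pos_loss = F.mse_loss(box_preds, box_targets)
    return cls_loss + box_weight * pos_loss      # minimized by gradient back-propagation
```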
  • steps 905 and 906 are similar to steps 303 and 304 in the embodiment described in FIG. 3, and related features can be referred to, and will not be repeated here.
  • training the third model improves the accuracy of determining the feature area, ensures accurate recognition of the related cards or function keys, further improves the removal of interfering items in the subsequent imitation learning process, and improves the efficiency and accuracy of model training.
  • FIG. 10 is a flowchart of a content generation method provided by an embodiment of the application.
  • the application example includes at least the following steps:
  • the target object can be a user group in a specific program, for example the users of the "Hero Kill" game in the game database; the target object can also be a single user, which mainly applies to stand-alone game content,
  • where the terminal device collects the user's local operation data and trains the AI of the corresponding game, so as to realize the intelligent generation of content matching the user; the target object can also be the terminal device itself, that is, the content generation method
  • proposed in this application is applied to an adversarial learning process, in which large-scale battles between multiple terminal devices produce a large amount of battle data to enrich the content of the database.
  • the first content may be the historical battle data of the game users, and any continuous video of any duration is selected as the training content.
  • the first content is the user's current operation.
  • the corresponding AI-side operation is automatically generated; for example, if the player plays "kill", the AI outputs "flash". It can be understood that in an actual scenario the corresponding process is a multi-step card-playing process; this is only an example.
  • the second content is a card-playing strategy generated based on the cards output by the AI, and the content type of the second content depends on the actual scene, and is not limited here.
  • the AI can adaptively generate target card content for the relevant user, which improves the efficiency and accuracy of content generation and further improves the user's experience during content interaction.
  • FIG. 11 is a schematic structural diagram of a model training device provided by an embodiment of the application.
  • the model training device 1100 includes:
  • the acquiring unit 1101 is configured to acquire a training set based on an interactive process of multiple card content, the training set includes multiple video frames, and the video frame includes a trigger area of the card content;
  • the determining unit 1102 is configured to determine a feature area in the video frame according to the trigger area of the card content, the feature area is provided with an action label for indicating training content, and the action label is determined based on the card content ,
  • the characteristic area is smaller than the interface area of the video frame;
  • the statistics unit 1103 is used to count the trigger conditions of the characteristic regions to generate a characteristic vector
  • the training unit 1104 is configured to input the feature vector into a first model for training to obtain a second model, where the first model is used to associate the feature vector with the action label, and the second model is used to indicate The corresponding relationship between the content of the target card and the action tag.
  • the determining unit 1102 is specifically configured to determine the location information of the training content in the video frame according to the trigger area of the card content;
  • the determining unit 1102 is specifically configured to train a third model according to the correspondence between the location information and the training content to obtain a fourth model, and the third model is used to associate the location information with the training content. Training content, where the fourth model is used to indicate the correspondence between the video frame and the feature region;
  • the determining unit 1102 is specifically configured to input the video frame into the fourth model to obtain the characteristic region.
  • the determining unit 1102 is further configured to determine the shape information corresponding to the training content
  • the determining unit 1102 is also used to filter out training content with similar shape information, so as to update the training content.
  • the determining unit 1102 is specifically configured to classify the training content according to the shape information to obtain at least two categories of training content;
  • the determining unit 1102 is specifically configured to determine the first gradient information according to the training content of the at least two categories;
  • the determining unit 1102 is specifically configured to determine second gradient information according to the position information
  • the determining unit 1102 is specifically configured to train the parameters of the third model by minimizing the first gradient information and the second gradient information to obtain the fourth model.
  • the statistical unit 1103 is specifically configured to separately count the trigger conditions of the characteristic regions in each video frame to generate a characteristic sequence, the characteristic sequence being a multi-dimensional binary sequence;
  • the statistical unit 1103 is specifically configured to generate the feature vector according to the feature sequence.
  • the statistical unit 1103 is specifically configured to obtain the center point of the characteristic region
  • the statistics unit 1103 is specifically configured to determine the trigger condition according to the distance between the center point and the trigger operation point to generate the characteristic sequence.
  • the training unit 1104 is specifically configured to determine the time series feature of the feature vector
  • the training unit 1104 is specifically configured to input the feature vector into the first model for parameter adjustment
  • the training unit 1104 is specifically configured to train the first model after the time sequence feature input parameters are adjusted to obtain the second model.
  • the training unit 1104 is specifically configured to segment the feature vector according to feature extraction intervals to obtain at least two feature vector sets;
  • the training unit 1104 is specifically configured to extract the operation sequences corresponding to the same feature region in the at least two feature vector sets to obtain the time sequence feature.
  • the acquiring unit 1101 is specifically configured to acquire level information of an interaction process of multiple card content, and the level information is used to indicate the complexity of the interaction process;
  • the acquiring unit 1101 is specifically configured to extract a first card game and a second card game according to the level information, where the level information of the first card game and the second card game are different;
  • the acquiring unit 1101 is specifically configured to determine the training set according to the first card game and the second card game.
  • by acquiring a training set including multiple video frames and determining a feature area in the video frame, where the feature area is provided with an action label for indicating the training content and the feature area is smaller than the interface area of the video frame; then counting the trigger situation of the feature area to generate a feature vector; and then inputting the feature vector into the first model for training, a second model is obtained for indicating the correspondence between the target card content and the action label.
  • This realizes an imitation learning training process based on the feature area. Because the feature area is a part of the image corresponding to the video frame and does not contain the background or other interfering areas, the model training process is targeted, the amount of data processing during model training is reduced, and the training efficiency and accuracy of the model are improved.
  • FIG. 12 is a schematic structural diagram of a content generation apparatus provided in an embodiment of the present application, including: an obtaining unit 1201, configured to obtain the first content output by the target object;
  • the input unit 1202 is configured to input the first content into a second model to obtain an action label, the second model being trained by the model training method described in the first aspect or any implementation of the first aspect;
  • the generating unit 1203 is configured to generate second content according to the action tag.
  • FIG. 13 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server 1300 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 1322 (for example, one or more processors), memory 1332, and one or more storage media 1330 (for example, one or more mass storage devices) storing application programs 1342 or data 1344.
  • the memory 1332 and the storage medium 1330 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of command operations on the server.
  • the central processing unit 1322 may be configured to communicate with the storage medium 1330, and execute a series of instruction operations in the storage medium 1330 on the server 1300.
  • the server 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1358, and/or one or more operating systems 1341, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the steps performed by the model training device in the foregoing embodiment may be based on the server structure shown in FIG. 13.
  • the embodiment of the present application also provides a terminal device.
  • FIG. 14 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the terminal can be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sale (POS) terminal, an in-vehicle computer, etc. Taking a mobile phone as an example:
  • FIG. 14 shows a block diagram of a part of the structure of a mobile phone related to a terminal provided in an embodiment of the present application.
  • the mobile phone includes components such as a radio frequency (RF) circuit 1410, a memory 1420, an input unit 1430, a display unit 1440, a sensor 1450, an audio circuit 1460, a wireless fidelity (WiFi) module 1470, a processor 1480, and a power supply 1490.
  • the input unit 1430 may include a touch panel 1431 and other input devices 1432
  • the display unit 1440 may include a display panel 1441
  • the audio circuit 1460 may include a speaker 1461 and a microphone 1462.
  • FIG. 14 does not constitute a limitation on the mobile phone, and may include more or less components than those shown in the figure, or a combination of certain components, or different component arrangements.
  • the processor 1480 is the control center of the mobile phone; it connects the various parts of the entire mobile phone through various interfaces and lines, and by running or executing software programs and/or modules stored in the memory 1420 and calling data stored in the memory 1420, it executes the various functions of the mobile phone and processes data, thereby monitoring the mobile phone as a whole.
  • the processor 1480 may include one or more processing units; optionally, the processor 1480 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and so on, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 1480.
  • the mobile phone may also include a camera, a Bluetooth module, etc., which will not be repeated here.
  • the processor 1480 included in the terminal device also has the function of executing each step of the above-mentioned page processing method.
  • An embodiment of the present application also provides a computer-readable storage medium storing model training instructions which, when run on a computer, cause the computer to execute the steps performed by the model training device in the methods described in the embodiments shown in FIGS. 2 to 10 above.
  • An embodiment of the present application also provides a computer program product including model training instructions which, when running on a computer, cause the computer to execute the steps performed by the model training device in the methods described in the embodiments shown in FIGS. 2 to 10 above.
  • An embodiment of the present application also provides a model training system.
  • the model training system may include the model training device in the embodiment described in FIG. 11 or the content generation device described in FIG. 12.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of the present application, essentially or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which can be a personal computer, a model training device, a network device, etc.) execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A model training method, a content generation method, and a related apparatus, applied to an artificial-intelligence-based card content generation process. The method comprises: acquiring a training set on the basis of an interaction process of multiple pieces of card content (301); determining a feature region in a video frame according to a trigger region of the card content (302); counting the trigger situation of the feature region to generate a feature vector (303); and inputting the feature vector into a first model for training to obtain a second model (304). The method realizes an imitation learning training process based on the feature region; because the feature region is a part of the card image corresponding to the video frame and does not contain the background or other interfering regions, the model training process is targeted, the amount of data processing during model training is reduced, and the efficiency and accuracy of model training are improved.

Description

Model training method, content generation method, and related apparatus
This application claims priority to Chinese patent application No. 202010073390.0, entitled "Model training method, content generation method, and related apparatus", filed with the China Patent Office on January 21, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to model training and content generation.
Background
With the development of technologies related to mobile terminals, more and more smart devices have appeared in people's lives. Content interaction through smart devices is one of the main applications, for example, game content interaction. To keep content attractive to users, it requires long-term targeted updates, and such updates can be performed intelligently through artificial intelligence (AI).
Generally, an image-based AI imitation learning training scheme can be used, in which the content image is used as the input of a deep network, and deep features of the image are extracted through convolutional layers and fully connected layers to train the relevant model; the trained model finally outputs targeted content.
However, during imitation learning, content images have numerous features and contain interfering content, i.e., parts that do not indicate actual content, which easily causes overfitting in model training; in addition, the training volume is large, which affects the efficiency and accuracy of model training.
发明内容
有鉴于此,本申请提供一种模型训练的方法,可以有效避免由于训练内容的背景产生的训练干扰,提高模型训练过程的效率及准确性。
本申请第一方面提供一种模型训练的方法,可以应用于终端设备中包含模型训练功能的系统或程序中,包括:基于多张卡牌内容的交互过程获取训练集,所述训练集包括多个视频帧,所述视频帧包括所述卡牌内容的触发区域;
根据所述卡牌内容的触发区域确定所述视频帧中的特征区域,所述特征区域设有用于指示训练内容的动作标签,所述动作标签基于所述卡牌内容确定,所述特征区域小于所述视频帧的界面区域;
统计所述特征区域的触发情况,以生成特征向量;
将所述特征向量输入第一模型进行训练,以得到第二模型,所述第一模型用于关联所述特征向量与所述动作标签,所述第二模型用于指示目标卡牌内容与所述动作标签的对应关系。
本申请第二方面提供一种模型训练的装置,所述装置部署于终端设备,包括:获取单元,用于基于多张卡牌内容的交互过程获取训练集,所述训练集包括多个视频帧,所述视频帧包括所述卡牌内容的触发区域;
确定单元,用于根据所述卡牌内容的触发区域确定所述视频帧中的特征区域,所述特征区域设有用于指示训练内容的动作标签,所述动作标签基于所述卡牌内容确定,所述特征区域小于所述视频帧的界面区域;
统计单元,用于统计所述特征区域的触发情况,以生成特征向量;
训练单元,用于将所述特征向量输入第一模型进行训练,以得到第二模型,所述第一模型用于关联所述特征向量与所述动作标签,所述第二模型用于指示目标卡牌内容与所述动作标签的对应关系。
在本申请一些可能的实现方式中,所述确定单元,具体用于根据所述卡牌内容的触发区域确定所述训练内容在所述视频帧中的位置信息;
所述确定单元,具体用于根据所述位置信息与所述训练内容的对应关系对第三模型进行训练,以得到第四模型,所述第三模型用于关联所述位置信息与所述训练内容,所述第四模型用于指示所述视频帧与所述特征区域的对应关系;
所述确定单元,具体用于将所述视频帧输入所述第四模型,以得到所述特征区域。
在本申请一些可能的实现方式中,所述确定单元,还用于确定所述训练内容对应的形状信息;
所述确定单元,还用于筛除所述形状信息相似的训练内容,以对所述训练内容进行更新。
在本申请一些可能的实现方式中,所述确定单元,具体用于根据所述形状信息对所述训练内容进行分类,以得到至少两个类别训练内容;
所述确定单元,具体用于根据所述至少两个类别训练内容确定第一梯度信息;
所述确定单元,具体用于根据所述位置信息确定第二梯度信息;
所述确定单元,具体用于通过最小化所述第一梯度信息和所述第二梯度信息的方式对所述第三模型的参数进行训练,以得到第四模型。
在本申请一些可能的实现方式中,所述统计单元,具体用于分别统计每个视频帧中的所述特征区域的触发情况,以生成特征序列,所述特征序列为多维二值序列;
所述统计单元,具体用于根据所述特征序列生成所述特征向量。
在本申请一些可能的实现方式中,所述统计单元,具体用于获取所述特征区域的中心点;
所述统计单元,具体用于根据所述中心点与触发操作点的距离确定所述触发情况,以生成所述特征序列。
在本申请一些可能的实现方式中,所述训练单元,具体用于确定所述特征向量的时序特征;
所述训练单元,具体用于将所述特征向量输入所述第一模型进行参数调整;
所述训练单元,具体用于将所述时序特征输入参数调整后的所述第一模型进行训练,以得到所述第二模型。
在本申请一些可能的实现方式中,所述训练单元,具体用于按照特征提取间隔对所述特征向量进行分割,以得到至少两个特征向量集;
所述训练单元,具体用于提取所述至少两个特征向量集中对应于相同特征区域的操作序列,以得到所述时序特征。
在本申请一些可能的实现方式中,所述获取单元,具体用于获取多张卡牌内容的交互过程的等级信息,所述等级信息用于指示所述交互过程的复杂度;
所述获取单元,具体用于根据所述等级信息提取第一牌局和第二牌局,所述第一牌局与所述第二牌局的等级信息不同;
所述获取单元,具体用于根据所述第一牌局与所述第二牌局确定所述训练集。
本申请第三方面提供一种内容生成的方法,所述方法由终端设备执行,包括:获取目标对象输出的第一内容;
将所述第一内容输入第二模型,以得到动作标签,所述第二模型基于上述第一方面或第一方面任一项所述的模型训练的方法训练所得;
根据所述动作标签生成第二内容。
本申请第四方面提供一种内容生成的装置,所述装置部署于终端设备,包括:获取单元,用于获取目标对象输出的第一内容;
输入单元,用于将所述第一内容输入第二模型,以得到动作标签,所述第二模型基于上述第一方面或第一方面任一项所述的模型训练的方法训练所得;
生成单元,用于根据所述动作标签生成第二内容。
本申请第五方面提供一种计算机设备,包括:存储器、处理器以及总线系统;所述存储器用于存储程序代码;所述处理器用于根据所述程序代码中的指令执行上述第一方面或第一方面任一项所述的模型训练的方法,或第三方面所述的内容生成的方法。
本申请第六方面提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述第一方面或第一方面任一项所述的模型训练的方法,或第三方面所述的内容生成的方法。
本申请第七方面提供一种计算机程序产品,当所述计算机程序产品被执行时,用于执行上述第一方面或第一方面任一项所述的模型训练的方法,或第三方面所述的内容生成的方法。
从以上技术方案可以看出,本申请实施例具有以下优点:
通过获取包括多个视频帧的训练集,并确定视频帧中的特征区域,其中特征区域设有用于指示训练内容的动作标签,特征区域小于视频帧的界面区域;然后统计特征区域的触发情况,以生成特征向量;进而将特征向量输入第一模型进行训练,以得到用于指示目标卡牌内容与动作标签的对应关系的第二模型。从而实现了基于特征区域的模仿学习训练过程,由于特征区域为视频帧对应卡牌图像的一部分,不包含背景部分或其他干扰区域,使得模型训练过程具有针对性,减小了模型训练过程中的数据处理量,提高了模型训练效率以及准确性。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。
图1为模型训练系统运行的网络架构图;
图2为本申请实施例提供的一种模型训练的流程架构图;
图3为本申请实施例提供的一种模型训练的方法的流程图;
图4为本申请实施例提供的一种模型训练的场景示意图;
图5为本申请实施例提供的一种触发操作的场景示意图;
图6为本申请实施例提供的另一种触发操作的场景示意图;
图7为本申请实施例提供的一种特征序列生成的场景示意图;
图8为本申请实施例提供的一种模型训练的模型架构图;
图9为本申请实施例提供的另一种模型训练的方法的流程图;
图10为本申请实施例提供的一种内容生成的方法的流程图;
图11为本申请实施例提供的一种模型训练装置的结构示意图;
图12为本申请实施例提供的一种内容生成的装置的结构示意图;
图13为本申请实施例提供的一种服务器的结构示意图;
图14为本申请实施例提供的一种终端设备的结构示意图。
具体实施方式
本申请实施例提供了一种模型训练的方法以及相关装置,可以应用于终端设备中包含模型训练功能的系统或程序中,通过获取包括多个视频帧的训练集,并确定视频帧中的特征区域,其中特征区域设有用于指示训练内容的动作标签,特征区域小于视频帧的界面区域;然后统计特征区域的触发情况,以生成特征向量;进而将特征向量输入第一模型进行训练,以得到用于指示目标卡牌内容与动作标签的对应关系的第二模型。从而实现了基于特征区域的模仿学习训练过程,由于特征区域为视频帧对应卡牌图像的一部分,不包含背景部分或其他干扰区域,使得模型训练过程具有针对性,减小了模型训练过程中的数据处理量,提高了模型训练效率以及准确性。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例例如能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“对应于”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
首先,对本申请实施例中可能出现的一些名词进行解释。
动作标签:游戏画面对应的人工标注的动作,该动作对应于游戏的运行策略。
卷积神经网络(Convolutional Neural Network,CNN):具有表征学习能力,能够按其阶层结构对输入信息进行平移不变的分类。
轻量级深度网络:参数量小,适用于CPU和嵌入式设备的小型深度网络,本实施例中,第一模型即为预设的轻量级深度网络模型;而第二模型即为经过卡牌内容训练集训练后的轻量级深度网络模型,即第二模型由第一模型经过参数调整所得。
深度特征:图像通过深度网络最终提取到的特征,包含了图像的抽象信息。
长短期记忆网络(Long Short-Term Memory,LSTM):一种时间递归神经网络,主要是为了解决长序列训练过程中的梯度消失和梯度爆炸的问题,以生成基于时间序列的特征向量。
YOLO(You only look once):是一种目标检测框架,用于获取目标区域的特征,在本实施例中,第三模型即为预设的YOLO模型;而第四模型即为经过卡牌内容的位置对应关系训练后的YOLO模型,即第四模型由第三模型经过参数调整所得。
Darknet53:是包含53层卷积层的深度网络。
DQN算法:Deep Q-learning的缩写,为深度价值学习算法。
应理解,本申请提供的模型训练方法可以应用于终端设备中包含模型训练功能的系统或程序中,例如牌类游戏,例如模型训练系统可以运行于如图1所示的网络架构中,如图1所示,是模型训练系统运行的网络架构图,如图可知,模型训练系统可以提供与多个信息源的模型训练,终端设备通过网络建立与服务器的连接,进而接收服务器发送的多个内容,并根据终端设备本身的策略进行对应的内容显示,另外,服务器通过收集终端设备上传的训练集,对相关模型进行训练,使得生成的内容适应于目标终端设备;可以理解的是,图1中示出了多种终端设备,在实际场景中可以有更多或更少种类的终端设备参与到模型训练的过程中,具体数量和种类因实际场景而定,此处不做限定,另外,图1中示出了一个服务器,但在实际场景中,也可以有多个服务器的参与,特别是在多内容应用交互的场景中,具体服务器数量因实际场景而定。
应当注意的是,本实施例提供的模型训练方法也可以离线进行,即不需要服务器的参与,此时终端设备在本地与其他终端设备进行连接,进而进行终端设备之间的模型训练的过程,例如:单机游戏内容模拟的过程。
可以理解的是，上述模型训练系统可以运行于个人移动终端，即终端设备例如可以是移动终端，例如：作为牌类游戏这一类应用；也可以运行于服务器；还可以运行于第三方设备以提供模型训练，以得到信息源的模型训练处理结果。具体的模型训练系统可以是以一种程序的形式在上述设备中运行，也可以作为上述设备中的系统部件进行运行，还可以作为云端服务程序的一种，具体运作模式因实际场景而定，此处不做限定。
随着移动终端相关技术的发展,越来越多的智能设备出现在人们的生活中,其中,通过智能设备进行内容交互为主要应用之一,例如:游戏内容交互,为保证内容对于用户的吸引性,需要对内容进行长期的针对性更新,可以通过人工智能进行内容的智能更新。
一般，可以采用基于图像的AI模仿学习训练方案，即将内容图像（例如卡牌内容）作为深度网络的输入，并通过卷积层和全连接层提取图像的深度特征，以对相关模型进行训练，最终通过训练后的模型输出针对性内容。
但是，由于在模仿学习的过程中，内容图像的特征繁多，且存在一些干扰内容，即无实际内容指示的部分，容易造成模型训练的过拟合，且训练量大，影响模型训练的效率以及准确性。
为了解决上述问题，本申请提出了一种模型训练的方法，该方法应用于图2所示的模型训练的流程框架中，如图2所示，为本申请实施例提供的一种模型训练的流程架构图，首先收集与用户相关的卡牌内容或服务器数据库的卡牌内容作为训练内容输入检测模型，进行特征区域的检测，并统计特征区域的触发情况以得到特征向量，然后基于特征向量进行模仿学习训练，进而根据用户的输入内容进行智能的内容生成。
可以理解的是,本申请所提供的方法可以为一种程序的写入,以作为硬件系统中的一种处理逻辑,也可以作为一种模型训练装置,采用集成或外接的方式实现上述处理逻辑。作为一种实现方式,该模型训练装置通过获取包括多个视频帧的训练集,并确定视频帧中的特征区域,其中特征区域设有用于指示训练内容的动作标签,特征区域小于视频帧的界面区域;然后统计特征区域的触发情况,以生成特征向量;进而将特征向量输入第一模型进行训练,以得到用于指示目标卡牌内容与动作标签的对应关系的第二模型。从而实现了基于特征区域的模仿学习训练过程,由于特征区域为视频帧对应卡牌图像的一部分,不包含背景部分或其他干扰区域,使得模型训练过程具有针对性,减小了模型训练过程中的数据处理量,提高了模型训练效率以及准确性。
结合上述流程架构,下面将对本申请中模型训练的方法进行介绍,请参阅图3,图3为本申请实施例提供的一种模型训练的方法的流程图,可以应用于牌类游戏内容的生成过程中,本申请实施例至少包括以下步骤:
301、基于多张卡牌内容的交互过程获取训练集。
本实施例中，基于多张卡牌内容的交互过程可以是通过视频的形式展现，即训练集包括多个视频帧，例如多个视频帧可以取自基于目标卡牌内容选取的连续的一段视频，例如：选取1小时内连续的牌类游戏视频。
在一种可能的实现方式中,对于视频帧的获取,可以是逐帧的提取,也可以是按照一定的采样间隔进行获取,这是考虑到内容的切换过程可能存在时间间隔,即根据训练样本间隔对相邻的视频帧进行合并,以得到训练集,例如:选取1秒内的视频帧合并作为训练集中的一个样本。
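作为辅助理解的示意性示例（Python，假设使用OpenCV读取牌局录像），下面给出按固定采样间隔抽取视频帧并将一个时间窗内的帧合并为训练样本的一种可能写法；其中文件路径、采样间隔sample_interval与合并窗口merge_window均为假设值，并非本申请限定的具体实现。
    import cv2

    def build_training_samples(video_path, sample_interval=5, merge_window=30):
        # 每隔 sample_interval 帧抽取一帧，并将 merge_window 帧内抽取到的帧合并为一个训练样本
        capture = cv2.VideoCapture(video_path)
        frames, samples, index = [], [], 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if index % sample_interval == 0:
                frames.append(frame)
            if (index + 1) % merge_window == 0 and frames:
                samples.append(frames)  # 一个时间窗内抽取到的帧作为训练集中的一个样本
                frames = []
            index += 1
        capture.release()
        return samples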
302、根据卡牌内容的触发区域确定视频帧中的特征区域。
本实施例中,特征区域为基于卡牌内容的触发区域确定的区域,而触发区域即用户可以通过触发该区域进行卡牌的选择。通常情况下,特征区域可以与触发区域相同,也可以适当的缩小,例如:去除卡牌的边框以作为特征区域;另外,特征区域设有用于指示训练内容的动作标签,特征区域小于视频帧的界面区域。
在牌类游戏的场景中，动作标签即为每张牌或其他按键所对应的逻辑动作，例如：杀、闪，且“闪”与“杀”之间存在逻辑联系；而训练内容则是该牌局过程中的出牌顺序，即出牌的逻辑。
在一种可能的场景中,如图4所示,是本申请实施例提供的一种模型训练的场景示意图,图中示出了一种牌类游戏的对抗场景,图中包括确定键A1、功能牌A2、人物血量A3、装备牌A4、技能牌A5,即为不同的特征区域,每个特征区域对应着其代表的逻辑过程,其中确定键A1指示的动作标签为确认出牌,功能牌A2指示的动作标签为该卡牌内容所指示的交互逻辑,人物血量A3的动作标签为卡牌游戏过程的结束判断标识,即归零则牌局结束;装备牌A4的动作标签为对战用户之间的交互参数;技能牌A5的动作标签为附加的卡牌交互逻辑。
在一种可能的实现方式中,对于特征区域的确定过程可以是基于预先的设定,即在视频帧中标记相关的特征区域,在训练过程中直接统计标记的相关情况即可;另外,对于特征区域的确定过程还可以基于图像特征进行识别,例如:识别视频帧中图像尺寸为10*10的图像特征,并确定对应的特征区域;或识别图像中的特征字符,例如:杀、闪等字符,并基于该字符确定一定的范围为特征区域,具体的确定方法因实际场景而定。
303、统计特征区域的触发情况,以生成特征向量。
本实施例中，特征区域的触发情况即特征区域对应牌的出牌情况，触发情况可以通过不同的触发方式实现，例如可以由滑动操作触发，也可以由点击操作触发，还可以由声控的操作方式触发，触发方式因实际场景而定，此处不做限定。
在一种可能的实现方式中,在触发过程中,可以根据触发操作点与相关特征区域的中心点的距离确定,如图5所示,是本申请实施例提供的一种触发操作的场景示意图,图中示出了触发操作点与特征区域的中心点的距离B1,可以通过对于B1的设定确定特征区域是否被触发,例如:设定触发的阈值为20厘米,若触发操作点距离中心点的距离小于20厘米,则特征区域被触发。
另外,由于牌类游戏中存在多张牌相近的情况,此时可能存在误操作的问题,如图6所示,是本申请实施例提供的另一种触发操作的场景示意图,图中示出了触发操作点距离第一卡牌中心点的距离C1以及触发操作点距离第二卡牌中心点的距离C2,可以通过比较C1与C2的大小判断触发操作对应于哪张牌并进行记录,例如:若C1小于C2,则该触发操作对应于第一卡牌被触发。
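结合上述触发判断逻辑，下面给出根据触发操作点与各特征区域中心点的距离确定被触发卡牌的一种示意性写法（Python），距离最近且小于阈值的特征区域被记录为触发；其中阈值threshold与坐标数值均为假设值，实际取值依界面分辨率与场景而定。
    import math

    def detect_triggered_region(touch_point, region_centers, threshold=100.0):
        # touch_point 为触发操作点坐标，region_centers 为 {特征区域名: 中心点坐标}
        best_name, best_dist = None, float("inf")
        for name, center in region_centers.items():
            dist = math.dist(touch_point, center)  # 触发操作点到该特征区域中心点的距离
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name if best_dist < threshold else None

    # 例如：触发点更靠近第一张卡牌的中心点，则记录第一张卡牌被触发
    triggered = detect_triggered_region((320, 880), {"杀": (300, 900), "闪": (420, 900)})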
本实施例中，对于生成特征向量的过程可以基于下述过程进行；由于触发情况对应的出牌情况存在一定的时序性，故可以分别统计每个视频帧中的特征区域的触发情况，以生成特征序列，特征序列为多维二值序列；然后根据特征序列生成特征向量。如图7所示，是本申请实施例提供的一种特征序列生成的场景示意图，图中示出了一个出牌顺序的数据化过程，其中，为特征序列的每个数位分配不同的含义，例如：序列第一位表示卡牌“闪”的出牌情况，序列第三位表示卡牌“无懈可击”的出牌情况，并依次对卡牌中可能出现的所有牌类进行排列并统计相关操作，以得到特征序列。
在一种可能的实现方式中，对于出牌情况的描述可以采用二值化的方式进行，即1代表出牌，而0代表未出牌，例如：若特征区域的触发情况为出“杀”、未出“闪”、出“无懈可击”，则特征序列为“101”。另外还可以采用数位的表述方式，即将多个位置值作为一组，按照出现1的次序进行特征表述，例如：我方血量一共为4，现有3，则特征序列表述为“0010”，具体的特征表述方式因实际场景而定，此处不做限定。
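作为示意，下面给出将单个视频帧中各特征区域的触发情况统计为多维二值序列的一种可能写法（Python），其中CARD_ORDER的牌序仅为示例排列，实际顺序按预先定义的卡牌排列确定。
    CARD_ORDER = ["杀", "闪", "无懈可击"]  # 示例排列，实际应包含可能出现的所有牌类

    def build_feature_sequence(triggered_cards):
        # triggered_cards 为本帧中被触发（即被打出）的卡牌集合，1代表出牌，0代表未出牌
        return [1 if card in triggered_cards else 0 for card in CARD_ORDER]

    sequence = build_feature_sequence({"杀", "无懈可击"})  # 结果为 [1, 0, 1]，对应上文中的“101”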
304、将特征向量输入第一模型进行训练,以得到第二模型。
本实施例中,第一模型用于关联特征向量与动作标签,第二模型用于指示目标卡牌内容与动作标签的对应关系。其中,第一模型和第二模型可以是用于模仿学习的深度学习网络模型。
在一种可能的实现方式中,由于牌类游戏存在一定的出牌时序性,需要对特征向量的时序进行进一步的特征训练,故第一模型的模型架构可以参照图8,图8为本申请实施例提供的一种模型训练的模型架构图,首先确定特征向量的时序特征;然后将特征向量输入第一模型进行参数调整;将时序特征输入参数调整后的第一模型进行训练,以得到第二模型。另外,由于时序特征对应的是出牌的顺序,单一的特征向量可能出现时序特征过于分散的问题,此时,可以按照特征提取间隔对特征向量进行分割,以得到至少两个特征向量集;然后提取至少两个特征向量集中对应于相同特征区域的操作序列,以得到时序特征。例如:按照连续5个帧的特征向量进行特征学习。
在一种场景中，第一模型由全连接层和LSTM层组成，全连接层的目的是从数据特征中学习深度特征，学会不同维度特征之间的相互关系，而LSTM则是学习卡牌游戏中的时序特征。网络的输入为5帧连续图像对应的特征向量，特征向量首先通过全连接层提取其中的深度特征，再将其输入LSTM中，输出为动作标签，动作标签表征游戏行为。其中，第一模型的目标是输出与玩家尽量一致的游戏行为，即在相同的游戏状态下，AI点击的游戏目标与玩家一致。另外，由于可以操作的目标个数为20，即包含18种卡牌、取消和确定按钮，对应的模型输出为21类，即加上不做动作的行为类别，进而进行参数调整，得到第二模型。
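下面给出上述“全连接层+LSTM”结构的一种示意性实现（Python，基于PyTorch），其中特征维度feature_dim、隐藏层维度hidden_dim等超参数均为假设值，仅用于说明网络结构与数据流向，并非本申请限定的实现。
    import torch
    import torch.nn as nn

    class ImitationModel(nn.Module):
        def __init__(self, feature_dim=64, hidden_dim=128, num_actions=21):
            super().__init__()
            self.fc = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU())  # 提取深度特征
            self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)  # 学习出牌的时序特征
            self.classifier = nn.Linear(hidden_dim, num_actions)  # 输出21类动作标签

        def forward(self, features):
            # features 形状为 (batch, 5, feature_dim)，即5帧连续图像对应的特征向量
            deep = self.fc(features)
            _, (hidden, _) = self.lstm(deep)
            return self.classifier(hidden[-1])

    logits = ImitationModel()(torch.randn(2, 5, 64))  # 输出形状为 (2, 21)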
结合上述实施例可知,通过获取包括多个视频帧的训练集,并确定视频帧中的特征区域,其中特征区域设有用于指示训练内容的动作标签,特征区域小于视频帧的界面区域;然后统计特征区域的触发情况,以生成特征向量;进而将特征向量输入第一模型进行训练,以得到用于指示目标卡牌内容与动作标签的对应关系的第二模型。从而实现了基于特征区域的模仿学习训练过程,由于特征区域为视频帧对应卡牌图像的一部分,不包含背景部分或其他干扰区域,使得模型训练过程具有针对性,减小了模型训练过程中的数据处理量,提高了模型训练效率以及准确性。
在上述实施例中确定特征区域的过程还可以基于检测模型训练的方法进行,下面,结合牌类游戏应用对该场景进行介绍,请参阅图9,图9为本申请实施例提供的另一种模型训练的方法的流程图,本申请实施例至少包括以下步骤:
901、获取训练集。
本实施例中,步骤901与图3所述实施例中的步骤301相似,相关特征可以进行参考,此处不做赘述。
902、确定训练集中训练内容在视频帧中的位置信息。
本实施例中,训练内容即为当前视频帧中卡牌内容的集合,训练内容在视频帧中的位置信息可以是训练内容对应卡牌的中心点坐标,也可以是训练内容对应卡牌的角点坐标,还可以是中心点与角点结合的坐标位置。
在一种可能的实现方式中,由于卡牌或相关功能按钮之间存在不同的类别,而同一类别内的卡牌往往具有相似的动作标签,故可以首先确定训练内容对应的形状信息;然后筛除形状信息相似的训练内容,以对训练内容进行更新。例如:在一种卡牌游戏中,可以定义28个类别,包含18种卡牌、4种装备、4种血量、确定、取消。另外,由于不同类别之间的样本个数差异较大,可以通过重新采样样本,增加样本较少的类别在一轮迭代中出现的次数,确保每一类样本的个数超过50个。
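下面给出按类别重采样、使每一类样本个数不少于指定数量的一种示意性写法（Python），其中min_count=50与数据的组织方式均为假设，仅用于说明样本均衡的思路。
    import random
    from collections import defaultdict

    def resample_by_class(samples, labels, min_count=50):
        grouped = defaultdict(list)
        for sample, label in zip(samples, labels):
            grouped[label].append(sample)
        balanced_samples, balanced_labels = [], []
        for label, items in grouped.items():
            # 样本不足的类别通过有放回采样补齐到 min_count，增加其在一轮迭代中出现的次数
            if len(items) < min_count:
                items = items + random.choices(items, k=min_count - len(items))
            balanced_samples.extend(items)
            balanced_labels.extend([label] * len(items))
        return balanced_samples, balanced_labels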
903、根据位置信息与训练内容的对应关系对第三模型进行训练,以得到第四模型。
本实施例中，第三模型和第四模型可以采用yoloV3模型，yoloV3网络可以分为特征提取部分和目标检测部分。其中，为了防止模型过拟合，特征提取部分采用预先在大数据集ImageNet训练的darknet53网络。目标检测部分基于三种尺度的卷积特征预测目标位置，在输入图像大小为416*416像素的情况下，yoloV3网络将图像划分成13*13，26*26和52*52的网格，分别检测大尺度、中等尺度、小尺度的目标，在一种可能的场景中，大尺度目标对应的是游戏中的卡牌目标，中等尺度目标对应的是游戏中的确认和取消按钮，小尺度目标对应的是游戏中的血量和装备。
在一种可能的实现方式中,经过上述按照尺寸信息进行分类后,训练yoloV3模型时,可以融合类别交叉熵和目标位置的损失作为网络的损失函数,通过梯度后向传递的方式优化模型参数。具体的,以卡牌的位置预测为例,yoloV3首先通过特征提取部分提取特征,提取基于13*13网格大小的卷积特征预测卡牌的类别和位置,网络输出类别的概率和位置的左上角x、y坐标以及宽度和高度,损失函数融合了类别交叉熵损失和目标位置的损失。进而通过最小化该损失函数,计算对应的梯度,采用梯度后向传递的方法更新模型参数,以得到第四模型。
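下面给出“类别交叉熵损失与目标位置损失相融合”这一思路的一种简化示意（Python，基于PyTorch），仅用于说明两类损失的组合方式，并非yoloV3完整的损失函数实现。
    import torch.nn.functional as F

    def detection_loss(class_logits, class_targets, box_preds, box_targets, box_weight=1.0):
        cls_loss = F.cross_entropy(class_logits, class_targets)  # 类别交叉熵损失
        box_loss = F.mse_loss(box_preds, box_targets)  # 左上角x、y坐标以及宽度和高度的位置损失
        return cls_loss + box_weight * box_loss  # 最小化该融合损失，通过梯度后向传递更新模型参数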
904、将视频帧输入第四模型,以得到特征区域。
本实施例中,通过上述步骤903中对应第三模型的训练过程,使得第四模型具有特征区域的识别功能,从而保证了特征区域检测的准确性,即卡牌或相关功能按钮检测的准确性。
905、统计特征区域的触发情况,以生成特征向量。
906、将特征向量输入第一模型进行训练,以得到第二模型。
本实施例中,步骤905和906与图3所述实施例中的步骤303和304相似,相关特征可以进行参考,此处不做赘述。
结合上述实施例可见,通过对于第三模型的训练,提高了特征区域确定的准确性,保证了相关卡牌或功能按键的精确识别过程,进一步的提高了对于后续模仿学习过程中干扰项的去除,提高了模型训练的效率以及准确性。
上述实施例介绍了模型训练的过程,下面,结合游戏应用作为具体场景进行模型应用方面的介绍,请参阅图10,图10为本申请实施例提供的一种内容生成的方法的流程图,本申请实施例至少包括以下步骤:
1001、获取目标对象输出的第一内容。
本实施例中,目标对象可以是某一特定程序中的用户群体,例如:英雄杀这款游戏数据库中的游戏用户;目标对象还可以是单一的用户,此时主要应用于单机游戏的内容自动生成的过程中,即终端设备通过采集该用户的本端操作数据,对相应游戏的AI进行模型训练,从而实现匹配与该用户的内容智能生成过程;目标对象还可以是终端设备,即本申请提出的内容生成方法应用于对抗学习的过程中,通过多个终端设备之间大规模的对战,获取大量的对战数据,以便于丰富数据库的内容。
另外,以目标对象为英雄杀这款游戏数据库中的游戏用户为例,第一内容可以是游戏用户的历史对战数据,选取其中任意时长连续的视频作为训练内容。
1002、将第一内容输入第二模型,以得到动作标签。
本实施例中,对于第二模型的生成过程可以参照上述图3和图9所述的实施例,此处不做赘述。
在一种可能的场景中,第一内容即为用户当前的操作,通过输入第二模型,自动生成对应的AI侧操作,例如:玩家出“杀”,则AI输出“闪”,可以理解的是,在实际场景中,该对应的过程为多步骤的出牌过程,此处仅为示例。
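下面给出将第一内容输入第二模型并取概率最大的类别作为动作标签的一种示意性写法（Python，基于PyTorch），其中model为前述训练得到的第二模型，ACTION_NAMES的映射与特征向量的组织方式均为假设。
    import torch

    ACTION_NAMES = {0: "杀", 1: "闪", 2: "无懈可击", 19: "确定", 20: "不做动作"}  # 其余类别省略

    def generate_action(model, feature_sequence):
        # feature_sequence 形状为 (5, feature_dim)，即连续5帧的特征向量
        with torch.no_grad():
            logits = model(feature_sequence.unsqueeze(0))
            action_id = int(torch.argmax(logits, dim=-1))
        return ACTION_NAMES.get(action_id, str(action_id))  # 根据动作标签生成对应的出牌动作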
1003、根据动作标签生成第二内容。
本实施例中,第二内容即为基于AI输出的卡牌生成的出牌策略,第二内容的内容类型因实际场景而定,此处不做限定。
通过上述实施例可知,通过对于第二模型的训练,使得AI可以适应性的生成相关用户的目标卡牌内容,提高了内容生成的效率以及准确性,进而提高了用户在内容交互过程中的用户体验。
为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关装置。请参阅图11,图11为本申请实施例提供的一种模型训练装置的结构示意图,模型训练装置1100包括:
获取单元1101,用于基于多张卡牌内容的交互过程获取训练集,所述训练集包括多个视频帧,所述视频帧包括所述卡牌内容的触发区域;
确定单元1102,用于根据所述卡牌内容的触发区域确定所述视频帧中的特征区域,所述特征区域设有用于指示训练内容的动作标签,所述动作标签基于所述卡牌内容确定,所述特征区域小于所述视频帧的界面区域;
统计单元1103,用于统计所述特征区域的触发情况,以生成特征向量;
训练单元1104,用于将所述特征向量输入第一模型进行训练,以得到第二模型,所述第一模型用于关联所述特征向量与所述动作标签,所述第二模型用于指示目标卡牌内容与所述动作标签的对应关系。
在本申请一些可能的实现方式中,所述确定单元1102,具体用于根据所述卡牌内容的触发区域确定所述训练内容在所述视频帧中的位置信息;
所述确定单元1102,具体用于根据所述位置信息与所述训练内容的对应关系对第三模型进行训练,以得到第四模型,所述第三模型用于关联所述位置信息与所述训练内容,所述第四模型用于指示所述视频帧与所述特征区域的对应关系;
所述确定单元1102,具体用于将所述视频帧输入所述第四模型,以得到所述特征区域。
在本申请一些可能的实现方式中,所述确定单元1102,还用于确定所述训练内容对应的形状信息;
所述确定单元1102,还用于筛除所述形状信息相似的训练内容,以对所述训练内容进行更新。
在本申请一些可能的实现方式中,所述确定单元1102,具体用于根据所述形状信息对所述训练内容进行分类,以得到至少两个类别训练内容;
所述确定单元1102,具体用于根据所述至少两个类别训练内容确定第一梯度信息;
所述确定单元1102,具体用于根据所述位置信息确定第二梯度信息;
所述确定单元1102,具体用于通过最小化所述第一梯度信息和所述第二梯度信息的方式对所述第三模型的参数进行训练,以得到第四模型。
在本申请一些可能的实现方式中,所述统计单元1103,具体用于分别统计每个视频帧中的所述特征区域的触发情况,以生成特征序列,所述特征序列为多维二值序列;
所述统计单元1103,具体用于根据所述特征序列生成所述特征向量。
在本申请一些可能的实现方式中,所述统计单元1103,具体用于获取所述特征区域的中心点;
所述统计单元1103,具体用于根据所述中心点与触发操作点的距离确定所述触发情况,以生成所述特征序列。
在本申请一些可能的实现方式中,所述训练单元1104,具体用于确定所述特征向量的时序特征;
所述训练单元1104,具体用于将所述特征向量输入所述第一模型进行参数调整;
所述训练单元1104,具体用于将所述时序特征输入参数调整后的所述第一模型进行训练,以得到所述第二模型。
在本申请一些可能的实现方式中,所述训练单元1104,具体用于按照特征提取间隔对所述特征向量进行分割,以得到至少两个特征向量集;
所述训练单元1104,具体用于提取所述至少两个特征向量集中对应于相同特征区域的操作序列,以得到所述时序特征。
在本申请一些可能的实现方式中,所述获取单元1101,具体用于获取多张卡牌内容的交互过程的等级信息,所述等级信息用于指示所述交互过程的复杂度;
所述获取单元1101,具体用于根据所述等级信息提取第一牌局和第二牌局,所述第一牌局与所述第二牌局的等级信息不同;
所述获取单元1101,具体用于根据所述第一牌局与所述第二牌局确定所述训练集。
通过获取包括多个视频帧的训练集,并确定视频帧中的特征区域,其中特征区域设有用于指示训练内容的动作标签,特征区域小于视频帧的界面区域;然后统计特征区域的触发情况,以生成特征向量;进而将特征向量输入第一模型进行训练,以得到用于指示目标卡牌内容与动作标签的对应关系的第二模型。从而实现了基于特征区域的模仿学习训练过程,由于特征区域为视频帧对应图像的一部分,不包含背景部分或其他干扰区域,使得模型训练过程具有针对性,减小了模型训练过程中的数据处理量,提高了模型训练效率以及准确性。
本申请实施例还提供了一种内容生成的装置1200，请参阅图12，图12是本申请实施例提供的一种内容生成的装置的结构示意图，内容生成装置1200包括：获取单元1201，用于获取目标对象输出的第一内容；
输入单元1202,用于将所述第一内容输入第二模型,以得到动作标签,所述第二模型基于上述第一方面或第一方面任一项所述的模型训练的方法训练所得;
生成单元1203,用于根据所述动作标签生成第二内容。
本申请实施例还提供了一种服务器,请参阅图13,图13是本申请实施例提供的一种服务器的结构示意图,该服务器1300可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1322(例如,一个或一个以上处理器)和存储器1332,一个或一个以上存储应用程序1342或数据1344的存储介质1330(例如一个或一个以上海量存储设备)。其中,存储器1332和存储介质1330可以是短暂存储或持久存储。存储在存储介质1330的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器1322可以设置为与存储介质1330通信,在服务器1300上执行存储介质1330中的一系列指令操作。
服务器1300还可以包括一个或一个以上电源1326,一个或一个以上有线或无线网络接口1350,一个或一个以上输入输出接口1358,和/或,一个或一个以上操作系统1341,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
上述实施例中由模型训练装置所执行的步骤可以基于该图13所示的服务器结构。
本申请实施例还提供了一种终端设备,如图14所示,是本申请实施例提供的一种终端设备的结构示意图,为了便于说明,仅示出了与本申请实施例相关的部分,具体技术细节未揭示的,请参照本申请实施例方法部分。该终端可以为包括手机、平板电脑、个人数字助理(personal digital assistant,PDA)、销售终端(point of sales,POS)、车载电脑等任意终端设备,以终端为手机为例:
图14示出的是与本申请实施例提供的终端相关的手机的部分结构的框图。参考图14,手机包括:射频(radio frequency,RF)电路1410、存储器1420、输入单元1430、显示单元1440、传感器1450、音频电路1460、无线保真(wireless fidelity,WiFi)模块1470、处理器1480、以及电源1490等部件。输入单元1430可包括触控面板1431以及其他输入设备1432,显示单元1440可包括显示面板1441,音频电路1460可以包括扬声器1461和传声器1462。本领域技术人员可以理解,图14中示出的手机结构并不构成对手机的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
处理器1480是手机的控制中心，利用各种接口和线路连接整个手机的各个部分，通过运行或执行存储在存储器1420内的软件程序和/或模块，以及调用存储在存储器1420内的数据，执行手机的各种功能和处理数据，从而对手机进行整体监控。可选的，处理器1480可包括一个或多个处理单元；可选的，处理器1480可集成应用处理器和调制解调处理器，其中，应用处理器主要处理操作系统、用户界面和应用程序等，调制解调处理器主要处理无线通信。可以理解的是，上述调制解调处理器也可以不集成到处理器1480中。
尽管未示出,手机还可以包括摄像头、蓝牙模块等,在此不再赘述。
在本申请实施例中，该终端设备所包括的处理器1480还具有执行如上述模型训练方法或内容生成方法的各个步骤的功能。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有模型训练指令,当其在计算机上运行时,使得计算机执行如前述图2至图10所示实施例描述的方法中模型训练装置所执行的步骤。
本申请实施例中还提供一种包括模型训练指令的计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图2至图10所示实施例描述的方法中模型训练装置所执行的步骤。
本申请实施例还提供了一种模型训练系统,所述模型训练系统可以包含图11所描述实施例中的模型训练装置,或者图12所描述的内容生成装置。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是个人计算机，模型训练装置，或者网络设备等）执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器（read-only memory，ROM）、随机存取存储器（random access memory，RAM）、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (16)

  1. 一种模型训练的方法,所述方法由终端设备执行,包括:
    基于多张卡牌内容的交互过程获取训练集,所述训练集包括多个视频帧,所述视频帧包括所述卡牌内容的触发区域;
    根据所述卡牌内容的触发区域确定所述视频帧中的特征区域,所述特征区域设有用于指示训练内容的动作标签,所述动作标签基于所述卡牌内容确定,所述特征区域小于所述视频帧的界面区域;
    统计所述特征区域的触发情况,以生成特征向量;
    将所述特征向量输入第一模型进行训练,以得到第二模型,所述第一模型用于关联所述特征向量与所述动作标签,所述第二模型用于指示目标卡牌内容与所述动作标签的对应关系。
  2. 根据权利要求1所述的方法,所述根据所述卡牌内容的触发区域确定所述视频帧中的特征区域,包括:
    根据所述卡牌内容的触发区域确定所述训练内容在所述视频帧中的位置信息;
    根据所述位置信息与所述训练内容的对应关系对第三模型进行训练,以得到第四模型,所述第三模型用于关联所述位置信息与所述训练内容,所述第四模型用于指示所述视频帧与所述特征区域的对应关系;
    将所述视频帧输入所述第四模型,以得到所述特征区域。
  3. 根据权利要求2所述的方法,所述根据所述卡牌内容的触发区域确定所述训练内容在所述视频帧中的位置信息之前,所述方法还包括:
    确定所述训练内容对应的形状信息;
    筛除所述形状信息相似的训练内容,以对所述训练内容进行更新。
  4. 根据权利要求3所述的方法,所述根据所述位置信息与所述训练内容的对应关系对第三模型进行训练,以得到第四模型,包括:
    根据所述形状信息对所述训练内容进行分类,以得到至少两个类别训练内容;
    根据所述至少两个类别训练内容确定第一梯度信息;
    根据所述位置信息确定第二梯度信息;
    通过最小化所述第一梯度信息和所述第二梯度信息的方式对所述第三模型的参数进行训练,以得到所述第四模型。
  5. 根据权利要求1所述的方法,所述统计所述特征区域的触发情况,以生成特征向量,包括:
    分别统计每个视频帧中的所述特征区域的触发情况,以生成特征序列,所述特征序列为多维二值序列;
    根据所述特征序列生成所述特征向量。
  6. 根据权利要求5所述的方法,所述分别统计每个视频帧中的所述特征区域的触发情况,以生成特征序列,包括:
    获取所述特征区域的中心点;
    根据所述中心点与触发操作点的距离确定所述触发情况,以生成所述特征序列。
  7. 根据权利要求1所述的方法,所述将所述特征向量输入第一模型进行训练,以得到第二模型,包括:
    确定所述特征向量的时序特征;
    将所述特征向量输入所述第一模型进行参数调整;
    将所述时序特征输入参数调整后的所述第一模型进行训练,以得到所述第二模型。
  8. 根据权利要求7所述的方法,所述确定所述特征向量的时序特征,包括:
    按照特征提取间隔对所述特征向量进行分割,以得到至少两个特征向量集;
    提取所述至少两个特征向量集中对应于相同特征区域的操作序列,以得到所述时序特征。
  9. 根据权利要求1-8任一项所述的方法,所述基于多张卡牌内容的交互过程获取训练集,包括:
    获取所述多张卡牌内容的交互过程的等级信息,所述等级信息用于指示所述交互过程的复杂度;
    根据所述等级信息提取第一牌局和第二牌局,所述第一牌局与所述第二牌局的等级信息不同;
    根据所述第一牌局与所述第二牌局确定所述训练集。
  10. 根据权利要求1所述的方法,所述训练内容和所述目标卡牌内容为卡牌游戏的内容,所述第一模型和所述第二模型为深度网络模型。
  11. 一种内容生成的方法,所述方法由终端设备执行,包括:
    获取目标对象输出的第一内容;
    将所述第一内容输入第二模型,以得到动作标签,所述第二模型基于权利要求1-10任一项所述的模型训练方法训练所得;
    根据所述动作标签生成第二内容。
  12. 一种模型训练的装置,所述装置部署于终端设备,包括:
    获取单元,用于基于多张卡牌内容的交互过程获取训练集,所述训练集包括多个视频帧,所述视频帧包括所述卡牌内容的触发区域;
    确定单元,用于根据所述卡牌内容的触发区域确定所述视频帧中的特征区域,所述特征区域设有用于指示训练内容的动作标签,所述动作标签基于所述卡牌内容确定,所述特征区域小于所述视频帧的界面区域;
    统计单元,用于统计所述特征区域的触发情况,以生成特征向量;
    训练单元,用于将所述特征向量输入第一模型进行训练,以得到第二模型,所述第一模型用于关联所述特征向量与所述动作标签,所述第二模型用于指示目标卡牌内容与所述动作标签的对应关系。
  13. 一种内容生成的装置,所述装置部署于终端设备,包括:
    获取单元,用于获取目标对象输出的第一内容;
    输入单元,用于将所述第一内容输入第二模型,以得到动作标签,所述第二模型基于权利要求1-10任一项所述的模型训练方法训练所得;
    生成单元,用于根据所述动作标签生成第二内容。
  14. 一种计算机设备,所述计算机设备包括处理器以及存储器:
    所述存储器用于存储程序代码;所述处理器用于根据所述程序代码中的指令执行权利要求1至10任一项所述的模型训练的方法,或权利要求11所述的内容生成的方法。
  15. 一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述权利要求1至10任一项所述的模型训练的方法,或权利要求11所述的内容生成的方法。
  16. 一种计算机程序产品,当所述计算机程序产品被执行时,用于执行权利要求1至10任一项所述的模型训练的方法,或权利要求11所述的内容生成的方法。
PCT/CN2020/128245 2020-01-21 2020-11-12 一种模型训练方法、内容生成方法以及相关装置 WO2021147473A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20915388.1A EP4005652A4 (en) 2020-01-21 2020-11-12 MODEL LEARNING METHOD, CONTENT GENERATION METHOD AND ASSOCIATED DEVICES
US17/585,677 US20220148295A1 (en) 2020-01-21 2022-01-27 Model training method, content generation method, and related apparatuses

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010073390.0 2020-01-21
CN202010073390.0A CN111265881B (zh) 2020-01-21 2020-01-21 一种模型训练方法、内容生成方法以及相关装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/585,677 Continuation US20220148295A1 (en) 2020-01-21 2022-01-27 Model training method, content generation method, and related apparatuses

Publications (1)

Publication Number Publication Date
WO2021147473A1 true WO2021147473A1 (zh) 2021-07-29

Family

ID=70992254

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/128245 WO2021147473A1 (zh) 2020-01-21 2020-11-12 一种模型训练方法、内容生成方法以及相关装置

Country Status (4)

Country Link
US (1) US20220148295A1 (zh)
EP (1) EP4005652A4 (zh)
CN (1) CN111265881B (zh)
WO (1) WO2021147473A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111265881B (zh) * 2020-01-21 2021-06-22 腾讯科技(深圳)有限公司 一种模型训练方法、内容生成方法以及相关装置
CN113254654B (zh) * 2021-07-05 2021-09-21 北京世纪好未来教育科技有限公司 模型训练、文本识别方法、装置、设备和介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109107161A (zh) * 2018-08-17 2019-01-01 深圳市腾讯网络信息技术有限公司 一种游戏对象的控制方法、装置、介质以及设备
CN110170171A (zh) * 2019-06-03 2019-08-27 深圳市腾讯网域计算机网络有限公司 一种目标对象的控制方法及装置
CN110251942A (zh) * 2019-06-04 2019-09-20 腾讯科技(成都)有限公司 控制游戏场景中虚拟角色的方法及装置
CN110339569A (zh) * 2019-07-08 2019-10-18 深圳市腾讯网域计算机网络有限公司 控制游戏场景中虚拟角色的方法及装置
US20190321727A1 (en) * 2018-04-02 2019-10-24 Google Llc Temporary Game Control by User Simulation Following Loss of Active Control
CN111265881A (zh) * 2020-01-21 2020-06-12 腾讯科技(深圳)有限公司 一种模型训练方法、内容生成方法以及相关装置

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634696A (zh) * 2017-10-09 2019-04-16 华为技术有限公司 一种显示多个内容卡片的方法及终端设备
US11594028B2 (en) * 2018-05-18 2023-02-28 Stats Llc Video processing for enabling sports highlights generation
CN110162454B (zh) * 2018-11-30 2022-02-08 腾讯科技(深圳)有限公司 游戏运行方法和装置、存储介质及电子装置
CN109364490A (zh) * 2018-12-05 2019-02-22 网易(杭州)网络有限公司 卡牌游戏测试方法、装置及存储介质
CN110119815B (zh) * 2019-05-21 2021-08-13 深圳市腾讯网域计算机网络有限公司 模型训练方法、装置、存储介质及设备
CN110598853B (zh) * 2019-09-11 2022-03-15 腾讯科技(深圳)有限公司 一种模型训练的方法、信息处理的方法以及相关装置
CN111013149A (zh) * 2019-10-23 2020-04-17 浙江工商大学 一种基于神经网络深度学习的卡牌设计生成方法及系统
CN112084920B (zh) * 2020-08-31 2022-05-03 北京字节跳动网络技术有限公司 提取热词的方法、装置、电子设备及介质
US20230186634A1 (en) * 2021-12-14 2023-06-15 The Hong Kong University Of Science And Technology Vision-based monitoring of site safety compliance based on worker re-identification and personal protective equipment classification

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190321727A1 (en) * 2018-04-02 2019-10-24 Google Llc Temporary Game Control by User Simulation Following Loss of Active Control
CN109107161A (zh) * 2018-08-17 2019-01-01 深圳市腾讯网络信息技术有限公司 一种游戏对象的控制方法、装置、介质以及设备
CN110170171A (zh) * 2019-06-03 2019-08-27 深圳市腾讯网域计算机网络有限公司 一种目标对象的控制方法及装置
CN110251942A (zh) * 2019-06-04 2019-09-20 腾讯科技(成都)有限公司 控制游戏场景中虚拟角色的方法及装置
CN110339569A (zh) * 2019-07-08 2019-10-18 深圳市腾讯网域计算机网络有限公司 控制游戏场景中虚拟角色的方法及装置
CN111265881A (zh) * 2020-01-21 2020-06-12 腾讯科技(深圳)有限公司 一种模型训练方法、内容生成方法以及相关装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4005652A4

Also Published As

Publication number Publication date
EP4005652A1 (en) 2022-06-01
CN111265881A (zh) 2020-06-12
CN111265881B (zh) 2021-06-22
EP4005652A4 (en) 2022-10-26
US20220148295A1 (en) 2022-05-12

Similar Documents

Publication Publication Date Title
US9690982B2 (en) Identifying gestures or movements using a feature matrix that was compressed/collapsed using principal joint variable analysis and thresholds
CN108090561B (zh) 存储介质、电子装置、游戏操作的执行方法和装置
CN110339569B (zh) 控制游戏场景中虚拟角色的方法及装置
CN106028134A (zh) 针对移动计算设备检测体育视频精彩部分
CN107633207A (zh) Au特征识别方法、装置及存储介质
US11551479B2 (en) Motion behavior pattern classification method, system and device
Hu et al. Robust background subtraction with shadow and highlight removal for indoor surveillance
CN106462725A (zh) 监测游戏场所的活动的系统和方法
CN112069929A (zh) 一种无监督行人重识别方法、装置、电子设备及存储介质
WO2021147473A1 (zh) 一种模型训练方法、内容生成方法以及相关装置
CN113422977B (zh) 直播方法、装置、计算机设备以及存储介质
US11819734B2 (en) Video-based motion counting and analysis systems and methods for virtual fitness application
CN107633205A (zh) 嘴唇动作分析方法、装置及存储介质
CN112827168B (zh) 一种目标跟踪的方法、装置及存储介质
CN112560723A (zh) 一种基于形态识别与速度估计的跌倒检测方法及系统
CN107944381B (zh) 人脸跟踪方法、装置、终端及存储介质
CN106471440A (zh) 基于高效森林感测的眼睛跟踪
CN112232258A (zh) 一种信息处理方法、装置及计算机可读存储介质
CN113822254B (zh) 一种模型训练方法及相关装置
CN112052771A (zh) 一种对象重识别方法及装置
CN109145876A (zh) 图像分类方法、装置、电子设备及存储介质
US11450010B2 (en) Repetition counting and classification of movements systems and methods
CN112580472A (zh) 一种快速轻量的人脸识别方法、装置、机器可读介质及设备
WO2021081768A1 (zh) 界面切换方法、装置、可穿戴电子设备及存储介质
CN113326829B (zh) 视频中手势的识别方法、装置、可读存储介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20915388

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020915388

Country of ref document: EP

Effective date: 20220223

NENP Non-entry into the national phase

Ref country code: DE