CN112070132A - Sample data construction method, device, equipment and medium - Google Patents


Publication number
CN112070132A
Authority
CN
China
Prior art keywords
behavior data
behavior
combined
data
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010865395.7A
Other languages
Chinese (zh)
Inventor
王业君
云朋
汪明伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010865395.7A
Publication of CN112070132A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The embodiment of the application discloses a sample data construction method, a device, equipment and a medium, and relates to the technical field of machine learning. The sample data construction method comprises the following steps: monitoring the running state of the detection object under the target operation type to obtain at least two target running behavior data corresponding to the target operation type; combining and arranging the target operation behavior data, and carrying out sample marking on a combined and arranged result to obtain sample data; wherein the sample data is used for training an anomaly detection model for the detection object. The embodiment of the application can achieve the effect of rapidly constructing rich training sample data.

Description

Sample data construction method, device, equipment and medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for constructing sample data.
Background
With the development of artificial intelligence technology, it is increasingly common to detect the operating state of equipment by machine learning. The sufficiency of the training sample library is an important precondition for effectively implementing machine learning.
Taking an automobile as an example, applying artificial intelligence to automobile anomaly detection is becoming a focus for automobile safety researchers. However, relatively few automobile security attacks have actually been disclosed or carried out so far, so little behavior monitoring data exists for vehicles in an abnormal operation state, and the training sample library required for machine learning is lacking. Without an adequate training sample library, machine learning cannot deliver its due value in detecting abnormal automobile operation states.
Disclosure of Invention
The application provides a sample data construction method, a device, equipment and a medium, so as to realize rapid construction of rich training sample data.
According to an aspect of the embodiments of the present application, there is provided a sample data construction method, including:
monitoring the running state of a detection object under a target operation type to obtain at least two target running behavior data corresponding to the target operation type;
performing combined arrangement on the target operation behavior data, and performing sample marking on a combined arrangement result to obtain sample data; wherein the sample data is used to train an anomaly detection model for the detection object.
According to another aspect of the embodiments of the present application, there is provided a sample data constructing apparatus, including:
the operation monitoring module is used for monitoring the operation state of a detection object under a target operation type to obtain at least two target operation behavior data corresponding to the target operation type;
the sample data determining module is used for carrying out combined arrangement on the target operation behavior data and carrying out sample marking on a combined arrangement result to obtain sample data; wherein the sample data is used to train an anomaly detection model for the detection object.
According to another aspect of embodiments of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the sample data construction method as described in any of the embodiments of the present application.
According to another aspect of embodiments of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the sample data construction method according to any one of the embodiments of the present application.
According to the technical scheme of the embodiment of the application, the running state of the detection object is monitored, the monitored target running behavior data corresponding to the target operation type is combined and arranged, and the combined arrangement result is subjected to sample marking to obtain sample data, so that the effect of quickly constructing abundant training sample data is achieved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flow chart of a sample data construction method disclosed in an embodiment of the present application;
FIG. 2 is a flow chart of another sample data construction method disclosed in an embodiment of the present application;
FIG. 3 is a schematic flow chart of vehicle anomaly detection according to the disclosed embodiments of the present application;
FIG. 4 is a schematic structural diagram of a sample data constructing apparatus according to an embodiment of the present application;
FIG. 5 is a block diagram of an electronic device disclosed according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of a sample data construction method disclosed in an embodiment of the present application, which may be applied to a case of quickly constructing sample data for model training. The method disclosed by the embodiment of the application can be executed by a sample data construction device, and the device can be realized by adopting software and/or hardware and can be integrated on any electronic equipment with computing capability.
As shown in fig. 1, the sample data construction method disclosed in the embodiment of the present application may include:
S101, monitoring the running state of the detection object under the target operation type to obtain at least two target running behavior data corresponding to the target operation type.
The detection object can be any mechanical or electronic device. The target operation type is related to the running function of the detection object and can be any operation type in its running life cycle. For example, the detection object may include, but is not limited to, a vehicle, and the target operation type may include, but is not limited to, operations related to vehicle operation, such as vehicle starting, door opening, air conditioning, and braking.
By monitoring the running state of the detection object, at least two target running behavior data of the detection object under the target operation type can be monitored and obtained, and the at least two target running behavior data are used for describing the implementation process of the target operation from the perspective of the detection object. In the target operation type, the amount of the target operation behavior data may be determined according to the actual operation of the detection object, and the embodiment of the present application is not particularly limited.
Illustratively, a running log can be obtained by monitoring the running state of the detection object; fields in the running log that represent operation behavior data are then identified to obtain at least two target operation behavior data corresponding to the target operation type. Alternatively, the at least two target operation behavior data corresponding to the target operation type can be obtained by analyzing a preset data structure that records the implementation process of the target operation. Taking the detection object as a vehicle and the target operation type as door opening as an example, the target operation behavior data of the detection object may include: 1) receiving a door opening instruction sent by a user side, 2) analyzing the door opening instruction, and 3) controlling the opening of the vehicle door.
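The log-based extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the application does not specify a log format, so the line layout, the field names `op` and `behavior`, the behavior names, and the function `extract_behaviors` are all hypothetical.

```python
import re

# Hypothetical log format (an assumption, not specified by the application):
#   "<timestamp> op=<operation type> behavior=<behavior name>"
LOG_LINE = re.compile(r"op=(?P<op>\S+)\s+behavior=(?P<behavior>\S+)")


def extract_behaviors(log_lines, target_op):
    """Return the behavior names logged for target_op, in log order."""
    behaviors = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("op") == target_op:
            behaviors.append(match.group("behavior"))
    return behaviors


run_log = [
    "10:00:01 op=open_door behavior=receive_instruction",
    "10:00:01 op=air_conditioning behavior=receive_instruction",
    "10:00:02 op=open_door behavior=analyze_instruction",
    "10:00:03 op=open_door behavior=control_door_open",
]
door_behaviors = extract_behaviors(run_log, "open_door")
```

Filtering on the operation type keeps the behavior data of different operations apart, so that each operation type yields its own ordered list.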
Optionally, the monitoring the running state of the detection object in the target operation type to obtain at least two target running behavior data corresponding to the target operation type includes:
segmenting the running process of the detection object according to the target operation type, and monitoring the running state segment by segment to obtain at least two target running behavior data corresponding to the target operation type. Specifically, the implementation process of the target operation is segmented according to the specific target operation type, and the segmentation may follow the interface function calls that occur during the implementation process; unimportant interface functions need not take part in the segmentation. A corresponding information acquisition output point is set at each segmentation point, the corresponding operation information is output when the operation state of the detection object reaches that segmentation point, and the output operation information is finally integrated to obtain the at least two target operation behavior data corresponding to the target operation type. Segmenting the running process of the detection object enables fine-grained monitoring of that process, so that the target running behavior data of the detection object under the target operation type can be determined accurately and efficiently.
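One possible way to realize such output points is to wrap the interface functions chosen as segmentation points, as in the sketch below. The class `SegmentMonitor`, the decorator `output_point`, and the door-opening functions are hypothetical names introduced only for illustration; the application does not prescribe an instrumentation mechanism.

```python
import functools


class SegmentMonitor:
    """Collects the information emitted at each segmentation point."""

    def __init__(self):
        self.records = []

    def output_point(self, name):
        """Mark an interface function as a segmentation point."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # Emit a record when execution reaches this point.
                self.records.append(name)
                return func(*args, **kwargs)
            return wrapper
        return decorator


monitor = SegmentMonitor()


@monitor.output_point("receive_instruction")
def receive_instruction(raw):
    return raw.strip()


@monitor.output_point("analyze_instruction")
def analyze_instruction(message):
    return {"action": message}


def write_debug_trace(message):
    # Unimportant interface function: not a segmentation point.
    return None


@monitor.output_point("control_door_open")
def control_door_open(command):
    return command["action"] == "open_door"


def open_door(raw):
    message = receive_instruction(raw)
    write_debug_trace(message)
    return control_door_open(analyze_instruction(message))


opened = open_door(" open_door ")
```

After the call, `monitor.records` holds the three behavior names in execution order, and the undecorated helper leaves no record.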
S102, performing combined arrangement on the target operation behavior data, and performing sample marking on a combined arrangement result to obtain sample data; wherein the sample data is used for training an anomaly detection model for the detection object.
After the target operation behavior data of the detection object under the target operation type are obtained, they may be combined and arranged. Specifically, a full permutation technique may be used to obtain combined behavior sets whose operation order differs from that of the monitored target operation behavior data. Each combined behavior set comprises at least one target operation behavior data. The operation order of the operation behavior data in a combined behavior set may correspond to a normal operation state under the target operation type (i.e., a standard operation state that conforms to conventional execution logic) or to an abnormal operation state. In either case, the combined behavior set obtained by combining and arranging the target operation behavior data is used as sample data in the embodiment of the present application, and the sample data can subsequently be identified, classified, and marked for use in training the anomaly detection model. The classification marking of the sample data includes manual marking and automatic marking according to the normal operation order among the operation behavior data. It should be noted that there may be one target operation type or at least two; the sample data determined from the target operation behavior data of a given target operation type are used to train the anomaly detection model of the detection object for that target operation type.
Still taking the above door opening as an example, the monitored target operation behavior data of the detection object may include: 1) receiving a door opening instruction sent by a user side, 2) analyzing the door opening instruction, and 3) controlling the opening of the vehicle door. These 3 target operation behavior data are combined and arranged, specifically by full permutation with the operation order among the operation behavior data taken into account, which yields at most 15 combined arrangement results. For example, one combined behavior set may include 1 operation behavior data element: receiving the door opening instruction sent by the user side. Another may include 2 elements: receiving the door opening instruction sent by the user side, and analyzing the door opening instruction. Another may include 3 elements: controlling the opening of the vehicle door, receiving the door opening instruction sent by the user side, and analyzing the door opening instruction. The specific combined arrangement results are not limited to these examples.
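The combined arrangement in the door-opening example can be sketched with the standard library, assuming that "full arrangement" means every ordered selection of 1 to n behavior data elements without repetition. The behavior names and the function `combined_behavior_sequences` are illustrative, not taken from the application.

```python
from itertools import permutations


def combined_behavior_sequences(behaviors):
    """Enumerate every ordered arrangement of 1..n distinct behaviors."""
    sequences = []
    for length in range(1, len(behaviors) + 1):
        sequences.extend(permutations(behaviors, length))
    return sequences


door_behaviors = ["receive_instruction", "analyze_instruction", "control_door_open"]
sequences = combined_behavior_sequences(door_behaviors)
```

For the three door-opening behaviors this produces 3 sequences of length one, 6 of length two, and 6 of length three.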
On the basis of the above technical solution, further, the at least two target operation behavior data corresponding to the target operation type include operation behavior data of the detection object in a normal operation state. The sample data include operation behavior data of the detection object in an abnormal operation state, and may also include operation behavior data of the detection object in a normal operation state, because different operation orders of at least two operation behavior data can correspond to different operation states. That is, in the embodiment of the present application, combining and arranging the monitored operation behavior data of the detection object in the normal operation state yields operation behavior data of the detection object in an abnormal operation state. This extends how normal-state operation behavior data can be used and makes it convenient to construct anomaly detection sample data for the detection object.
Of course, if the at least two target operation behavior data corresponding to the target operation type include operation behavior data of the detection object in the abnormal operation state, the sample data obtained by the combined arrangement of the operation behavior data may include operation behavior data of the detection object in the abnormal operation state, and may further include operation behavior data of the detection object in the normal operation state, which is related to the operation sequence between the operation behavior data after the combined arrangement.
According to the technical solution of the embodiment of the present application, the running state of the detection object is monitored, the monitored target running behavior data corresponding to the target operation type are combined and arranged, and the combined arrangement results are sample-marked to obtain sample data, which achieves the effect of quickly constructing rich training sample data. This solves the problem that, in machine-learning-based anomaly detection, the available sample data are limited when relying solely on the detection data of the detection object. In particular, when abnormal running states of the detection object rarely occur, it solves the shortage of usable sample data for the abnormal running state, and it lays a rich sample data foundation for machine-learning-based anomaly detection.
Fig. 2 is a flowchart of another sample data construction method disclosed in an embodiment of the present application, which is further optimized and expanded based on the above technical solution, and can be combined with the above optional embodiments. As shown in fig. 2, the method may include:
S201, monitoring the running state of the detection object under the target operation type to obtain at least two target running behavior data corresponding to the target operation type.
S202, combining and arranging the target operation behavior data to obtain a preset number of combined behavior sequences; and the position arrangement sequence of the operation behavior data in the combined behavior sequence represents the operation sequence of the operation behavior data in the combined behavior sequence.
Each combined behavior sequence may include at least one target operation behavior data, and the number of combined behavior sequences, that is, the value of the preset number, is related to the number of monitored target operation behavior data. Preferably, the monitored target operation behavior data may be fully permuted to obtain a plurality of combined behavior sequences. Assuming that the number of target operation behavior data is n, the number of combined behavior sequences may be expressed as follows:
$$\sum_{i=1}^{n} A_n^i = \sum_{i=1}^{n} \frac{n!}{(n-i)!}$$
where i represents the amount of operation behavior data that can be included in each combined behavior sequence. For n = 3, this gives 3 + 6 + 6 = 15.
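As a quick numerical check, the count can be computed directly, assuming it is the sum over i of the number of ordered arrangements of i elements chosen from n (a reading that reproduces the 15 results of the three-element door-opening example). The function name `sequence_count` is hypothetical.

```python
from math import perm  # available in Python 3.8 and later


def sequence_count(n):
    """Sum of the partial permutations n! / (n - i)! for i = 1..n."""
    return sum(perm(n, i) for i in range(1, n + 1))


count_for_three = sequence_count(3)
count_for_four = sequence_count(4)
```

The count grows quickly with n (4, 15, 64 for n = 2, 3, 4), which is why a handful of monitored behavior data is enough to produce a sizeable sample set.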
In order to facilitate the subsequent execution of the sample marking operation, in the combined behavior sequence obtained by combining the operation behavior data, the position arrangement sequence of the operation behavior data represents the operation sequence among the operation behavior data.
S203, according to a preset running sequence between adjacent running behavior data under the target operation type, carrying out sample marking on the combined behavior sequence to obtain sample data; wherein the sample data comprises positive and negative samples.
The preset running sequence may refer to the running sequence between adjacent running behavior data when the detection object is in a normal running state under the target operation type. If a combined behavior sequence includes at least two operation behavior data and the operation sequence between every two adjacent operation behavior data satisfies the preset running sequence, the combined behavior sequence is marked as a positive sample, i.e., it corresponds to a normal operation state of the detection object. If the combined behavior sequence contains adjacent operation behavior data that do not satisfy the preset running sequence, it is marked as a negative sample, i.e., it corresponds to an abnormal operation state of the detection object. If a combined behavior sequence includes only one operation behavior data and that data is the first operation behavior data under the target operation type, the combined behavior sequence can be marked as a positive sample, corresponding to the initialization stage of the target operation's implementation process, in which the detection object has only run as far as the first operation behavior data; if the single operation behavior data is not the first one under the target operation type, the combined behavior sequence may be marked as a negative sample. Marking positive and negative samples on the combined behavior sequences during sample data construction makes subsequent training of the anomaly detection model more convenient, removes the need to mark the sample data again, and reflects the completeness of the sample data construction process.
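The marking rule above can be sketched by listing the allowed adjacent pairs of the normal order. The function `label_sequence` and its `cyclic` flag are hypothetical; treating the step from the last behavior back to the first as allowed is an assumption drawn from the cyclic execution noted in the door-opening example, and it can be switched off.

```python
def label_sequence(sequence, normal_order, cyclic=True):
    """Return 1 for a positive sample and 0 for a negative sample."""
    if not sequence:
        raise ValueError("a combined behavior sequence has at least one element")
    if len(sequence) == 1:
        # A single behavior is positive only if it is the first in normal order.
        return 1 if sequence[0] == normal_order[0] else 0
    allowed = set(zip(normal_order, normal_order[1:]))
    if cyclic:
        # Assumption: the last behavior may be followed by the first one.
        allowed.add((normal_order[-1], normal_order[0]))
    adjacent = zip(sequence, sequence[1:])
    return 1 if all(pair in allowed for pair in adjacent) else 0


normal = ["E1", "E2", "E3"]
labels = {
    ("E1",): label_sequence(("E1",), normal),
    ("E2",): label_sequence(("E2",), normal),
    ("E1", "E3"): label_sequence(("E1", "E3"), normal),
    ("E2", "E3", "E1"): label_sequence(("E2", "E3", "E1"), normal),
}
```

Because the label depends only on adjacent pairs, the check runs in time linear in the sequence length.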
On the basis of the above technical solution, optionally, according to a preset operation sequence between adjacent operation behavior data under a target operation type, performing sample marking on the combined behavior sequence to obtain sample data, including:
if the number of the operation behavior data included in the combined behavior sequence is at least two, and the operation identifications of any two adjacent operation behavior data in the combined behavior sequence satisfy the following relation, marking the combined behavior sequence as a positive sample:
j=k·n+i%n+1;
Here, the operation identifications are pre-assigned to the operation behavior data according to the preset operation sequence, and in the assignment process the operation identification values of adjacent operation behavior data that conform to the preset operation sequence are consecutive. j represents the operation identification of the later of two adjacent operation behavior data in the combined behavior sequence; i represents the operation identification of the earlier of the two; n represents the total number of the operation behavior data in the combined behavior sequence; i%n represents i modulo n; k represents the integer quotient obtained by dividing the minimum operation identification m in the combined behavior sequence by n; and the values of i, j, k and n are integers.
Further, according to a preset operation sequence between adjacent operation behavior data under the target operation type, performing sample marking on the combined behavior sequence to obtain sample data, and further comprising:
if the number of the operation behavior data included in the combined behavior sequence is one and the operation identifier of the operation behavior data is a preset identifier, marking the combined behavior sequence as a positive sample; the preset identifier is used for representing the operation behavior data at the first running position under the target operation type.
The remaining combined behavior sequences that do not satisfy the above conditions can be marked as negative samples.
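The identifier relation can be checked numerically with the sketch below. It rests on an interpretive assumption: n is taken to be the number of target operation behavior data under the operation type (3 in the door-opening example), because that reading reproduces the positive samples listed in that example, whereas taking n as the length of the individual sequence would reject {E3, E1}. The function names and the `head_id` parameter are hypothetical.

```python
from itertools import permutations


def satisfies_relation(i, j, n, m):
    """Check j = k*n + i % n + 1, where k is the integer quotient m // n."""
    k = m // n
    return j == k * n + i % n + 1


def label_by_identifier(ids, n, head_id):
    """Return 1 for a positive sample and 0 for a negative sample."""
    if len(ids) == 1:
        # Single-element sequence: positive only for the preset identifier.
        return 1 if ids[0] == head_id else 0
    m = min(ids)  # minimum operation identification in the sequence
    adjacent = zip(ids, ids[1:])
    return 1 if all(satisfies_relation(i, j, n, m) for i, j in adjacent) else 0


identifiers = (1, 2, 3)
positives = [
    ids
    for length in range(1, len(identifiers) + 1)
    for ids in permutations(identifiers, length)
    if label_by_identifier(ids, n=len(identifiers), head_id=1) == 1
]
```

Under these assumptions, 7 of the 15 identifier sequences are marked positive and the remaining 8 negative.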
Taking the detection object as a vehicle and the target operation type as door opening as an example, the marking process of the combined behavior sequences is described below. Under the door opening operation type, the monitored target operation behavior data that conform to the preset operation sequence, that is, the target operation behavior data corresponding to the normal operation state of the vehicle, may include, in order: 1) receiving a door opening instruction sent by a user side, E1; 2) analyzing the door opening instruction, E2; and 3) controlling the opening of the vehicle door, E3. That is, the 3 target operation behavior data are assigned the operation identifiers 1, 2 and 3 respectively, and they themselves form a normal behavior sequence {E1, E2, E3}. Based on these 3 target operation behavior data, 15 combined arrangement results can be obtained by full permutation, as follows:
1) Case including 1 behavior element: {E1}, {E2}, {E3};
2) Case including 2 behavior elements: {E1, E2}, {E1, E3}, {E2, E1}, {E2, E3}, {E3, E1}, {E3, E2};
3) Case including 3 behavior elements: {E1, E2, E3}, {E1, E3, E2}, {E2, E3, E1}, {E2, E1, E3}, {E3, E1, E2}, and {E3, E2, E1}.
The combined behavior sequences that can be marked as positive samples in the arrangement results include: {E1}, {E1, E2}, {E2, E3}, {E3, E1}, {E1, E2, E3}, {E2, E3, E1}, and {E3, E1, E2}; the remaining combined behavior sequences are marked as negative samples. It should be noted that the execution of the operation behavior data is cyclic. For example, the combined behavior sequence {E1, E2, E3} represents one complete operation implementation process, while the combined behavior sequence {E2, E3, E1} is also a normal behavior sequence but spans two implementation processes of the target operation: E2 and E3 are the last two target operation behavior data of the previous implementation process, and the subsequent E1 is the first target operation behavior data of the next implementation process.
The operation identifier values 1, 2, and 3 are only an example and should not be understood as a specific limitation on the embodiment of the present application. For any operation type, the series of target operation behavior data of the detection object monitored in the normal operation state may be represented as {En1, En2, En3, ..., Eni}, where the value of the operation identifier ni is related to the number of target operation behavior data under the target operation type.
According to the technical solution of the embodiment of the present application, the running state of the detection object is monitored, the monitored target running behavior data corresponding to the target operation type are combined and arranged to obtain a preset number of combined behavior sequences, and the sequences are then marked according to the preset running sequence between adjacent running behavior data under the target operation type, which completes the construction of the sample data. This achieves the effect of quickly constructing rich training sample data and solves the problem that, in machine-learning-based anomaly detection, the available sample data are limited when relying solely on the monitoring data of the detection object. In addition, because positive and negative samples are marked on the combined behavior sequences during sample data construction, the sample data need not be marked again later, which reflects the completeness of the sample data construction process.
Fig. 3 is a schematic flow chart of vehicle anomaly detection according to an embodiment of the present application. It illustrates the embodiment by way of example and should not be construed as a specific limitation on it. As shown in fig. 3, the types of vehicle operation to be protected are first determined. The normal implementation process of each vehicle operation is then segmented and monitored to obtain the normal behavior sequence of that operation. Next, the behavior elements in each normal behavior sequence are fully permuted to obtain multiple groups of combined behavior sequences, and each combined behavior sequence is marked as a positive or negative sample according to the marking principle described above. Finally, model training is carried out using the marked positive and negative samples together with the operation types to obtain an anomaly detection model for the vehicle, and the trained model is used in the vehicle's anomaly detection process. The model training algorithm is not specifically limited in the embodiments of the present application and may be chosen according to actual needs; it may include, but is not limited to, a decision tree algorithm.
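Before training, the marked sequences and their operation types have to be turned into fixed-length inputs. The sketch below shows one possible encoding and is not taken from the application: identifiers 1..n are assumed to be assigned in normal order, sequences are zero-padded to a common length, the operation type is encoded as an index, and the operation names, their sizes, and the function names are all hypothetical.

```python
from itertools import permutations


def label(ids, n):
    """Cyclic-successor marking for identifiers 1..n assigned in normal order."""
    if len(ids) == 1:
        return int(ids[0] == 1)
    return int(all(j == i % n + 1 for i, j in zip(ids, ids[1:])))


def build_training_set(operation_sizes):
    """Map {operation type: number of behavior data} to features and labels."""
    max_len = max(operation_sizes.values())
    features, labels = [], []
    for op_index, (_, n) in enumerate(sorted(operation_sizes.items())):
        for length in range(1, n + 1):
            for ids in permutations(range(1, n + 1), length):
                # Pad with 0 so every row has the same number of columns.
                padded = list(ids) + [0] * (max_len - length)
                features.append([op_index] + padded)
                labels.append(label(ids, n))
    return features, labels


X, y = build_training_set({"open_door": 3, "start_vehicle": 2})
```

The resulting `X` and `y` could then be passed to whichever classifier is chosen, for instance a decision tree implementation.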
Fig. 4 is a schematic structural diagram of a sample data construction apparatus according to an embodiment of the present application, which may be applied to a case of quickly constructing sample data for model training. The device disclosed by the embodiment of the application can be realized by adopting software and/or hardware, and can be integrated on any electronic equipment with computing capability.
As shown in fig. 4, the sample data constructing apparatus 400 disclosed in the embodiment of the present application may include an operation monitoring module 401 and a sample data determining module 402, where:
the operation monitoring module 401 is configured to monitor an operation state of the detection object in the target operation type to obtain at least two target operation behavior data corresponding to the target operation type;
a sample data determining module 402, configured to perform combined arrangement on the target operation behavior data, and perform sample marking on a combined arrangement result to obtain sample data; wherein the sample data is used for training an anomaly detection model for the detection object.
Optionally, the at least two target operation behavior data corresponding to the target operation type include operation behavior data of the detection object in a normal operation state;
the sample data comprises operation behavior data of the detection object in an abnormal operation state.
Optionally, the sample data determining module 402 includes:
the behavior combination unit is used for carrying out combination arrangement on the target operation behavior data to obtain a preset number of combination behavior sequences; the preset number is related to the number of the target operation behavior data, and the position arrangement sequence of the operation behavior data in the combined behavior sequence represents the operation sequence of the operation behavior data in the combined behavior sequence;
the sample marking unit is used for marking the sample of the combined behavior sequence according to a preset running sequence between the adjacent running behavior data under the target operation type to obtain sample data; wherein the sample data comprises positive and negative samples.
Optionally, the sample marking unit comprises a first marking subunit for:
if the number of the operation behavior data included in the combined behavior sequence is at least two, and the operation identifications of any two adjacent operation behavior data in the combined behavior sequence satisfy the following relation, marking the combined behavior sequence as a positive sample:
j=k·n+i%n+1;
wherein the operation identifiers are pre-allocated to the operation behavior data according to a preset operation order, such that adjacent operation behavior data that conform to the preset operation order are allocated consecutive operation identifier values; j represents the operation identifier of the latter of two adjacent operation behavior data in the combined behavior sequence; i represents the operation identifier of the former of the two adjacent operation behavior data in the combined behavior sequence; n represents the total number of the operation behavior data in the combined behavior sequence; i%n represents i modulo n; and k represents the integer quotient obtained by dividing the minimum operation identifier m in the combined behavior sequence by n.
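The relation above can be evaluated mechanically. The following is a minimal Python sketch, assuming that operation identifiers are consecutive positive integers starting at 1; the function name `is_positive_sequence` and the example sequences are hypothetical and are not taken from the application. The function applies the relation literally to every adjacent pair.

```python
def is_positive_sequence(sequence):
    """Return True if every adjacent pair (i, j) in the combined behavior
    sequence satisfies j == k*n + i % n + 1."""
    n = len(sequence)
    if n < 2:
        raise ValueError("the relation applies to sequences of at least two items")
    k = min(sequence) // n  # integer quotient of the minimum identifier m by n
    return all(j == k * n + i % n + 1
               for i, j in zip(sequence, sequence[1:]))


# Hypothetical combined behavior sequences built from identifiers 1, 2, 3.
examples = [(1, 2, 3), (1, 2), (2, 3), (1, 3), (1, 3, 2)]
labels = {seq: is_positive_sequence(seq) for seq in examples}
```

Here (1, 2, 3), (1, 2) and (2, 3) evaluate as positive samples, while (1, 3) and (1, 3, 2) evaluate as negative samples. Read literally, the relation also accepts a wrap-around pair such as (2, 1) when n = 2 and k = 0.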
Optionally, the sample marking unit further comprises a second marking subunit for:
if the number of the operation behavior data included in the combined behavior sequence is one and the operation identifier of the operation behavior data is a preset identifier, marking the combined behavior sequence as a positive sample; wherein the preset identifier is used for representing the operation behavior data at the first running position under the target operation type.
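The single-element case can be sketched in the same way. The Python snippet below assumes that the preset identifier is 1, that is, the identifier allocated to the behavior at the first running position; both this value and the function name are hypothetical.

```python
FIRST_RUN_ID = 1  # assumed preset identifier of the first running position


def label_single_behavior(sequence, preset_id=FIRST_RUN_ID):
    """Label a one-element combined behavior sequence: 1 (positive sample)
    if its only identifier equals the preset identifier, else 0."""
    if len(sequence) != 1:
        raise ValueError("expected a sequence with exactly one item")
    return 1 if sequence[0] == preset_id else 0


first_only = label_single_behavior((1,))   # behavior at the first position
second_only = label_single_behavior((2,))  # a later behavior on its own
```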
Optionally, the operation monitoring module 401 is specifically configured to:
segment the running process of the detection object according to the target operation type, and monitor the running state of the detection object segment by segment to obtain at least two target operation behavior data corresponding to the target operation type.
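The segmented monitoring described above can be sketched as follows. This Python example assumes that monitoring produces a time-ordered log of records, each carrying an operation type and a behavior name; the record layout, the field names `op_type` and `behavior`, and the sample values are hypothetical.

```python
from itertools import groupby


def segment_by_operation(records):
    """Split a time-ordered log into segments of consecutive records that
    share the same operation type."""
    return [(op_type, [r["behavior"] for r in group])
            for op_type, group in groupby(records, key=lambda r: r["op_type"])]


def behaviors_for(records, target_op_type):
    """Collect, in running order, the behavior data observed in the
    segments that belong to the target operation type."""
    collected = []
    for op_type, behaviors in segment_by_operation(records):
        if op_type == target_op_type:
            collected.extend(behaviors)
    return collected


# Hypothetical monitoring log for a vehicle.
log = [
    {"op_type": "door_unlock", "behavior": "receive_command"},
    {"op_type": "door_unlock", "behavior": "verify_key"},
    {"op_type": "door_unlock", "behavior": "release_lock"},
    {"op_type": "engine_start", "behavior": "power_on"},
]
target_behaviors = behaviors_for(log, "door_unlock")
```

The resulting list holds the behavior data for one target operation type in the order they were observed, which is the input the combined arrangement step works on.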
Optionally, the detection object includes a vehicle, and the target operation type includes an operation related to running of the vehicle.
The sample data construction apparatus 400 disclosed in the embodiment of the present application can execute the sample data construction method disclosed in the embodiment of the present application, and has functional modules and beneficial effects corresponding to the execution method. Reference may be made to the description of any method embodiment of the present application for details not explicitly described in the apparatus embodiments of the present application.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device for implementing the sample data construction method in the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not meant to limit implementations of the embodiments of the present application described and/or claimed herein.
As shown in fig. 5, the electronic device includes one or more processors 501, a memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface (GUI) on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations, e.g., as a server array, a group of blade servers, or a multi-processor system. In fig. 5, one processor 501 is taken as an example.
The memory 502 is a non-transitory computer readable storage medium provided by the embodiments of the present application. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the sample data construction method provided by the embodiment of the application. The non-transitory computer-readable storage medium of the embodiments of the present application stores computer instructions for causing a computer to execute the sample data construction method provided by the embodiments of the present application.
The memory 502, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the sample data construction method in the embodiment of the present application, for example, the operation monitoring module 401 and the sample data determination module 402 shown in fig. 4. The processor 501 executes various functional applications and data processing of the electronic device by running non-transitory software programs, instructions and modules stored in the memory 502, that is, implements the sample data construction method in the above method embodiment.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 502 may optionally include a memory remotely located from the processor 501, and these remote memories may be connected via a network to an electronic device for implementing the sample data construction method in this embodiment. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for implementing the sample data construction method in the embodiment of the present application may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for implementing the sample data construction method in this embodiment. Examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 504 may include a display device, an auxiliary lighting device such as a light emitting diode (LED), and a tactile feedback device such as a vibration motor. The display device may include, but is not limited to, a liquid crystal display (LCD), an LED display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, Application Specific Integrated Circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs, also known as programs, software applications, or code, include machine instructions for a programmable processor and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device for providing machine instructions and/or data to a programmable processor, such as a magnetic disk, optical disk, memory, or Programmable Logic Device (PLD), including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device for displaying information to a user, for example, a Cathode Ray Tube (CRT) or an LCD monitor; and a keyboard and a pointing device, such as a mouse or a trackball, by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and VPS services.
According to the technical scheme of the embodiment of the application, the running state of the detection object is monitored, the monitored target running behavior data corresponding to the target operation type is combined and arranged, and the combined arrangement result is subjected to sample marking to obtain sample data, so that the effect of quickly constructing abundant training sample data is achieved.
It should be understood that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A sample data construction method comprises the following steps:
monitoring the running state of a detection object under a target operation type to obtain at least two target running behavior data corresponding to the target operation type;
performing combined arrangement on the target operation behavior data, and performing sample marking on a combined arrangement result to obtain sample data; wherein the sample data is used to train an anomaly detection model for the detection object.
2. The method according to claim 1, wherein the at least two target operation behavior data corresponding to the target operation type include operation behavior data of the detection object in a normal operation state;
the sample data comprises operation behavior data of the detection object in an abnormal operation state.
3. The method of claim 1, wherein performing combined permutation on the target operation behavior data and performing sample marking on a combined permutation result to obtain sample data comprises:
combining and arranging the target operation behavior data to obtain a preset number of combined behavior sequences; wherein the position arrangement order of the operation behavior data in the combined behavior sequence represents the operation order of the operation behavior data in the combined behavior sequence;
according to a preset operation sequence between adjacent operation behavior data under the target operation type, carrying out sample marking on the combined behavior sequence to obtain sample data; wherein the sample data comprises positive and negative samples.
4. The method of claim 3, wherein sample marking the combined behavior sequence according to a preset running order between adjacent running behavior data under the target operation type to obtain the sample data comprises:
if the number of the operation behavior data included in the combined behavior sequence is at least two, and the operation identifications of any two adjacent operation behavior data in the combined behavior sequence satisfy the following relation, marking the combined behavior sequence as a positive sample:
j=k·n+i%n+1
the operation identification is pre-allocated to the operation behavior data according to the preset operation sequence, and operation identification values of adjacent operation behavior data which accord with the preset operation sequence in the operation identification allocation process have continuity, j represents an operation identification of operation behavior data positioned at the back of two adjacent operation behavior data in the combined behavior sequence, i represents an operation identification of operation behavior data positioned at the front of two adjacent operation behavior data in the combined behavior sequence, n represents the total number of the operation behavior data included in the combined behavior sequence, i% n represents that i performs modular operation on n, and k represents an integer quotient obtained by dividing the minimum operation identification m in the combined behavior sequence by n.
5. The method of claim 4, wherein sample marking is performed on the combined behavior sequence according to a preset running order between adjacent running behavior data under the target operation type to obtain the sample data, and further comprising:
if the number of the operation behavior data included in the combined behavior sequence is one and the operation identifier of the operation behavior data is a preset identifier, marking the combined behavior sequence as a positive sample; wherein the preset identifier is used for representing the operation behavior data at the first running position under the target operation type.
6. The method according to claim 1, wherein the monitoring the running state of the detection object under the target operation type to obtain at least two target running behavior data corresponding to the target operation type includes:
segmenting the running process of the detection object according to the target operation type, and monitoring the running state segment by segment to obtain at least two target operation behavior data corresponding to the target operation type.
7. The method of claim 1, wherein the detection object comprises a vehicle and the target operation type comprises an operation related to vehicle operation.
8. A sample data construction apparatus comprising:
the operation monitoring module is used for monitoring the operation state of a detection object under a target operation type to obtain at least two target operation behavior data corresponding to the target operation type;
the sample data determining module is used for carrying out combined arrangement on the target operation behavior data and carrying out sample marking on a combined arrangement result to obtain sample data; wherein the sample data is used to train an anomaly detection model for the detection object.
9. The apparatus according to claim 8, wherein the at least two target operation behavior data corresponding to the target operation type include operation behavior data of the detection object in a normal operation state;
the sample data comprises operation behavior data of the detection object in an abnormal operation state.
10. The apparatus of claim 8, wherein the sample data determination module comprises:
the behavior combination unit is used for carrying out combined arrangement on the target operation behavior data to obtain a preset number of combined behavior sequences; wherein the position arrangement order of the operation behavior data in the combined behavior sequence represents the operation order of the operation behavior data in the combined behavior sequence;
the sample marking unit is used for carrying out sample marking on the combined behavior sequence according to a preset running sequence between adjacent running behavior data under the target operation type to obtain the sample data; wherein the sample data comprises positive and negative samples.
11. The apparatus of claim 10, wherein the sample labeling unit comprises a first labeling subunit to:
if the number of the operation behavior data included in the combined behavior sequence is at least two, and the operation identifications of any two adjacent operation behavior data in the combined behavior sequence satisfy the following relation, marking the combined behavior sequence as a positive sample:
j=k·n+i%n+1
the operation identification is pre-allocated to the operation behavior data according to the preset operation sequence, and operation identification values of adjacent operation behavior data which accord with the preset operation sequence in the operation identification allocation process have continuity, j represents an operation identification of operation behavior data positioned at the back of two adjacent operation behavior data in the combined behavior sequence, i represents an operation identification of operation behavior data positioned at the front of two adjacent operation behavior data in the combined behavior sequence, n represents the total number of the operation behavior data included in the combined behavior sequence, i% n represents that i performs modular operation on n, and k represents an integer quotient obtained by dividing the minimum operation identification m in the combined behavior sequence by n.
12. The apparatus of claim 11, wherein the sample labeling unit further comprises a second labeling subunit for:
if the number of the operation behavior data included in the combined behavior sequence is one and the operation identifier of the operation behavior data is a preset identifier, marking the combined behavior sequence as a positive sample; wherein the preset identifier is used for representing the operation behavior data at the first running position under the target operation type.
13. The apparatus of claim 8, wherein the operation monitoring module is specifically configured to:
segment the running process of the detection object according to the target operation type, and monitor the running state of the detection object segment by segment to obtain at least two target operation behavior data corresponding to the target operation type.
14. The apparatus of claim 8, wherein the detection object comprises a vehicle and the target operation type comprises an operation related to vehicle operation.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the sample data construction method of any one of claims 1 to 7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to execute the sample data construction method of any one of claims 1-7.
CN202010865395.7A 2020-08-25 2020-08-25 Sample data construction method, device, equipment and medium Pending CN112070132A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010865395.7A CN112070132A (en) 2020-08-25 2020-08-25 Sample data construction method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN112070132A true CN112070132A (en) 2020-12-11

Family

ID=73659420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010865395.7A Pending CN112070132A (en) 2020-08-25 2020-08-25 Sample data construction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN112070132A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115204050A (en) * 2022-07-22 2022-10-18 木卫四(北京)科技有限公司 Vehicle-mounted CAN bus data abnormity detection method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130188176A1 (en) * 2012-01-20 2013-07-25 Peter Scott Lovely Monitoring for disturbance of optical fiber
CN204706192U (en) * 2015-06-04 2015-10-14 石立公 Vehicle electronics policing system
CN105160181A (en) * 2015-09-02 2015-12-16 华中科技大学 Detection method of abnormal data of numerical control system instruction field sequence
CN105758644A (en) * 2016-05-16 2016-07-13 上海电力学院 Rolling bearing fault diagnosis method based on variation mode decomposition and permutation entropy
CN107908300A (en) * 2017-11-17 2018-04-13 哈尔滨工业大学(威海) A kind of synthesis of user's mouse behavior and analogy method and system
CN108537176A (en) * 2018-04-11 2018-09-14 武汉斗鱼网络科技有限公司 Recognition methods, device, terminal and the storage medium of target barrage
CN110113226A (en) * 2019-04-16 2019-08-09 新华三信息安全技术有限公司 A kind of method and device of detection device exception
CN110823237A (en) * 2019-10-24 2020-02-21 百度在线网络技术(北京)有限公司 Starting point binding and prediction model obtaining method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Yangzhou et al.: "Robust vehicle detection algorithm based on the Co-training method", Journal of Beijing University of Technology, vol. 38, no. 03, pages 394 - 401 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211020

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd.

Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.