All of this means that if we want to minimize surprise FPs between model releases, we must preserve the DV ordering from one release to the next.
XGBoost is flexible because its Newton-Raphson solver requires only the gradient and Hessian of the objective rather than the objective itself. By adding small perturbations to the gradient and to the Hessian, we can replace the standard XGBoost objective function with one that includes a loss for failing to rank DVs according to the DV ranking defined by the previous model release, thereby promoting model release stability.
Mathematical Description of XGBoost Optimization
The following, up to but not including the example, is taken predominantly from the XGBoost Project docs. The XGBoost model consists of an ensemble of $K$ trees $f_k$ such that

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}$$

The objective function we leverage for training the binary classifier is the binary logistic loss function with complexity regularization,

$$\text{obj} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k)$$

where

$$l(y_i, \hat{y}_i) = y_i \ln\left(1 + e^{-\hat{y}_i}\right) + (1 - y_i)\ln\left(1 + e^{\hat{y}_i}\right)$$

and

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2,$$

with $T$ the number of leaves in the tree and $w_j$ the leaf weights.
For each iteration $t$, the goal is to find the tree $f_t$ that minimizes $\text{obj}^{(t)}$, where predictions are built up additively: $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$. In the case of a neural network, loss minimization requires computing the rate of change of the loss with respect to the model weights. In the case of XGBoost, we compute the second-order Taylor expansion of the loss $l$ and provide the gradient and Hessian to the Newton-Raphson solver to find the optimal $f_t$ given the previously constructed trees $f_1, \ldots, f_{t-1}$.
The second-order Taylor expansion of the objective takes the form

$$\text{obj}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$

where

$$g_i = \partial_{\hat{y}_i^{(t-1)}} \, l\left(y_i, \hat{y}_i^{(t-1)}\right), \qquad h_i = \partial_{\hat{y}_i^{(t-1)}}^{2} \, l\left(y_i, \hat{y}_i^{(t-1)}\right)$$
The upshot is that if we want to customize the XGBoost objective, we need only provide the updated gradient $g_i$ and Hessian $h_i$.
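To make the interface concrete, here is a minimal sketch (ours, not from the docs) of a custom objective implementing the plain logistic loss; XGBoost asks only that the callable return the per-sample gradient and Hessian:

```python
from scipy.special import expit  # numerically stable sigmoid


def logistic_objective(y_pred, dtrain):
    """Gradient and Hessian of the binary logistic loss.

    y_pred holds the raw margins (log-odds) accumulated over the
    previously constructed trees; dtrain is the xgboost DMatrix.
    """
    y_true = dtrain.get_label()
    p = expit(y_pred)        # margins -> probabilities
    grad = p - y_true        # first derivative w.r.t. the margin
    hess = p * (1.0 - p)     # second derivative w.r.t. the margin
    return grad, hess
```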
A note to the observant reader (not from the docs): In the above expansion, the loss function $l(y_i, \hat{y}_i)$ is being expanded around $\hat{y}_i^{(t-1)}$, where the independent variable is the raw margin (log-odds) $\hat{y}_i$ and the predicted probability is

$$p_i = \sigma(\hat{y}_i) = \frac{1}{1 + e^{-\hat{y}_i}}$$

Computing

$$g_i = \partial_{\hat{y}_i} \, l(y_i, \hat{y}_i)$$

gives

$$g_i = p_i - y_i,$$

and similarly $h_i = p_i(1 - p_i)$.
For the sake of making these equations more interpretable and concrete, assume we have a sample x such that the XGBoost model f outputs p = f(x) = 0.2, and assume the true label is y = 1. The gradient of the logistic loss for this sample is g = p − y = 0.2 − 1 = −0.8. This negative gradient will encourage the (t+1)st tree to be constructed so as to push the prediction value for this sample higher.
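As a quick sanity check of these numbers (values as in the example above):

```python
p = 0.2             # model output for sample x
y = 1.0             # true label
grad = p - y        # -0.8: negative, so the next tree pushes the DV higher
hess = p * (1 - p)  # 0.16: curvature of the logistic loss at this point
print(grad, hess)   # -0.8 0.16
```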
The adjustments to the gradient and Hessian are then

$$\tilde{g}_i = g_i + v_i$$

and

$$\tilde{h}_i = h_i + |v_i|,$$

respectively, where $v_i$ is the ordering perturbation for sample $i$ (the magnitude $|v_i|$ keeps the Hessian positive).
The takeaway is that a negative gradient pushes the prediction value and therefore the DV higher, as the sigmoid function is everywhere increasing. This means that if we want to customize the objective function in such a way that the DV of a given sample is pushed higher as subsequent trees are added, we should add a number v to the gradient for that sample.
An Intuitive Toy Example
Assume we have sorted the samples in the training corpus of model N by DV in ascending order and stacked the remaining (new) samples below them. Assume ypred = [1,2,3,4,5,7,6]. The resulting addition to the gradient should be something like [0,0,0,0,0,1,-1]. The intuition is that we want to move the prediction of the sample whose current prediction is 6 a little higher and the prediction of the sample whose current prediction is 7 a little lower. Keep in mind that, by assumption, the row ordering of the underlying samples in the training set is the correct one. This will enforce the proper ordering [1,2,3,4,5,6,7].
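A minimal sketch of how such a perturbation vector could be computed (variable names here are illustrative):

```python
import numpy as np

# Predictions in old-model DV order; the last two samples are misordered.
y_pred = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 6.0])

# Size of each adjacent-pair violation (positive means misordered).
diff = np.maximum(y_pred[:-1] - y_pred[1:], 0.0)

# A positive gradient pushes a prediction lower; a negative one, higher.
v = np.zeros_like(y_pred)
v[:-1] += diff  # push the sample ranked too high down
v[1:] -= diff   # push the sample ranked too low up
print(v)        # [ 0.  0.  0.  0.  0.  1. -1.]
```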
Experiments, Code, and Results
Experimental Setup
Each experiment consists of training exactly three XGBoost binary classifiers on a corpus of PE files with a 90/10 dirty/clean split. Featurization was performed with an internally developed static parser, but the method itself is agnostic to the parser; one could leverage the EMBER open-source parser, for example. The first model represents the “N” release, trained with the standard XGBoost logistic loss objective. We call this the “old” model. The second model represents the standard “N+1” release, trained with the same objective as the “old” model but with 10% more data and the same label balance. We call this the “full” model. The third model represents the candidate “N+1” release, trained with the custom objective described above on the same dataset as the “full” model.
We ran two separate experiments, differing only in the number of training samples. The custom objective succeeded in reducing swap-in or “surprise” FPs with a minimal trade-off in true positives.
Results
Experiment 1

| Comparison | Swap-Ins | Persistent FPs | Non-Swap New FPs | Total FPs (Old Model) | Total FPs (New Model) | Total TPs (Old Model) | Total TPs (New Model) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Old vs. Full | 32 | 194 | 23 | 226 | 250 | 25,267 | 28,111 |
| Old vs. Candidate | 26 (18.75%) | 199 | 25 | 226 | 250 | 25,267 | 28,104 (0.025%) |

Experiment 2

| Comparison | Swap-Ins | Persistent FPs | Non-Swap New FPs | Total FPs (Old Model) | Total FPs (New Model) | Total TPs (Old Model) | Total TPs (New Model) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Old vs. Full | 59 | 382 | 56 | 446 | 497 | 62,157 | 69,059 |
| Old vs. Candidate | 53 (10.2%) | 387 | 56 | 446 | 497 | 62,157 | 69,053 (0.009%) |

Percentages in parentheses are relative reductions of the candidate model versus the full model.
Python Implementation
The perturbation value we decided to use was simply the difference between the prediction values of each pair of misordered samples (ordered according to the DVs output by model N, the “old” model). Note that this requires a perturbation to the Hessian as well. This code assumes the values in the argument “y_pred” are ordered according to values output by model N; take care to note that this does not mean these values are ordered as on the real number line. The scipy function expit is the sigmoid function with built-in underflow and overflow protection.
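The original implementation is not reproduced here; the following is a sketch of the idea under the stated assumptions (the n_ordered and alpha parameters are illustrative, and perturbing the Hessian by the magnitude of the gradient perturbation is one reasonable choice):

```python
import numpy as np
from scipy.special import expit  # sigmoid with under/overflow protection


class CustomObjective:
    """Binary logistic objective plus a DV rank-preservation perturbation.

    Sketch only: assumes the first n_ordered rows of the training set
    (and hence of y_pred) are sorted ascending by the old model's DVs.
    """

    def __init__(self, n_ordered, alpha=1.0):
        self.n_ordered = n_ordered  # rows carrying the old-model ordering
        self.alpha = alpha          # illustrative perturbation strength

    def __call__(self, y_pred, dtrain):
        y_true = dtrain.get_label()

        # Standard logistic gradient and Hessian.
        p = expit(y_pred)
        grad = p - y_true
        hess = p * (1.0 - p)

        # Perturbation: the difference between the predictions of each
        # misordered adjacent pair, as described above.
        n = self.n_ordered
        v = self.alpha * np.maximum(y_pred[: n - 1] - y_pred[1:n], 0.0)

        # A positive gradient pushes a DV lower; a negative one, higher.
        grad[: n - 1] += v
        grad[1:n] -= v

        # Perturb the Hessian as well, keeping it positive so the
        # Newton-Raphson step stays well behaved.
        hess[: n - 1] += v
        hess[1:n] += v

        return grad, hess
```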
The callable CustomObjective class instantiation is then passed to the standard xgb.train function. Incidentally, a callable class is another way, in addition to lambda functions, to pass extra arguments to a Python function whose call signature restricts the number of arguments.
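For example (X_train, y_train, and n_old here are placeholders):

```python
import xgboost as xgb

# Rows of X_train are assumed pre-sorted by the old model's DVs, with
# the new samples stacked below the first n_old rows.
dtrain = xgb.DMatrix(X_train, label=y_train)

booster = xgb.train(
    params={"max_depth": 6, "eta": 0.1},
    dtrain=dtrain,
    num_boost_round=200,
    obj=CustomObjective(n_ordered=n_old, alpha=1.0),
)
```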
Employing an XGBoost Custom Objective Function Results in More Predictable Model Behavior with Fewer FPs
XGBoost classifier consistency between releases can be improved with an XGBoost custom objective function that is easy to implement and mathematically sound, with a minimal trade-off in true positive rate. The results are more predictable model behavior, less chaotic customer environments, and fewer threat researcher cycles wasted on surprise FP remediation.
CrowdStrike’s Research Investment Pays Off for Customers and the Cybersecurity Industry
Research is a critical function at CrowdStrike, ensuring we continue to take a leadership role in advancing the global cybersecurity ecosystem. The results of groundbreaking work — like that done by the team who conducted the research into the XGBoost custom objective function — ensure CrowdStrike customers enjoy state-of-the-art protection and advance cyber defenses globally against sophisticated adversaries.