XGBoost is short for eXtreme Gradient Boosting.
By the way, what is boosting?
Two common terms used in ML are bagging and boosting.
Bagging: an approach where you take random samples of the data, build a learner on each sample, and average the resulting predictions (or predicted probabilities).
Boosting: similar, but the sampling is done more intelligently. We successively give more and more weight to observations that are hard to classify.
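The reweighting idea can be sketched in a few lines. This is a toy illustration of the general boosting principle, not XGBoost's actual update rule; the labels, predictions, and the factor of 2 are all made up for the example.

```python
labels      = [1, 1, 0, 1, 0]      # true classes (toy data)
predictions = [1, 0, 0, 0, 0]      # what the current weak learner predicted
weights     = [1.0] * len(labels)  # start with uniform observation weights

for i, (y, p) in enumerate(zip(labels, predictions)):
    if y != p:              # a hard-to-classify observation
        weights[i] *= 2.0   # give it more weight in the next round (toy factor)

# Renormalise so the weights again form a distribution
total = sum(weights)
weights = [w / total for w in weights]
```

After this round, the next learner in the sequence trains against these weights, so the misclassified observations (indices 1 and 3 above) get twice the attention of the rest.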
In broad terms, what sets XGBoost apart is the efficiency, accuracy and feasibility of the algorithm.
It has both a linear model solver and tree learning algorithms. What makes it fast is its capacity to do parallel computation on a single machine.
It also has additional features for doing cross-validation and finding important variables.
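Under the hood, cross-validation just means repeatedly splitting the data into training and validation folds (the library exposes this directly via `xgboost.cv`). A minimal sketch of the fold-splitting logic, on a hypothetical data set of 10 rows:

```python
def kfold_indices(n, k):
    """Split the indices 0..n-1 into k folds of (train, valid) index lists."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n) if i not in set(valid)]
        folds.append((train, valid))
        start += size
    return folds

# 5-fold split of a toy data set with 10 rows
splits = kfold_indices(10, 5)
```

Each of the 5 folds trains on 8 rows and validates on the held-out 2, and every row is held out exactly once; the per-fold scores are then averaged.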
XGBoost handles only numeric vectors.
What to do when you have categorical data?
A simple method to convert a categorical variable into a numeric vector is one-hot encoding.
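Here is a minimal pure-Python sketch of one-hot encoding; in practice you would reach for `pandas.get_dummies` or scikit-learn's `OneHotEncoder`. The `colors` column is a made-up example.

```python
colors = ["red", "green", "blue", "green"]   # a toy categorical column
categories = sorted(set(colors))             # ['blue', 'green', 'red']

# One row per observation, one 0/1 column per category
encoded = [[1 if c == cat else 0 for cat in categories] for c in colors]
# each row is now a numeric vector that XGBoost can consume
```

So `"red"` becomes `[0, 0, 1]` and `"green"` becomes `[0, 1, 0]`: one indicator column per category, with a single 1 marking the observed value.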
We first briefly review the learning objective in tree boosting. For a given data set with n examples and m features, a tree ensemble model (shown in the figure above) uses K additive functions to predict the output.
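In symbols, following the notation of the XGBoost paper, the prediction for example $i$ is the sum of the outputs of the $K$ regression trees:

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F},
```

where each $f_k$ is an independent tree and $\mathcal{F}$ is the space of regression trees.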
It has also been widely adopted by industry users, including Google, Alibaba and Tencent, and various startup companies. According to a popular article in Forbes, XGBoost can scale smoothly to hundreds of workers (with each worker utilizing multiple processors) and solve machine learning problems involving terabytes of real-world data.