
GBDT: Source Code Analysis of GradientBoostingClassifier


GradientBoostingClassifier

import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy data: one feature x = 1..10, labels in {-1, 1}
df = pd.DataFrame([[1,-1],[2,-1],[3,-1],[4,1],[5,1],
                   [6,-1],[7,-1],[8,-1],[9,1],[10,1]])
X = df.iloc[:,[0]]  # feature column, kept 2-D
Y = df.iloc[:,-1]   # label column
# 20 depth-1 trees (stumps), no shrinkage (learning_rate=1.0)
model = GradientBoostingClassifier(n_estimators=20, learning_rate=1.0,
                                   max_depth=1, random_state=0)
model.fit(X, Y)
print(model.predict(X))

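Model initialization
In round 1 the additive model no longer predicts the mean (as it does for squared-error regression); it is initialized to the log odds of the positive class: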
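import numpy as np

# In older scikit-learn releases this is LogOddsEstimator.fit,
# the initial estimator used by the binomial deviance.
class LogOddsEstimator(object):
    scale = 1.0  # multiplies the log odds

    def fit(self, X, y, sample_weight=None):
        # pre-cond: pos, neg are encoded as 1, 0
        if sample_weight is None:
            pos = np.sum(y)
            neg = y.shape[0] - pos
        else:
            pos = np.sum(sample_weight * y)
            neg = np.sum(sample_weight * (1 - y))

        if neg == 0 or pos == 0:
            raise ValueError('y contains non binary labels.')
        # F0 = scale * log(pos / neg): log odds of the positive class
        self.prior = self.scale * np.log(pos / neg)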
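A minimal stdlib-only sketch of the same initialization, assuming the labels are already encoded as 0/1. `log_odds_prior` is a hypothetical helper name, not a scikit-learn function; it only mirrors the logic of the excerpt above, including the optional sample weights.

```python
import math
from typing import Optional, Sequence


def log_odds_prior(y: Sequence[int],
                   sample_weight: Optional[Sequence[float]] = None,
                   scale: float = 1.0) -> float:
    """Initial raw score F0 = scale * log(pos / neg) for 0/1-encoded labels.

    Hypothetical helper mirroring the fit() excerpt above.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    # Weighted counts of positive (1) and negative (0) samples.
    pos = sum(w * yi for w, yi in zip(sample_weight, y))
    neg = sum(w * (1 - yi) for w, yi in zip(sample_weight, y))
    if pos == 0 or neg == 0:
        raise ValueError("y contains non binary labels.")
    return scale * math.log(pos / neg)


# Example data from this article after encoding -1 -> 0 and 1 -> 1.
y = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
print(round(log_odds_prior(y), 4))  # -0.4055
```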

Computing the negative gradient (residual) of the loss under the round-1 additive model

The negative gradient uses the logistic function expit(x) = 1/(1+exp(-x)); the residual is y - expit(F(x)):

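from scipy.special import expit  # expit(x) = 1 / (1 + exp(-x))

def negative_gradient(self, y, pred, **kargs):
    """Compute the residual (= negative gradient). """
    # y is encoded as 0/1; pred holds the current raw scores F(x)
    return y - expit(pred.ravel())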
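Why y - expit(F) is the negative gradient: with p = expit(F), the per-sample log-loss is -[y log p + (1 - y) log(1 - p)], and its derivative with respect to the raw score F simplifies to p - y. The sketch below checks this numerically with a central finite difference. All function names are illustrative, and the loss is written without any constant factor scikit-learn may apply to its deviance.

```python
import math


def expit(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def log_loss(y: int, f: float) -> float:
    """Per-sample negative log-likelihood for label y in {0, 1} and raw score f."""
    p = expit(f)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def negative_gradient(y: int, f: float) -> float:
    """Closed form used by the excerpt above: y - expit(f)."""
    return y - expit(f)


def numeric_negative_gradient(y: int, f: float, h: float = 1e-6) -> float:
    """-(dL/df) estimated with a central finite difference."""
    return -(log_loss(y, f + h) - log_loss(y, f - h)) / (2 * h)


f0 = math.log(4 / 6)  # round-1 prior of the example
for label in (0, 1):
    print(label, negative_gradient(label, f0), numeric_negative_gradient(label, f0))
```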

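Adjusting the leaf nodes
A regression tree is fitted to the negative gradient of the round-1 additive model, but its leaf values are then adjusted: each leaf's output value depends on the loss function in use.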

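import numpy as np
from sklearn.tree._tree import TREE_LEAF  # marker stored in children_left for leaf nodes

def update_terminal_regions(self, tree, X, y, residual, y_pred,
                            sample_weight, sample_mask,
                            learning_rate=1.0, k=0):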
    """Update the terminal regions (=leaves) of the given tree and
    updates the current predictions of the model. Traverses tree
    and invokes template method `_update_terminal_region`.

    Parameters
    ----------
    tree : tree.Tree
        The tree object.
    X : ndarray, shape=(n, m)
        The data array.
    y : ndarray, shape=(n,)
        The target labels.
    residual : ndarray, shape=(n,)
        The residuals (usually the negative gradient).
    y_pred : ndarray, shape=(n,)
        The predictions.
    sample_weight : ndarray, shape=(n,)
        The weight of each sample.
    sample_mask : ndarray, shape=(n,)
        The sample mask to be used.
    learning_rate : float, default=1.0
        learning rate shrinks the contribution of each tree by
         ``learning_rate``.
    k : int, default 0
        The index of the estimator being updated.

    """
    # compute leaf for each sample in ``X``.
    terminal_regions = tree.apply(X)

    # mask all which are not in sample mask.
    masked_terminal_regions = terminal_regions.copy()
    masked_terminal_regions[~sample_mask] = -1

    # update each leaf (= perform line search)
    for leaf in np.where(tree.children_left == TREE_LEAF)[0]:
        self._update_terminal_region(tree, masked_terminal_regions,
                                     leaf, X, y, residual,
                                     y_pred[:, k], sample_weight)

    # update predictions (both in-bag and out-of-bag)
    y_pred[:, k] += (learning_rate
                     * tree.value[:, 0, 0].take(terminal_regions, axis=0))
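Stripped of the tree object, the method above does three things: find each sample's leaf, compute one value per leaf through a loss-specific hook, and add learning_rate times that value to every sample's score. The sketch below assumes leaf assignments arrive as a plain list of leaf ids and ignores sample_mask (subsampling); `update_leaves_and_scores`, `mean_residual` and their arguments are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

LeafValueFn = Callable[[List[int], List[float]], float]


def update_leaves_and_scores(leaf_ids: Sequence[int], y: Sequence[int],
                             residual: Sequence[float], scores: List[float],
                             leaf_value_fn: LeafValueFn,
                             learning_rate: float = 1.0) -> Dict[int, float]:
    """Set one value per leaf, then add learning_rate * value to each score.

    leaf_ids[i] is the leaf sample i lands in (what tree.apply returns);
    leaf_value_fn plays the role of the loss-specific _update_terminal_region.
    """
    # Group sample indices by leaf.
    members: Dict[int, List[int]] = defaultdict(list)
    for i, leaf in enumerate(leaf_ids):
        members[leaf].append(i)

    # Each leaf's value depends only on the samples inside it.
    leaf_values = {
        leaf: leaf_value_fn([y[i] for i in idx], [residual[i] for i in idx])
        for leaf, idx in members.items()
    }

    # Update every sample's raw score with its leaf's (shrunken) value.
    for i, leaf in enumerate(leaf_ids):
        scores[i] += learning_rate * leaf_values[leaf]
    return leaf_values


def mean_residual(y_leaf: List[int], r_leaf: List[float]) -> float:
    """Plain average of residuals, i.e. an unadjusted regression-tree leaf."""
    return sum(r_leaf) / len(r_leaf)


scores = [0.0, 0.0, 0.0]
print(update_leaves_and_scores([1, 1, 2], [0, 1, 1], [-0.5, 0.5, 0.5],
                               scores, mean_residual, learning_rate=0.1), scores)
```

Passing a Newton-step function instead of mean_residual gives the binomial-deviance behavior; keeping the per-leaf rule behind a hook mirrors how update_terminal_regions delegates to _update_terminal_region.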


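For the binomial deviance, each leaf's value is set to numerator / denominator, computed over the samples in that leaf:
numerator = np.sum(sample_weight * residual)
denominator = np.sum(sample_weight * (y - residual) * (1 - y + residual))
# leaf value = numerator / denominator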
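Because the residual is r = y - p, the factor (y - residual) equals p and (1 - y + residual) equals 1 - p, so the leaf value is Σw(y - p) / Σw·p(1 - p). That is one Newton-Raphson step: with respect to the raw score, the log-loss has gradient p - y and second derivative p(1 - p), and the step is -Σgradient / Σsecond derivative. A minimal sketch; `newton_leaf_value` is a hypothetical name and the near-zero-denominator guard is an assumption.

```python
from typing import Optional, Sequence


def newton_leaf_value(y: Sequence[int], residual: Sequence[float],
                      sample_weight: Optional[Sequence[float]] = None) -> float:
    """Leaf output = sum(w * r) / sum(w * (y - r) * (1 - y + r)).

    With r = y - p this equals sum(w * (y - p)) / sum(w * p * (1 - p)).
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    num = sum(w * r for w, r in zip(sample_weight, residual))
    den = sum(w * (yi - r) * (1 - yi + r)
              for w, yi, r in zip(sample_weight, y, residual))
    # Guard against division by (near) zero; the threshold is an assumption.
    return 0.0 if abs(den) < 1e-150 else num / den


# Residuals when every sample has p = 0.4 (round 1 of the example).
left_y = [0, 0, 0, 1, 1, 0, 0, 0]
left_r = [yi - 0.4 for yi in left_y]
print(newton_leaf_value(left_y, left_r))
```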

Visualization

[[1,-1],[2,-1],[3,-1],[4,1],[5,1],[6,-1],[7,-1],[8,-1],[9,1],[10,1]]
round1
In round 1 the additive model no longer predicts the mean:
self.prior = self.scale * np.log(pos / neg)
where pos is the number of class 1 samples and neg the number of class -1 samples (4 and 6 here):
np.log(4/6)=-0.4054

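Compute the negative gradient; the program automatically re-encodes class -1 as 0:
a = np.log(4/6)
print(0 - 1/(1 + np.exp(-a)))  # ≈ -0.4 for class -1 (encoded as 0); class 1 gives 1 - 0.4 = 0.6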
return y - expit(pred.ravel())
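The label encoding and the round-1 residuals for all ten samples, as a stdlib sketch. Sorting the distinct labels and replacing each label by its index is an assumption about how the encoding is done; it reproduces the -1 → 0, 1 → 1 mapping described above.

```python
import math

raw_labels = [-1, -1, -1, 1, 1, -1, -1, -1, 1, 1]

# Sorted distinct labels play the role of classes_; each label becomes its index.
classes = sorted(set(raw_labels))            # [-1, 1]
index_of = {c: i for i, c in enumerate(classes)}
y = [index_of[c] for c in raw_labels]        # -1 -> 0, 1 -> 1

f0 = math.log(sum(y) / (len(y) - sum(y)))    # log(4/6)
p0 = 1.0 / (1.0 + math.exp(-f0))             # expit(f0) = 0.4
residual = [yi - p0 for yi in y]             # negative gradient y - p
print([round(r, 4) for r in residual])
# [-0.4, -0.4, -0.4, 0.6, 0.6, -0.4, -0.4, -0.4, 0.6, 0.6]
```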

round2
Fit a regression tree to the negative gradient of the round-1 additive model; this is where the leaf values need adjusting:
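The first 8 samples fall into the left leaf and the last 2 into the right leaf.
Taking the left leaf as an example, its final output value is computed as follows:

numerator / denominator = -0.625 (the learning rate here is 1)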

After adding this tree to the additive model:
-0.4054-0.625=-1.0304
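Spelled out with the formula from the source excerpt: the left leaf holds 6 samples with y = 0 (residual -0.4) and 2 samples with y = 1 (residual 0.6), every sample has p = 0.4, and all sample weights are 1:

```latex
\begin{aligned}
\text{numerator}   &= \sum_{i \in \text{left}} r_i = 6 \times (-0.4) + 2 \times 0.6 = -1.2 \\
\text{denominator} &= \sum_{i \in \text{left}} (y_i - r_i)(1 - y_i + r_i)
                    = \sum_{i \in \text{left}} p_i (1 - p_i) = 8 \times 0.4 \times 0.6 = 1.92 \\
\gamma_{\text{left}} &= \frac{-1.2}{1.92} = -0.625
\end{aligned}
```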

round3


round4

…
round20
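To follow the rounds without stepping through the library, the whole procedure for this 1-D toy problem can be re-implemented in a few dozen lines. This is a hypothetical from-scratch sketch, not scikit-learn's code: splits are scored by plain squared-error reduction (scikit-learn's default criterion is friedman_mse), thresholds are midpoints between neighboring x values, and leaves use the numerator / denominator rule from the excerpt above. With n_trees=1 it reproduces the round-2 split (first 8 samples left, last 2 right) and the left-leaf value -0.625.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def expit(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def best_split(x: List[float], r: List[float]) -> float:
    """Midpoint threshold maximizing squared-error reduction when fitting r.

    Assumes x has at least two distinct values.
    """
    xs = sorted(set(x))
    best_t, best_gain = xs[0], -math.inf
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2.0
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        # SSE reduction = S_L^2/n_L + S_R^2/n_R - S^2/n (S = sum of residuals)
        gain = (sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right)
                - sum(r) ** 2 / len(r))
        if gain > best_gain:  # strict >, so the first best threshold wins ties
            best_t, best_gain = t, gain
    return best_t


def newton_value(y: List[int], r: List[float]) -> float:
    """Leaf value sum(r) / sum((y - r) * (1 - y + r)), i.e. sum(y - p) / sum(p(1 - p))."""
    den = sum((yi - ri) * (1 - yi + ri) for yi, ri in zip(y, r))
    return 0.0 if abs(den) < 1e-150 else sum(r) / den


def fit_stumps(x: List[float], y: List[int], n_trees: int,
               learning_rate: float = 1.0) -> Tuple[float, List[Stump]]:
    """Return (F0, stumps) for 0/1 labels y."""
    pos = sum(y)
    f0 = math.log(pos / (len(y) - pos))  # round 1: log-odds prior
    scores = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):  # each tree is one further round in the article
        r = [yi - expit(s) for yi, s in zip(y, scores)]  # negative gradient
        t = best_split(x, r)
        left = [i for i, xi in enumerate(x) if xi <= t]
        right = [i for i, xi in enumerate(x) if xi > t]
        lv = newton_value([y[i] for i in left], [r[i] for i in left])
        rv = newton_value([y[i] for i in right], [r[i] for i in right])
        for i in left:
            scores[i] += learning_rate * lv
        for i in right:
            scores[i] += learning_rate * rv
        stumps.append((t, lv, rv))
    return f0, stumps


def raw_score(f0: float, stumps: List[Stump], xi: float,
              learning_rate: float = 1.0) -> float:
    """F(x) = F0 + learning_rate * sum of the matching leaf values."""
    return f0 + learning_rate * sum(lv if xi <= t else rv for t, lv, rv in stumps)


x = [float(v) for v in range(1, 11)]
y = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]  # class -1 encoded as 0
f0, stumps = fit_stumps(x, y, n_trees=20)  # 20 trees, like n_estimators=20
print([1 if raw_score(f0, stumps, xi) > 0 else -1 for xi in x])
```

Whether the printed labels match model.predict for all 20 trees is not guaranteed, since split scoring and tie-breaking may differ from the library.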

Prediction

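# excerpt from GradientBoostingClassifier.predict(X)
score = self.decision_function(X)                 # raw additive-model score F(x)
decisions = self.loss_._score_to_decision(score)  # score -> class index
return self.classes_.take(decisions, axis=0)      # class index -> original label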
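For the binomial loss, turning a raw score into a label plausibly amounts to: p = expit(F), probabilities [1 - p, p], argmax, then mapping the index back through classes_. The sketch below assumes that reading of _score_to_decision; `predict_labels` is a hypothetical helper. The score -1.0304 is the round-2 left-leaf score above; the other two scores are arbitrary.

```python
import math
from typing import List, Sequence


def expit(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_labels(scores: Sequence[float], classes: Sequence[int]) -> List[int]:
    """Binary case: raw score -> [P(index 0), P(index 1)] -> argmax -> original label."""
    labels = []
    for s in scores:
        p1 = expit(s)
        proba = [1.0 - p1, p1]
        # max() returns the first maximal index on ties, like np.argmax
        idx = max(range(2), key=lambda k: proba[k])
        labels.append(classes[idx])
    return labels


print(predict_labels([-1.0304, 0.8, 0.0], classes=[-1, 1]))  # [-1, 1, -1]
```

Since p > 0.5 exactly when F > 0, the rule reduces to the sign of the raw score, with F = 0 going to the first class.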


Article link: https://qyyshop.com/info/819001.html

Source: Anonymous
