Machine Learning Done Wrong

Statistical modeling is a lot like engineering.

In engineering, there are various ways to build a key-value store, and each design makes a different set of assumptions about the usage pattern. In statistical modeling, there are various algorithms to build a classifier, and each algorithm makes a different set of assumptions about the data.

When dealing with small amounts of data, it’s reasonable to try as many algorithms as possible and to pick the best one since the cost of experimentation is low. But as we hit “big data”, it pays off to analyze the data upfront and then design the modeling pipeline (pre-processing, modeling, optimization algorithm, evaluation, productionization) accordingly.

As pointed out in my previous post, there are dozens of ways to solve a given modeling problem. Each model assumes something different, and it’s not obvious how to navigate and identify which assumptions are reasonable. In industry, most practitioners pick the modeling algorithm they are most familiar with rather than the one that best suits the data. In this post, I would like to share some common mistakes (the don’ts). I’ll save some of the best practices (the dos) for a future post.

1. Take default loss function for granted

Many practitioners train and pick the best model using the default loss function (e.g., squared error). In practice, the off-the-shelf loss function rarely aligns with the business objective. Take fraud detection as an example. When trying to detect fraudulent transactions, the business objective is to minimize the fraud loss. The off-the-shelf loss function of binary classifiers weighs false positives and false negatives equally. To align with the business objective, the loss function should not only penalize false negatives more than false positives, but also penalize each false negative in proportion to the dollar amount. Also, data sets in fraud detection usually contain highly imbalanced labels. In these cases, bias the loss function in favor of the rare class (e.g., through up-sampling or down-sampling).
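
To make the gap concrete, here is a minimal sketch comparing a plain error count with a dollar-weighted cost on a handful of made-up transactions. The `Transaction` fields, the scores, and the flat `REVIEW_COST` for a false alarm are all assumptions for illustration, not figures from any real system.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float    # transaction amount in dollars
    is_fraud: bool   # ground-truth label
    score: float     # model's predicted fraud probability (made up here)


# Hypothetical flat cost of wrongly flagging a legitimate transaction.
REVIEW_COST = 5.0


def error_count(transactions, threshold):
    """Default-style objective: every mistake counts the same."""
    return sum((t.score >= threshold) != t.is_fraud for t in transactions)


def business_cost(transactions, threshold):
    """Dollar-weighted objective: a missed fraud costs its full amount,
    a false alarm costs a flat review fee."""
    cost = 0.0
    for t in transactions:
        flagged = t.score >= threshold
        if t.is_fraud and not flagged:
            cost += t.amount
        elif flagged and not t.is_fraud:
            cost += REVIEW_COST
    return cost


transactions = [
    Transaction(amount=20.0, is_fraud=False, score=0.30),
    Transaction(amount=35.0, is_fraud=False, score=0.40),
    Transaction(amount=15.0, is_fraud=False, score=0.45),
    Transaction(amount=900.0, is_fraud=True, score=0.35),
    Transaction(amount=10.0, is_fraud=True, score=0.80),
]

for threshold in (0.5, 0.33):
    print(threshold,
          error_count(transactions, threshold),
          business_cost(transactions, threshold))
```

On this toy data the threshold with fewer mistakes is the one that lets the $900 fraud through, so the two objectives pick different models.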

2. Use plain linear models for non-linear interaction

When building a binary classifier, many practitioners immediately jump to logistic regression because it’s simple. But many also forget that logistic regression is a linear model, so any non-linear interaction among predictors needs to be encoded manually. Returning to fraud detection, high-order interaction features like "billing address = shipping address and transaction amount < $50" are required for good model performance. So one should prefer non-linear models, like an SVM with a kernel or tree-based classifiers, that bake in higher-order interaction features.
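
If you do stay with a linear model, the interaction has to be handed to it as its own column. A minimal sketch of that manual encoding, using hypothetical field names (`billing_equals_shipping`, `amount`) for the fraud example above:

```python
def add_interaction(row):
    """Return a copy of the raw feature dict with the crossed feature added.

    A linear model only sees a weighted sum of its inputs, so the logical
    AND of the two conditions must be supplied as an explicit feature.
    """
    crossed = row["billing_equals_shipping"] and row["amount"] < 50
    return {**row, "same_address_and_small_amount": int(crossed)}


rows = [
    {"billing_equals_shipping": True, "amount": 20.0},
    {"billing_equals_shipping": True, "amount": 120.0},
    {"billing_equals_shipping": False, "amount": 20.0},
    {"billing_equals_shipping": False, "amount": 120.0},
]

features = [add_interaction(r) for r in rows]
for f in features:
    print(f)
```

Only the first row gets the crossed feature set to 1; a tree-based model could discover the same split on its own.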

3. Forget about outliers

Outliers are interesting. Depending on the context, they either deserve special attention or should be completely ignored. Take the example of revenue forecasting. If unusual spikes of revenue are observed, it’s probably a good idea to pay extra attention to them and figure out what caused the spikes. But if the outliers are due to mechanical error, measurement error or anything else that’s not generalizable, it’s a good idea to filter them out before feeding the data to the modeling algorithm.

Some models are more sensitive to outliers than others. For instance, AdaBoost might treat those outliers as "hard" cases and put tremendous weight on them, while a decision tree might simply count each outlier as one false classification. If the data set contains a fair number of outliers, it’s important to either use a modeling algorithm that is robust against outliers or filter the outliers out.
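
As one possible way to do the filtering, here is a sketch based on the median and the median absolute deviation (the "modified z-score" rule). This is just one common robust rule with a conventional cutoff of 3.5, chosen here for illustration; the revenue figures are made up.

```python
import statistics


def filter_outliers(values, cutoff=3.5):
    """Drop points whose modified z-score exceeds the cutoff.

    The median and MAD are used instead of the mean and standard deviation
    because they are barely moved by the outliers we are trying to detect.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the points are identical; no usable spread estimate.
        return list(values)
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the bulk of the data is roughly normal.
    return [v for v in values if 0.6745 * abs(v - med) / mad <= cutoff]


daily_revenue = [102.0, 98.0, 105.0, 97.0, 101.0, 99.0, 10000.0]
print(filter_outliers(daily_revenue))
```

Whether a dropped point was a glitch or a genuine spike worth investigating is still a judgment call that the filter cannot make for you.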

4. Use high variance model when n<<p

SVM is one of the most popular off-the-shelf modeling algorithms, and one of its most powerful features is the ability to fit the model with different kernels. SVM kernels can be thought of as a way to automatically combine existing features to form a richer feature space. Since this powerful feature comes almost for free, most practitioners use a kernel by default when training an SVM model. However, when the data has n<<p (number of samples << number of features) -- common in domains like medical data -- the richer feature space implies a much higher risk of overfitting the data. In fact, high variance models should be avoided entirely when n<<p.
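
The following sketch does not train an SVM; it only illustrates why few samples and many features invite overfitting. It draws random binary labels and pure-noise binary features, then counts how many noise features reproduce the training labels exactly. The sample and feature counts are arbitrary choices for the illustration.

```python
import random


def count_spurious_features(n_samples, n_features, seed=0):
    """Count pure-noise features that match random labels on every sample."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    perfect = 0
    for _ in range(n_features):
        feature = [rng.randint(0, 1) for _ in range(n_samples)]
        if feature == labels:
            perfect += 1
    return perfect


# n << p: with 8 samples, each noise feature matches with probability 1/256,
# so thousands of features almost surely contain some "perfect" predictors.
few_samples = count_spurious_features(n_samples=8, n_features=5000)

# With 200 samples, a chance match is astronomically unlikely.
many_samples = count_spurious_features(n_samples=200, n_features=5000)

print(few_samples, many_samples)
```

A flexible model handed the first data set can latch onto those accidental matches, and a richer kernel-induced feature space only gives it more of them to find.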

5. L1/L2/... regularization without standardization

Applying an L1 or L2 penalty to large coefficients is a common way to regularize linear or logistic regression. However, many practitioners are not aware of the importance of standardizing features before applying such regularization.

Returning to fraud detection, imagine a linear regression model with a transaction amount feature. Without regularization, if the unit of transaction amount is dollars, the fitted coefficient is going to be around 100 times larger than the fitted coefficient if the unit were cents. With regularization, since the L1 / L2 penalty grows with the size of the coefficient, the transaction amount will get penalized more if the unit is dollars. Hence, the regularization is biased and tends to penalize features on smaller scales. To mitigate the problem, standardize all the features as a preprocessing step to put them on an equal footing.
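
A minimal sketch of that preprocessing step, assuming plain z-score standardization (subtract the mean, divide by the standard deviation). The amounts are made up; the point is that the dollar and cent versions of the feature become identical once standardized, so the penalty no longer depends on the unit.

```python
import math
import statistics


def standardize(column):
    """Rescale a feature column to zero mean and unit standard deviation."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


amount_dollars = [12.5, 40.0, 7.25, 300.0, 59.99]
amount_cents = [100 * x for x in amount_dollars]

z_dollars = standardize(amount_dollars)
z_cents = standardize(amount_cents)

same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
           for a, b in zip(z_dollars, z_cents))
print(same)
```

In a real pipeline the mean and standard deviation should be computed on the training set only and then reused for validation and production data.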

6. Use linear model without considering multi-collinear predictors

Imagine building a linear model with two variables X1 and X2, and suppose the ground truth model is Y=X1+X2. Ideally, if the data is observed with a small amount of noise, the linear regression solution would recover the ground truth. However, if X1 and X2 are collinear (in the extreme case, X1=X2), then as far as most optimization algorithms are concerned, Y=2*X1, Y=3*X1-X2 or Y=100*X1-98*X2 are all equally good. The problem might not be detrimental, as it doesn’t bias the estimation. However, it does make the problem ill-conditioned and makes the coefficient weights uninterpretable.
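
The ambiguity is easy to check numerically. In this sketch X2 is an exact copy of X1 (the extreme collinear case), and several weight pairs whose coefficients sum to 2 are scored against Y=X1+X2. The data values are arbitrary.

```python
x1 = [0.5, 1.0, 2.0, 3.5]
x2 = list(x1)                        # perfectly collinear: X2 == X1
y = [a + b for a, b in zip(x1, x2)]  # ground truth Y = X1 + X2

# Candidate (w1, w2) pairs; every pair satisfies w1 + w2 == 2.
candidates = {
    "Y = X1 + X2": (1, 1),
    "Y = 2*X1": (2, 0),
    "Y = 3*X1 - X2": (3, -1),
    "Y = 100*X1 - 98*X2": (100, -98),
}


def squared_error(w1, w2):
    """Sum of squared residuals of the model w1*X1 + w2*X2."""
    return sum((w1 * a + w2 * b - t) ** 2 for a, b, t in zip(x1, x2, y))


errors = {name: squared_error(w1, w2) for name, (w1, w2) in candidates.items()}
for name, err in errors.items():
    print(name, err)
```

Every candidate fits the data equally well, so the loss alone gives the optimizer no reason to prefer the ground-truth weights over the others.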

7. Interpreting absolute value of coefficients from linear or logistic regression as feature importance

Because many off-the-shelf linear regressors return a p-value for each coefficient, many practitioners believe that for linear models, the bigger the absolute value of the coefficient, the more important the corresponding feature is. This is rarely true because (a) changing the scale of the variable changes the absolute value of the coefficient, and (b) if features are multi-collinear, coefficients can shift from one feature to another. Also, the more features the data set has, the more likely the features are to be multi-collinear, and the less reliable it is to read feature importance from the coefficients.
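
Point (a) can be seen with a one-variable least-squares fit. The sketch below fits the same made-up target against a transaction amount expressed in dollars and then in cents; the closed-form slope formula (covariance over variance) is standard, while the data is purely illustrative.

```python
import math


def ols_slope(x, y):
    """Closed-form slope of a one-variable least-squares fit with intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


amount_dollars = [10.0, 20.0, 30.0, 40.0, 50.0]
target = [0.12, 0.19, 0.33, 0.38, 0.52]   # made-up response values
amount_cents = [100 * a for a in amount_dollars]

slope_dollars = ols_slope(amount_dollars, target)
slope_cents = ols_slope(amount_cents, target)

# Same feature, same fit quality, but the coefficient differs by a factor of 100.
print(slope_dollars, slope_cents)
print(math.isclose(slope_dollars, 100 * slope_cents, rel_tol=1e-9))
```

The feature is exactly as informative in both fits, so the size of the raw coefficient says more about the unit than about importance.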

So there you go: 7 common mistakes when doing ML in practice. This list is not meant to be exhaustive but merely to provoke the reader to consider modeling assumptions that may not be applicable to the data at hand. To achieve the best model performance, it is important to pick the modeling algorithm that makes the most fitting assumptions -- not just the one you’re most familiar with.

If you like the post, you can follow me (@chengtao_chu) on Twitter or subscribe to my blog "ML in the Valley". Also, special thanks to Ian Wong (@ihat) for reading a draft of this.

Cheng-Tao Chu

Director of Analytics at Codecademy. Specialties: data engineering and machine learning. Formerly: Google, LinkedIn and Square.

Source: http://www.cnblogs.com/yymn/p/4677603.html
