PythonLib - Machine Learning Libraries

Posted on 2019-02-22 | Modified: 2019-03-17 | In Notes

An overview of Python libraries: machine learning libraries.

#Regression
from sklearn.linear_model import LinearRegression #linear regression
from sklearn.linear_model import Ridge #ridge regression, optionally alpha=.5
from sklearn.linear_model import BayesianRidge #Bayesian ridge regression
from sklearn.linear_model import Lasso #Lasso regression
from sklearn.tree import DecisionTreeRegressor #decision tree regression
from sklearn.ensemble import RandomForestRegressor #random forest regression
#Classification
from sklearn.linear_model import LogisticRegression #logistic regression
from sklearn.svm import SVC, LinearSVC #support vector machines
from sklearn.ensemble import RandomForestClassifier #random forest classifier
from sklearn.neighbors import KNeighborsClassifier #k-nearest neighbors
from sklearn.naive_bayes import GaussianNB #Gaussian naive Bayes
from sklearn.linear_model import Perceptron #perceptron
from sklearn.linear_model import SGDClassifier #stochastic gradient descent classifier
from sklearn.tree import DecisionTreeClassifier #decision tree classifier
#Feature engineering
import sklearn.preprocessing as preprocessing
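The estimators imported above all share the same fit / predict / score interface. The following is a minimal sketch of that workflow, not a recommended modelling recipe: the synthetic data, the variable names, and the choice of `Ridge(alpha=0.5)` and `StandardScaler` are assumptions made purely for illustration.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)

# Regression: noisy samples of y = 3*x0 - 2*x1 (synthetic, for illustration only).
X_reg = rng.normal(size=(200, 2))
y_reg = 3.0 * X_reg[:, 0] - 2.0 * X_reg[:, 1] + rng.normal(scale=0.1, size=200)
ridge = Ridge(alpha=0.5).fit(X_reg, y_reg)
print("Ridge coefficients:", ridge.coef_)

# Classification: standardize the features first, then fit logistic regression.
X_clf = rng.normal(loc=5.0, scale=10.0, size=(200, 2))
y_clf = (X_clf[:, 0] > 5.0).astype(int)
scaler = preprocessing.StandardScaler().fit(X_clf)
clf = LogisticRegression().fit(scaler.transform(X_clf), y_clf)
accuracy = clf.score(scaler.transform(X_clf), y_clf)
print("Training accuracy:", accuracy)
```

Swapping in another estimator from the list (for example `Lasso` or `RandomForestClassifier`) should only require changing the constructor call, since the rest of the interface is shared.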

References: Documentation, User Guide, Chinese documentation, API reference

A summary of the scikit-learn linear regression algorithms

https://kite.com/python/docs/sklearn

Learning the sklearn library - yeal - CSDN blog

Source: yhat

Source: map (blue circles are decision conditions, green boxes are candidate algorithms)

Data and Preprocessing

`sklearn.datasets`: Datasets

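The module provides three main kinds of functions.

The dataset loaders: load small standard datasets.
- load_boston([return_X_y]): Boston house prices, regression (removed in scikit-learn 1.2)
- load_iris([return_X_y]): iris flowers, classification
- load_linnerud([return_X_y]): multivariate regression
- load_digits([n_class, return_X_y]): handwritten digits, classification
- These datasets are handy for quickly checking how an algorithm performs. Seaborn offers something similar: seaborn.load_dataset('datasetName')
The dataset fetchers: download and load larger datasets.
The dataset generation functions: generate synthetic datasets.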
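As a rough illustration of the third category, the generation functions, here is a small sketch using `make_blobs` and `make_regression`. The sample counts, feature counts, and `random_state` values are arbitrary choices for the example, not values taken from the original notes.

```python
from sklearn.datasets import make_blobs, make_regression

# Three Gaussian clusters in 2-D, e.g. for trying out clustering or classification.
X_blobs, y_blobs = make_blobs(n_samples=150, centers=3, n_features=2, random_state=0)
print(X_blobs.shape, sorted(set(y_blobs.tolist())))

# A linear regression problem with 5 features and a little noise.
X_lin, y_lin = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)
print(X_lin.shape, y_lin.shape)
```

Unlike the fetchers, these functions need no network access, which makes them convenient for quick experiments.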

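By default, the loaders return a dictionary-like object whose fields can also be accessed as attributes with a dot. It contains 'data' (the samples), 'target' (the labels), 'DESCR' (a description), 'filename' (the file location), and so on.

With the parameter return_X_y=True, the return value is instead a tuple (X, y) consisting of the data matrix and the target vector.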

from sklearn import datasets

df = datasets.load_iris()
print(df['data'])
print(df.target)
print(df.target_names)
print(df['feature_names'])
print(df.DESCR)
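The tuple form described above can be sketched as follows. This is a minimal example assuming `return_X_y=True`; the train/test split and the `KNeighborsClassifier` are added only to show the arrays being used, and their parameters are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# return_X_y=True skips the dictionary-like object and returns (data, target).
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # 150 samples with 4 features each

# Hold out 30% of the samples and fit a k-nearest-neighbors classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
test_accuracy = knn.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```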

Dimensionality reduction

Classification

Regression

Clustering

Model selection

#Regression
from xgboost import XGBRegressor #XGBoost regression

References:

XGBoost Documentation — xgboost 0.81 documentation

XGBoost Chinese documentation - ApacheCN

introduction to xgboost

How XGBoost works

XGBoost: A Scalable Tree Boosting System (Paper)

Introduction to Boosted Trees (PPT)

(To be continued)

References:

Welcome to LightGBM’s documentation! — LightGBM documentation

LightGBM Chinese documentation - ApacheCN

References:

Statsmodels: module overview

Documentation

OLS regression with the Python Statsmodels package

Time series analysis in Python | GA 小站

"Python Time Series Analysis", or Complete guide to create a Time Series Forecast (with Codes in Python) [Chinese translation: "A Complete Guide to Time Series Forecasting (with Python Code)"]

Citation

When using statsmodels in scientific publication, please consider using the following citation:

Seabold, Skipper, and Josef Perktold. “Statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference. 2010.

Bibtex entry:

@inproceedings{seabold2010statsmodels,
  title={Statsmodels: Econometric and statistical modeling with python},
  author={Seabold, Skipper and Perktold, Josef},
  booktitle={9th Python in Science Conference},
  year={2010},
}

References:

Introduction — PyFlux 0.4.7 documentation

The AR(I)MA time series modeling process: steps and Python code - Jianshu

(To be continued)

import tensorflow as tf

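See the TensorFlow documentation. (More)

TensorFlow is Google's open-source deep learning framework. As its name ("tensor flow") suggests, its strength is processing data that flows through a computation in the form of tensors.

(To be continued)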

Theano

Welcome — Theano 1.0.0 documentation - Deep Learning

Theano tutorial series | 莫烦 Python

Leans towards symbolic algebra (source) and is more academically oriented.

(To be continued)

Tflearn

TFLearn | TensorFlow Deep Learning Library

[Deep Learning Hello World Series (1)] Implementing MNIST with TFLearn - Jianshu

(To be continued)

from tensorflow import keras

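Reference: Keras Chinese documentation

Source: datacamp

(To be continued)

References: PyTorch, PyTorch Chinese documentation

On the relationship between PyTorch and Torch

PyTorch Chinese handbook (pytorch handbook)

(To be continued)

References: Caffe | Caffe Tutorial, Caffe | Deep Learning Framework

(To be continued)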

NVCaffe

Reference: NVCaffe User Guide :: Deep Learning Frameworks Documentation

NVCaffe™ is an NVIDIA-maintained fork of BVLC Caffe tuned for NVIDIA GPUs, particularly in multi-GPU configurations.

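References:

MXNet: A Scalable Deep Learning Framework

Documentation

(To be continued)

References:

http://www.paddlepaddle.org/

References:

Natural Language Toolkit — NLTK 3.4 documentation, NLTK Book

Introductory Python data science tutorial: NLTK - Jianshu

A natural language processing library.

(To be continued)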

Jieba

References:

jieba - PyPI

GitHub - fxsjy/jieba: Jieba Chinese word segmentation

Chinese word segmentation.

Jiagu

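References:

https://github.com/ownthink/Jiagu, [思知](https://www.ownthink.com/)

Jiagu is built on models such as BiLSTM and trained on a large-scale corpus. It aims to provide common natural language processing functions, including Chinese word segmentation, part-of-speech tagging, named entity recognition, keyword extraction, text summarization, and new word discovery. According to its authors, it was designed with the strengths and weaknesses of the major existing tools in mind and is offered back to the community.

Because it relies on deep learning models, it is relatively slow. (performance evaluation)

(To be continued)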

References:

GitHub - explosion/spaCy: Industrial-strength Natural Language …

spaCy Usage Documentation

Python spaCy | 我爱自然语言处理

How to process natural language with Python? (spaCy and Word Embedding)

(To be continued)

References:

gensim: Topic modelling for humans - Radim Řehůřek

(To be continued)

References:

GitHub - hankcs/HanLP: natural language processing, Chinese word segmentation, part-of-speech tagging, named entity recognition …

HanLP online demo

References:

lingvo

NLP operations hard to implement? Google open-sources the sequence modeling framework Lingvo - Juejin

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

References:

Microsoft Cognitive Toolkit (CNTK)

Let's learn CNTK together - Zhihu

Speech recognition.

(To be continued)

References:

opencv-python · PyPI, OpenCV-Python Tutorials - Read the Docs

OpenCV · GitHub

OpenCV Chinese guide

OpenCV-Python Chinese documentation

A quick Python tutorial for deep learning beginners, bonus chapter: Python-OpenCV - Zhihu

(To be continued)
