[4] Process Management - 12 - Lightweight Pipeline Jobs for Python Functions: Joblib

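Joblib is a set of tools that provide lightweight pipeline jobs for Python functions. It includes transparent disk caching, fast serialization, simple parallel execution, and logging. It is optimized for fast, reliable processing of large datasets, with specific optimizations for numpy arrays.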

Project homepage: http://www.open-open.com/lib/view/home/1423032691467

Documentation: http://pythonhosted.org/joblib

I. IO Caching
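
A minimal sketch of transparent disk caching with joblib.Memory. The cache directory, the function name slow_square and the call_log list are assumptions made only for this illustration; they do not come from the original post.

```python
import tempfile

from joblib import Memory

# Hypothetical cache location: a fresh temporary directory.
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, verbose=0)

call_log = []  # records every real execution of the function body


@memory.cache
def slow_square(x):
    # Stand-in for an expensive computation (illustrative only).
    call_log.append(x)
    return x ** 2


first = slow_square(4)   # computed, then stored on disk
second = slow_square(4)  # same argument: loaded from the on-disk cache
```

After both calls, call_log should contain a single entry, because the second call is answered from the cache rather than by re-running the function body.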

II. Parallel Execution

>>> from math import sqrt
>>> [sqrt(i ** 2) for i in range(10)]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

The same computation can also be run on two CPUs:

>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
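
The same pattern works for user-defined functions. The sketch below uses a hypothetical function, scaled_power, to illustrate that delayed captures both positional and keyword arguments, and that Parallel returns results in the order of the input.

```python
from joblib import Parallel, delayed


def scaled_power(x, exponent=2, scale=1.0):
    # Hypothetical worker function, defined only for this illustration.
    return scale * x ** exponent


# delayed(f)(args, kwargs) builds a task description without calling f;
# Parallel then distributes the tasks over two workers.
results = Parallel(n_jobs=2)(
    delayed(scaled_power)(i, exponent=3, scale=0.5) for i in range(5)
)
```

For tasks this small, the overhead of starting workers will likely outweigh any gain; parallel execution tends to pay off only when each call does a substantial amount of work.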

V. Discussion:

1. Multiprocessing-backed parallel loops cannot be nested below threads

Solution: nothing needs to be done. This message is only a notice, not an error.

From the reply in the joblib issue referenced below: "It always works even when you get this warning. Nested calls are not supported by multiprocessing, so joblib detects them, converts the nested calls into sequential code on the fly, and lets you know that this happened. So the warning is not a bug, it's a feature. Closing. Feel free to reopen if you think you get the warning in cases where you don't expect it to happen (e.g. not a nested parallelism case)."

Reference: https://github.com/joblib/joblib/issues/180

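2. can't pickle instancemethod objects

Cause: the call joblib.Parallel(n_jobs=20)(joblib.delayed()(sample) for sample in self.chip) was made inside a class. joblib cannot dispatch instance methods this way, because, as the error message says, they cannot be pickled and sent to the worker processes.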
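
One common workaround is to move the per-sample work into a module-level function and pass it only the data it needs. The sketch below assumes a hypothetical Chip class and a hypothetical process_sample function; neither is the code from the original traceback. The "instancemethod" wording suggests the error came from Python 2, so the issue may not appear at all on recent Python versions.

```python
from joblib import Parallel, delayed


def process_sample(sample):
    # Module-level function: it can be pickled, unlike a Python 2 bound method.
    # The doubling is a placeholder for the real per-sample work.
    return sample * 2


class Chip:
    # Hypothetical class standing in for the one that raised the error.
    def __init__(self, samples):
        self.chip = list(samples)

    def run(self, n_jobs=2):
        # Dispatch the plain function and hand each task only its own sample,
        # so no instance method has to be serialized.
        return Parallel(n_jobs=n_jobs)(
            delayed(process_sample)(sample) for sample in self.chip
        )


results = Chip([1, 2, 3]).run()
```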

References:

http://www.open-open.com/lib/view/open1423032691467.html

About Sam
Focused on bioinformatics and translational medicine