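Test script: https://gist.github.com/markus-beuckelmann/8bc25531b11158431a5b09a45abd6276
I'm curious how much this generation's M1 Pro / M1 Max improves on Python scientific computing. When V2EX users benchmarked the previous-generation M1, its compute performance (power consumption aside) was roughly level with an i5, each winning some tests: https://v2ex.com/t/733777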
1
haogefeifei Nov 5, 2021
For pure multi-core computation it probably has no real edge, but for everyday use it holds up perfectly well.
M1:
Dotted two 4096x4096 matrices in 0.77 s.
Dotted two vectors of length 524288 in 0.27 ms.
SVD of a 2048x1024 matrix in 0.90 s.
Cholesky decomposition of a 2048x2048 matrix in 0.11 s.
Eigendecomposition of a 2048x2048 matrix in 7.55 s.
VM on an AMD 3700X at 4.1 GHz:
Dotted two 4096x4096 matrices in 0.44 s.
Dotted two vectors of length 524288 in 0.03 ms.
SVD of a 2048x1024 matrix in 0.58 s.
Cholesky decomposition of a 2048x2048 matrix in 0.10 s.
Eigendecomposition of a 2048x2048 matrix in 6.16 s.
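For comparing two posted result sets side by side, here is a minimal sketch that parses the script's result sentences and computes per-test time ratios. The function names `parse_results` and `speed_ratios` are made up for illustration and are not part of the gist; the numbers are the M1 and 3700X figures from the post above.

```python
import re

# Matches result sentences such as "SVD of a 2048x1024 matrix in 0.90 s."
RESULT_RE = re.compile(r"(?P<name>.+?) in (?P<value>\d+(?:\.\d+)?) (?P<unit>ms|s)\.")


def parse_results(text):
    """Return {test description: seconds} for every result sentence in text."""
    results = {}
    for match in RESULT_RE.finditer(text):
        seconds = float(match.group("value"))
        if match.group("unit") == "ms":
            seconds /= 1000.0  # normalise milliseconds to seconds
        results[match.group("name").strip()] = seconds
    return results


def speed_ratios(a, b):
    """For tests present in both dicts, return time_a / time_b (above 1 means b was faster)."""
    return {name: a[name] / b[name] for name in a if name in b}


M1_TEXT = """
Dotted two 4096x4096 matrices in 0.77 s.
Dotted two vectors of length 524288 in 0.27 ms.
SVD of a 2048x1024 matrix in 0.90 s.
Cholesky decomposition of a 2048x2048 matrix in 0.11 s.
Eigendecomposition of a 2048x2048 matrix in 7.55 s.
"""

AMD_3700X_TEXT = """
Dotted two 4096x4096 matrices in 0.44 s.
Dotted two vectors of length 524288 in 0.03 ms.
SVD of a 2048x1024 matrix in 0.58 s.
Cholesky decomposition of a 2048x2048 matrix in 0.10 s.
Eigendecomposition of a 2048x2048 matrix in 6.16 s.
"""

m1 = parse_results(M1_TEXT)
amd = parse_results(AMD_3700X_TEXT)
ratios = speed_ratios(m1, amd)

for name, ratio in ratios.items():
    print(f"{name}: M1 time / 3700X time = {ratio:.2f}")
```

Single runs like these are noisy, so the ratios are only a rough guide.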
2
hiwind Nov 5, 2021
What size should I set? I'll try it on a 10-core M1 Pro.
3
hiwind Nov 5, 2021
Ran it as-is on the M1 Pro. Doesn't look that great:
Dotted two 4096x4096 matrices in 0.67 s.
Dotted two vectors of length 524288 in 0.26 ms.
SVD of a 2048x1024 matrix in 1.04 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 9.24 s.
4
wilhexm Nov 5, 2021 16 inch M1 Max 24 Core GPU
Dotted two 4096x4096 matrices in 0.55 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 1.32 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 6.79 s.
5
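pb941129 Nov 5, 2021 via iPhone
I posted benchmark results for the 16-inch i9 in the earlier thread. I just reran it on Monterey and the speed is essentially the same. Judging by the M1 Pro numbers above, it seems the M1 Pro still can't do much if the goal is Python scientific computing...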
6
Aspector Nov 5, 2021
Deprecated since version 1.20: The native libraries on macOS, provided by Accelerate, are not fit for use in NumPy since they have bugs that cause wrong output under easily reproducible conditions. If the vendor fixes those bugs, the library could be reinstated, but until then users compiling for themselves should use another linear algebra library or use the built-in (but slower) default, see the next section.
Does NumPy use Accelerate now? Has Apple simply not dealt with these bugs?
7
icyalala Nov 5, 2021
Not using Accelerate on the M1 is like not using AVX2 on Intel.
8
EyreYoung Nov 5, 2021
2018 model, i7-8750 (I think that's the one)
Dotted two 2048x2048 matrices in 0.07 s.
Dotted two vectors of length 262144 in 0.02 ms.
SVD of a 1024x512 matrix in 0.05 s.
Cholesky decomposition of a 1024x1024 matrix in 0.01 s.
Eigendecomposition of a 1024x1024 matrix in 0.63 s.

Dotted two 4096x4096 matrices in 0.63 s.
Dotted two vectors of length 524288 in 0.10 ms.
SVD of a 2048x1024 matrix in 0.35 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 4.15 s.
9
boboliu Nov 5, 2021
@Aspector #6 @icyalala #7 https://github.com/numpy/numpy/pull/18874
> This pull request is to add support for Accelerate back to NumPy
10
dbsquirrel Nov 5, 2021
Dotted two 4096x4096 matrices in 1.85 s.
Dotted two vectors of length 524288 in 0.24 ms.
SVD of a 2048x1024 matrix in 0.68 s.
Cholesky decomposition of a 2048x2048 matrix in 0.15 s.
Eigendecomposition of a 2048x2048 matrix in 5.75 s.
The fans went to full blast right away. MBP 2016 (2.9 GHz i5)
12
0Vincent0Zhang0 Nov 5, 2021
Current results on an M1 Max with 64 GB:
Dotted two 4096x4096 matrices in 0.70 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 1.99 s.
Cholesky decomposition of a 2048x2048 matrix in 0.10 s.
Eigendecomposition of a 2048x2048 matrix in 10.36 s.
Still needs optimization.
13
hiwind Nov 5, 2021
It seems to depend somewhat on the environment.
Do the two NOT AVAILABLE entries affect the results? @astrophys @Aspector
Dotted two 4096x4096 matrices in 0.65 s.
Dotted two vectors of length 524288 in 0.26 ms.
SVD of a 2048x1024 matrix in 0.93 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 9.90 s.

This was obtained using the following Numpy configuration:
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/arm64-builds/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/arm64-builds/lib']
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/arm64-builds/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/arm64-builds/lib']
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/arm64-builds/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/arm64-builds/lib']
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/arm64-builds/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/opt/arm64-builds/lib']
Supported SIMD extensions in this NumPy install:
    baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
    found = ASIMDHP
    not found = ASIMDDP
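On the NOT AVAILABLE question: in this style of configuration dump, those entries appear to mean only that the named library (MKL, BLIS) was not found when NumPy was built. The `blas_opt_info` and `lapack_opt_info` sections show what was actually selected, here OpenBLAS. The sketch below classifies the backend from such a dump. It assumes the older text layout shown in this thread (newer NumPy releases print a different format), and the helper names `section` and `detect_blas_backend` are hypothetical.

```python
import re

# Lowercase keywords that identify a backend inside a config section.
BACKEND_KEYWORDS = {
    "mkl": "MKL",
    "openblas": "OpenBLAS",
    "accelerate": "Accelerate",
    "blis": "BLIS",
}


def section(config_text, header):
    """Return the body of one `<header>:` section, up to the next `*_info:` header."""
    match = re.search(r"(?<!\w)" + re.escape(header) + r":", config_text)
    if match is None:
        return ""
    rest = config_text[match.end():]
    nxt = re.search(r"\w+_info:", rest)
    return rest[:nxt.start()] if nxt else rest


def detect_blas_backend(config_text):
    """Guess the BLAS backend from the `blas_opt_info` section of a config dump."""
    body = section(config_text, "blas_opt_info").lower()
    for keyword, name in BACKEND_KEYWORDS.items():
        if keyword in body:
            return name
    return "unknown"


# Abbreviated excerpts of the two configuration dumps posted in this thread.
OPENBLAS_DUMP = (
    "blas_mkl_info: NOT AVAILABLE blis_info: NOT AVAILABLE "
    "openblas_info: libraries = ['openblas', 'openblas'] "
    "blas_opt_info: libraries = ['openblas', 'openblas'] "
    "library_dirs = ['/opt/arm64-builds/lib'] "
    "lapack_mkl_info: NOT AVAILABLE"
)
ACCELERATE_DUMP = (
    "blas_mkl_info: NOT AVAILABLE openblas_info: NOT AVAILABLE "
    "blas_opt_info: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] "
    "define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)] "
    "lapack_mkl_info: NOT AVAILABLE"
)

print(detect_blas_backend(OPENBLAS_DUMP))
print(detect_blas_backend(ACCELERATE_DUMP))
```

To check a local install, the text printed by `numpy.show_config()` could be captured and passed to `detect_blas_backend`.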
14
astrophys OP
Here are results from a 2019 16-inch i9 with 64 GB:
Dotted two 4096x4096 matrices in 0.45 s.
Dotted two vectors of length 524288 in 0.05 ms.
SVD of a 2048x1024 matrix in 0.29 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 3.23 s.
15
astrophys OP
@dejavuwind Using MKL and multithreading will certainly be faster; the results I posted are with MKL.
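To separate the effect of multithreading from the effect of the BLAS library itself, one option is to pin the thread count before NumPy is imported. The sketch below sets the environment variables that OpenMP, OpenBLAS, MKL, and Accelerate (vecLib) commonly read. The helper name `limit_blas_threads` is made up for illustration, and which variables actually take effect depends on how a given NumPy build was linked.

```python
import os

# Thread-count variables read by common BLAS/LAPACK backends.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",         # OpenMP-based builds
    "OPENBLAS_NUM_THREADS",    # OpenBLAS
    "MKL_NUM_THREADS",         # Intel MKL
    "VECLIB_MAXIMUM_THREADS",  # Apple Accelerate / vecLib
)


def limit_blas_threads(n):
    """Set the thread-count variables to n; call this before numpy is first imported."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)


limit_blas_threads(1)
# Import numpy only after this point, so the backend sees the variables when it loads:
# import numpy as np
```

Running the benchmark once with `limit_blas_threads(1)` and once without it gives a rough idea of how much of a result comes from multithreading.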
16
tiramice Nov 5, 2021
W-2175 VM, 8 cores
Dotted two 4096x4096 matrices in 0.29 s.
Dotted two vectors of length 524288 in 0.03 ms.
SVD of a 2048x1024 matrix in 0.50 s.
Cholesky decomposition of a 2048x2048 matrix in 0.12 s.
Eigendecomposition of a 2048x2048 matrix in 4.47 s.
17
astrophys OP
@Aspector NumPy removed support for the Accelerate framework in version 1.20.0. Coincidentally, someone asked about exactly this today: https://stackoverflow.com/questions/69848969/how-to-build-numpy-from-source-linked-to-apple-accelerate-framework#
18
sharpy Nov 5, 2021
16-inch i9
Dotted two 4096x4096 matrices in 0.41 s.
Dotted two vectors of length 524288 in 0.04 ms.
SVD of a 2048x1024 matrix in 0.28 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 2.89 s.
19
volvo007 Nov 5, 2021
2020 13-inch MBP, Intel, top configuration
Dotted two 4096x4096 matrices in 0.98 s.
Dotted two vectors of length 524288 in 0.20 ms.
SVD of a 2048x1024 matrix in 0.49 s.
Cholesky decomposition of a 2048x2048 matrix in 0.11 s.
Eigendecomposition of a 2048x2048 matrix in 4.16 s.
20
cxxlxx Nov 5, 2021
@haogefeifei Why is my 5900X so much worse than yours, on both WSL and Windows?
Dotted two 4096x4096 matrices in 0.39 s.
Dotted two vectors of length 524288 in 0.14 ms.
SVD of a 2048x1024 matrix in 1.34 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 4.80 s.
21
rpman Nov 5, 2021
Apple silicon support is still being patched up piece by piece. If you want to use it, you can find the relevant commit and build it yourself.
22
thedrwu Nov 6, 2021 via Android
Locally I only need to be able to debug and plot; the actual computation gets handed off to servers and supercomputers.
24
yangbin9317 Nov 6, 2021
Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
Dotted two 4096x4096 matrices in 0.34 s.
Dotted two vectors of length 524288 in 0.02 ms.
SVD of a 2048x1024 matrix in 1.03 s.
Cholesky decomposition of a 2048x2048 matrix in 0.61 s.
Eigendecomposition of a 2048x2048 matrix in 9.66 s.
25
20015jjw Nov 6, 2021
16-core Mac Pro / 96 GB
Dotted two 4096x4096 matrices in 0.28 s.
Dotted two vectors of length 524288 in 0.02 ms.
SVD of a 2048x1024 matrix in 0.56 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 4.00 s.
What I'm wondering is: with tests this small, the run-to-run error must be large, right...
26
20015jjw Nov 6, 2021
@20015jjw Ran it once more:
Dotted two 4096x4096 matrices in 0.26 s.
Dotted two vectors of length 524288 in 0.02 ms.
SVD of a 2048x1024 matrix in 0.50 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 3.77 s.
The third one already differs by about 10%... The two runs were back to back, and I didn't close any of the things I normally have running.
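The roughly 10% difference between two back-to-back runs suggests that single-shot timings are not precise enough to separate machines that are close. One common way to reduce the noise is to repeat each measurement and report the minimum or median. Below is a minimal sketch, assuming a hypothetical helper `time_repeated` that is not part of the gist; the workload is a pure-Python stand-in so the example runs without NumPy.

```python
import statistics
import time


def time_repeated(fn, repeats=5, warmup=1):
    """Call fn repeatedly and return a summary of wall-clock timings in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    fastest = min(samples)
    slowest = max(samples)
    return {
        "min": fastest,
        "median": statistics.median(samples),
        "max": slowest,
        # Relative gap between slowest and fastest run; 0.10 means 10%.
        "spread": (slowest - fastest) / fastest if fastest > 0 else 0.0,
    }


def workload():
    """Stand-in workload; replace with the operation being benchmarked."""
    return sum(i * i for i in range(100_000))


stats = time_repeated(workload, repeats=5)
print(f"min {stats['min']:.4f} s, median {stats['median']:.4f} s, "
      f"spread {stats['spread']:.1%}")
```

With NumPy installed, the same helper could be called as `time_repeated(lambda: A.dot(B))` for two preallocated arrays `A` and `B`.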
28
MongkeMary Nov 6, 2021
16-inch base-model MBP, M1 Pro, 10 cores
Dotted two 4096x4096 matrices in 0.56 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.67 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 6.88 s.
29
MongkeMary Nov 6, 2021
@astrophys Whether MKL is present really matters. For this kind of computation, OpenBLAS still lags behind MKL in performance.
30
astrophys OP
@MongkeMary Yes, exactly. On the M1 it comes down to whether the Accelerate framework is used.
31
Aspector Nov 8, 2021 via iPhone
32
shinecurve Nov 12, 2021
HP OMEN 7 (暗影精灵 7), i7-11800H
Dotted two 4096x4096 matrices in 0.39 s.
Dotted two vectors of length 524288 in 0.05 ms.
SVD of a 2048x1024 matrix in 0.26 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 2.57 s.
Posting this as a reference for everyone.
33
lqcc Dec 4, 2021
M1 MacBook Air, using NumPy built against the Accelerate library. The speed is decent.
Dotted two 4096x4096 matrices in 0.60 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.52 s.
Cholesky decomposition of a 2048x2048 matrix in 0.06 s.
Eigendecomposition of a 2048x2048 matrix in 5.98 s.

This was obtained using the following Numpy configuration:
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
accelerate_info:
    extra_compile_args = ['-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
blas_opt_info:
    extra_compile_args = ['-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
openblas_clapack_info:
  NOT AVAILABLE
flame_info:
  NOT AVAILABLE
lapack_opt_info:
    extra_compile_args = ['-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
Supported SIMD extensions in this NumPy install:
    baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
    found = ASIMDHP,ASIMDDP,ASIMDFHM
    not found =