Original Research

Matrix Operation Benchmark — NumPy Performance Across Matrix Sizes

Real benchmark data for 8 NumPy matrix operations across 6 matrix sizes (10x10 to 1000x1000). Each measurement is the average of 5 runs with standard deviation, executed on Apple Silicon with Accelerate BLAS.

By Michael Lip · Updated April 2026

Methodology

Benchmarks were executed on April 11, 2026 using Python 3.9 with NumPy on macOS (Apple Silicon) with Apple Accelerate as the BLAS backend. Each operation was timed using time.perf_counter() with 5 repetitions per (operation, size) pair. Input matrices are random float64 values from np.random.randn(n, n). Operations per second = 1000 / avg_ms. All times in milliseconds. The determinant overflow warning at large sizes is expected and does not affect timing accuracy.
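The timing loop described above can be sketched as follows. The helper name `bench` and the choice of `np.linalg.inv` as the example operation are illustrative, not the benchmark's actual code; the structure (perf_counter, 5 repetitions, randn inputs, ops/second = 1000 / avg_ms) mirrors the stated methodology:

```python
import time
import numpy as np

def bench(op, n, repeats=5):
    """Time op on a random n x n float64 matrix; return (avg_ms, std_ms)."""
    a = np.random.randn(n, n)
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        op(a)
        samples.append((time.perf_counter() - t0) * 1000.0)  # seconds -> ms
    return float(np.mean(samples)), float(np.std(samples))

avg_ms, std_ms = bench(np.linalg.inv, 100)
ops_per_second = 1000.0 / avg_ms  # same formula the table uses
```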

| Operation | Matrix Size | Avg Time (ms) | Std Dev (ms) | Ops/Second | Complexity |
|---|---|---|---|---|---|
| matmul | 10x10 | 0.006 | 0.010 | 166,667 | O(n^3) |
| matmul | 50x50 | 0.208 | 0.406 | 4,808 | O(n^3) |
| matmul | 100x100 | 0.012 | 0.002 | 83,333 | O(n^3) |
| matmul | 250x250 | 0.096 | 0.001 | 10,417 | O(n^3) |
| matmul | 500x500 | 0.676 | 0.071 | 1,479 | O(n^3) |
| matmul | 1000x1000 | 5.447 | 0.888 | 184 | O(n^3) |
| inverse | 10x10 | 0.010 | 0.008 | 100,000 | O(n^3) |
| inverse | 50x50 | 0.211 | 0.351 | 4,739 | O(n^3) |
| inverse | 100x100 | 0.095 | 0.013 | 10,526 | O(n^3) |
| inverse | 250x250 | 0.558 | 0.033 | 1,792 | O(n^3) |
| inverse | 500x500 | 2.670 | 0.110 | 375 | O(n^3) |
| inverse | 1000x1000 | 15.627 | 0.670 | 64 | O(n^3) |
| determinant | 10x10 | 0.300 | 0.592 | 3,333 | O(n^3) |
| determinant | 50x50 | 0.133 | 0.238 | 7,519 | O(n^3) |
| determinant | 100x100 | 0.390 | 0.701 | 2,564 | O(n^3) |
| determinant | 250x250 | 0.229 | 0.005 | 4,367 | O(n^3) |
| determinant | 500x500 | 1.377 | 0.643 | 726 | O(n^3) |
| determinant | 1000x1000 | 4.486 | 0.127 | 223 | O(n^3) |
| eigenvalues | 10x10 | 0.340 | 0.629 | 2,941 | O(n^3) |
| eigenvalues | 50x50 | 0.474 | 0.401 | 2,110 | O(n^3) |
| eigenvalues | 100x100 | 1.919 | 0.129 | 521 | O(n^3) |
| eigenvalues | 250x250 | 12.876 | 0.102 | 78 | O(n^3) |
| eigenvalues | 500x500 | 48.411 | 0.341 | 21 | O(n^3) |
| eigenvalues | 1000x1000 | 237.491 | 2.326 | 4 | O(n^3) |
| svd | 10x10 | 0.291 | 0.539 | 3,436 | O(n^3) |
| svd | 50x50 | 0.661 | 0.844 | 1,513 | O(n^3) |
| svd | 100x100 | 0.863 | 0.043 | 1,159 | O(n^3) |
| svd | 250x250 | 5.305 | 0.410 | 189 | O(n^3) |
| svd | 500x500 | 22.130 | 0.771 | 45 | O(n^3) |
| svd | 1000x1000 | 105.221 | 1.920 | 10 | O(n^3) |
| transpose | 10x10 | 0.001 | 0.001 | 1,000,000 | O(n^2) |
| transpose | 50x50 | 0.002 | 0.001 | 500,000 | O(n^2) |
| transpose | 100x100 | 0.004 | 0.001 | 250,000 | O(n^2) |
| transpose | 250x250 | 0.026 | 0.008 | 38,462 | O(n^2) |
| transpose | 500x500 | 0.103 | 0.031 | 9,709 | O(n^2) |
| transpose | 1000x1000 | 0.641 | 0.035 | 1,560 | O(n^2) |
| trace | 10x10 | 0.002 | 0.002 | 500,000 | O(n) |
| trace | 50x50 | 0.002 | 0.001 | 500,000 | O(n) |
| trace | 100x100 | 0.002 | 0.001 | 500,000 | O(n) |
| trace | 250x250 | 0.002 | 0.002 | 500,000 | O(n) |
| trace | 500x500 | 0.004 | 0.004 | 250,000 | O(n) |
| trace | 1000x1000 | 0.003 | 0.002 | 333,333 | O(n) |
| qr | 10x10 | 0.029 | 0.024 | 34,483 | O(n^3) |
| qr | 50x50 | 0.972 | 1.830 | 1,029 | O(n^3) |
| qr | 100x100 | 0.184 | 0.013 | 5,435 | O(n^3) |
| qr | 250x250 | 1.079 | 0.038 | 927 | O(n^3) |
| qr | 500x500 | 4.586 | 0.150 | 218 | O(n^3) |
| qr | 1000x1000 | 24.492 | 0.313 | 41 | O(n^3) |

Key Findings

Eigenvalue computation is the bottleneck. At 1000x1000, eigenvalues take 237.5 ms — 44x slower than matmul (5.4 ms) and 2.3x slower than SVD (105.2 ms). This reflects the iterative QR algorithm's convergence requirements versus direct computation in matmul.

Trace is essentially free. At O(n) complexity, trace takes 0.003 ms even at 1000x1000 — it simply sums n diagonal elements. Transpose at O(n^2) is the next fastest, requiring a memory copy but no computation.
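A short illustration of why these two operations are cheap. Note that plain `a.T` in NumPy is a zero-copy view, so a transpose benchmark presumably materializes the result; the `np.ascontiguousarray` call below is one way to force the O(n^2) copy, and is an assumption rather than the benchmark's actual code:

```python
import numpy as np

a = np.random.randn(4, 4)

# trace sums only the n diagonal entries: O(n) work
assert np.isclose(np.trace(a), a.diagonal().sum())

t_view = a.T                         # zero-copy view, no data moved
t_copy = np.ascontiguousarray(a.T)   # forces the O(n^2) element copy
assert t_view.base is a              # the view shares a's buffer
assert t_copy.base is not a          # the copy does not
```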

BLAS makes matmul deceptively fast. The 100x100 matmul (0.012 ms) is faster in absolute time than 50x50 (0.208 ms). The 50x50 standard deviation (0.406 ms, nearly twice the mean) points to one-time warm-up cost on the earliest calls (library dispatch, thread-pool startup, cache population) inflating the small-size average; once that overhead amortizes, BLAS also achieves better vectorization and cache utilization at larger sizes.

Scaling follows cubic complexity. From 500x500 to 1000x1000 (2x dimension), matmul goes from 0.676 to 5.447 ms (8.1x), inverse from 2.670 to 15.627 ms (5.9x), and eigenvalues from 48.4 to 237.5 ms (4.9x). Matmul tracks the theoretical 8x almost exactly, while the below-8x scaling for inverse and eigenvalues indicates better hardware utilization at larger sizes.
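The ratios quoted above can be checked directly from the table:

```python
# Avg times (ms) from the benchmark table, 500x500 vs 1000x1000
times_500 = {"matmul": 0.676, "inverse": 2.670, "eigenvalues": 48.411}
times_1000 = {"matmul": 5.447, "inverse": 15.627, "eigenvalues": 237.491}

# Doubling n at O(n^3) predicts an 8x slowdown
ratios = {op: times_1000[op] / times_500[op] for op in times_500}
# matmul ~8.1x, inverse ~5.9x, eigenvalues ~4.9x
```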

Frequently Asked Questions

How fast is NumPy matrix multiplication?

NumPy matmul is extremely fast thanks to BLAS. A 100x100 matmul takes 0.012 ms, 500x500 takes 0.676 ms, and 1000x1000 takes 5.447 ms. NumPy delegates to hardware-optimized routines (Accelerate, OpenBLAS, MKL) using SIMD, cache blocking, and multi-threading.

Why is eigenvalue computation so much slower than matrix multiplication?

Eigenvalue computation does roughly 10x the floating-point work of matmul: both are O(n^3), but the constant factor is much larger. At 1000x1000, eigenvalues take 237.5 ms vs 5.4 ms for matmul, 44x slower. The eigensolver reduces the matrix to Hessenberg form and then applies iterative QR steps that converge gradually, while matmul is a single direct pass over the data.
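One practical consequence, not measured in this benchmark: when a matrix is known to be symmetric, NumPy's dedicated symmetric solver (`np.linalg.eigvalsh`) takes a cheaper path than the general iterative one and returns real eigenvalues directly. A small sketch:

```python
import numpy as np

a = np.random.randn(200, 200)
sym = (a + a.T) / 2  # symmetrize to get a symmetric test matrix

w_general = np.linalg.eigvals(sym)   # general dense eigensolver
w_sym = np.linalg.eigvalsh(sym)      # symmetric solver: faster, real output

# Same spectrum up to ordering and round-off
assert np.allclose(np.sort(w_general.real), np.sort(w_sym))
```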

What is the fastest matrix operation in NumPy?

Trace is fastest — it sums diagonal elements in O(n) time, taking 0.003 ms at 1000x1000. Transpose is second at 0.641 ms (O(n^2) copy). Neither invokes BLAS or LAPACK. Among the accelerated operations, matmul is fastest because it maps to the heavily optimized BLAS Level 3 GEMM routine.

How does matrix size affect NumPy performance?

Doubling the dimension increases time by ~8x for O(n^3) operations. However, BLAS achieves better utilization at larger sizes — a 1000x1000 matmul runs at higher sustained GFLOPS than a 100x100 one thanks to better cache, SIMD, and multi-core utilization.
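Sustained throughput makes this concrete. Using the standard 2n^3 FLOP count for a dense matmul and the averaged times from the table (the helper name below is illustrative):

```python
def matmul_gflops(n, avg_ms):
    """Sustained GFLOPS for an n x n matmul, assuming 2*n**3 FLOPs."""
    return (2.0 * n**3) / (avg_ms * 1e-3) / 1e9

small = matmul_gflops(100, 0.012)    # 100x100 at 0.012 ms, ~167 GFLOPS
large = matmul_gflops(1000, 5.447)   # 1000x1000 at 5.447 ms, ~367 GFLOPS
```

The larger problem sustains roughly twice the throughput of the smaller one, even though both use the same BLAS routine.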

Which BLAS backend does NumPy use and does it matter?

NumPy supports Accelerate (macOS), OpenBLAS (Linux), MKL (Anaconda), and BLIS. For 1000x1000 matmul, MKL and Accelerate are typically 10-30% faster than OpenBLAS. Check your backend with np.show_config(). For eigenvalue/SVD, LAPACK quality matters more than BLAS backend.
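The backend check mentioned above, with the output captured so it can be inspected programmatically rather than just printed (a minimal sketch):

```python
import io
import contextlib
import numpy as np

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()  # prints the BLAS/LAPACK build configuration
config_text = buf.getvalue()
# config_text names the linked backend (e.g. Accelerate, OpenBLAS, MKL)
```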