SPDX-License-Identifier: GPL-3.0-or-later
-->
+## v4.1.0
+
+### Notable Changes
+
+#### Benchmark with CLI
+
+Comparing the performance of new features is difficult without reliable
+benchmarking, so the CLI now supports a `--benchmark` argument. Call
+`nano-pow --benchmark 100` to run a `work_generate` benchmark over 100 random
+blockhashes. The package also includes a convenience script that runs 1000
+samples: `npm run benchmark`.
+
+#### Refactor GPU compute shader
+
+After extensive additional testing, the real performance gains for the WebGPU
+compute shader come from two sources. The first is a good balance between
+workgroup size and dispatch size. To that end, the workgroup size has been set
+to 96, which testing on an Nvidia RTX 3070 found to be the lowest value that
+still saturated warp occupancy and active thread count. The second would be
+native 64-bit data types, but WGSL does not yet support `u64` integers. The
+compute shader has therefore been reverted to a much simpler version that
+performs just as well as the more complex versions across tens of thousands of
+benchmark samples.
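+
Because WGSL has no `u64` type, 64-bit arithmetic in a shader generally has to be emulated with pairs of 32-bit values. The sketch below is not taken from the NanoPow shader. It only illustrates that general technique in TypeScript, and the names `U64`, `add64`, and `rotr64` are hypothetical.

```typescript
// Hypothetical illustration only: a 64-bit unsigned integer held as two
// 32-bit halves, similar to how a WGSL shader could hold one in a vec2<u32>.
type U64 = { lo: number; hi: number };

// 64-bit addition modulo 2^64: add the low halves, then carry into the high halves.
function add64(a: U64, b: U64): U64 {
  const lo = (a.lo + b.lo) >>> 0; // wrap the low half to 32 bits
  const carry = lo < a.lo ? 1 : 0; // a wrapped sum is smaller than either operand
  const hi = (a.hi + b.hi + carry) >>> 0; // wrap the high half to 32 bits
  return { lo, hi };
}

// 64-bit rotate right by n bits, valid only for 0 < n < 32.
function rotr64(a: U64, n: number): U64 {
  return {
    // bits shifted out of the high half enter the top of the low half
    lo: ((a.lo >>> n) | (a.hi << (32 - n))) >>> 0,
    // bits shifted out of the low half wrap around to the top of the high half
    hi: ((a.hi >>> n) | (a.lo << (32 - n))) >>> 0,
  };
}
```

For example, `add64({ lo: 0xffffffff, hi: 0 }, { lo: 1, hi: 0 })` carries into the high half and returns `{ lo: 0, hi: 1 }`. Every emulated operation costs several 32-bit instructions, which is why a native 64-bit type would be a source of performance gains.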
+
+### Other Changes
+
+Store seed and blockhash in fast shared workgroup memory.
+
+Fix documentation in inline help and manual page.
+
+Fix CLI default port collision with server default port.
+
+Change timestamp on server log files to something more user-friendly.
+
+Test concurrent curl requests to NanoPow server.
+
+Fix action listed in server response error message.
+
+Prevent unnecessary favicon load in Puppeteer source and in test webpage.
+
+
+
## v4.0.11
### Notable Changes