Chris Duncan [Sun, 9 Mar 2025 09:20:49 +0000 (01:20 -0800)]
Use file contents injected directly into HTML string to load NanoPow into puppeteer browser page instance. Add flags marked by Chrome dev team. WebGPU still does not seem to work; this is using WebGL, but it is at least working.
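A minimal sketch of the inlining approach described in this commit, assuming the bundle has already been read into a string (for example with `fs.readFile`). The helper name `buildPageHtml`, the `type="module"` attribute, and the escaping step are illustrative assumptions, not taken from the repository. The resulting string could then be passed to puppeteer's `page.setContent`.

```typescript
// Hypothetical helper: embeds script source directly in an HTML string
// so the page needs no network or file URL to load it.
function buildPageHtml(scriptSource: string): string {
  // Escape any closing script tag inside the source so the inline
  // script element is not terminated early by the HTML parser.
  const safe = scriptSource.replace(/<\/script/gi, "<\\/script");
  return `<!DOCTYPE html><html><body><script type="module">${safe}</script></body></html>`;
}
```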
Chris Duncan [Sat, 1 Mar 2025 06:19:48 +0000 (22:19 -0800)]
Overhaul both WebGPU and WebGL to use vec4 for parallel operation hinting on supported hardware. Refactor WebGL BLAKE2b to simplify pixel-coordinate-based nonce variation, to unroll main G mix function loop for performance, and to better differentiate between search and validate processes. Simplify vertex shader now that it is only required for drawing the fullscreen quad and not for pixel coordinates. Create new downsampling fragment shader which enables larger canvases and more nonces per frame without introducing lag due to synchronous readback. Maintain canvas between draw calls unless effort has changed. Attempt to handle WebGL context loss, with improved reset function, by reinitializing class. Reduce promise stack increases when waiting for query result. Fix color buffer clearing by using correct API function. Improve nonce seed generation in both WebGL and WebGPU by switching from crypto random to insecure random, which is acceptable in the context of PoW. Reduce garbage collection by reusing static variables. Add debugging throughout that obeys user-provided debug flag, which is now stored as a static variable as well. Add TypeScript typings for new WebGL types. Fix minor issues with test page. Add benchmark results table.
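A rough sketch of non-cryptographic nonce seeding, assuming "insecure random" means `Math.random` and that the seed is handled as two unsigned 32-bit halves. The function name `insecureSeed` is hypothetical, and the actual implementation may differ.

```typescript
// Hypothetical helper: returns a 64-bit seed as [high, low] unsigned
// 32-bit halves. Math.random is not cryptographically secure; the
// commit above judges that acceptable for proof-of-work nonce seeds.
function insecureSeed(): [number, number] {
  // Math.random() is in [0, 1), so each half lands in [0, 2^32 - 1].
  const hi = Math.floor(Math.random() * 0x100000000) >>> 0;
  const lo = Math.floor(Math.random() * 0x100000000) >>> 0;
  return [hi, lo];
}
```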
Chris Duncan [Wed, 5 Feb 2025 14:18:33 +0000 (06:18 -0800)]
Found the culprit. Atomic exchange is actually 40ms slower than atomic load, so revert to conditional load-and-store. Makes sense, since an exchange performs both a read and a write.
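The two patterns compared in this commit can be illustrated by analogy with JavaScript `Atomics`. The shader itself would use its own atomic built-ins, and the function names and "found" flag layout here are assumptions. Note that the load-then-store form is not race-free: two threads can both see 0 and both store. That is presumably tolerable when any valid result is acceptable.

```typescript
// Flag convention assumed for illustration: 0 = no result yet, 1 = result found.

// Unconditional exchange: every caller performs a read-modify-write.
function claimWithExchange(flag: Int32Array): boolean {
  return Atomics.exchange(flag, 0, 1) === 0;
}

// Conditional load-and-store: most callers only load, and the store
// happens only while the flag is still unset.
function claimWithLoadThenStore(flag: Int32Array): boolean {
  if (Atomics.load(flag, 0) !== 0) return false;
  Atomics.store(flag, 0, 1);
  return true;
}
```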
Chris Duncan [Wed, 5 Feb 2025 05:01:41 +0000 (21:01 -0800)]
Benchmarking shows vec4 version actually ended up being slower, probably due to increased overhead and register pressure. Revert to vec2 implementation.
Chris Duncan [Mon, 3 Feb 2025 23:15:59 +0000 (15:15 -0800)]
Convert eight sequential rounds of vec2 G mixing into parallelized four rounds of vec4 G. Read threshold directly from uniform to save a redundant assignment. Fix test page validation executing on every single input event. Delete benchmark file since it is completely outdated. Delete bundle to be reuploaded after tweaking new build. Update comment documentation.
Chris Duncan [Sun, 26 Jan 2025 08:20:52 +0000 (00:20 -0800)]
Overhaul NanoPowGl to use vector operations, which greatly simplifies the BLAKE2b algorithm. Simplify nonce generation and work output by taking advantage of WebGL2 32-bit vector RGBA pixel type options. Fix validation in NanoPowGl by restricting result to passed nonce. Clean up unused variables. Minor test page fixes.
Chris Duncan [Sat, 25 Jan 2025 07:29:26 +0000 (23:29 -0800)]
Revert to single-dimensional workgroup size, since the shape apparently does matter with regard to wavefronts, not just the number of threads per workgroup.
Chris Duncan [Tue, 21 Jan 2025 18:14:49 +0000 (10:14 -0800)]
Update README with new options object. Add example of async module loading in HTML. Reword global namespace pollution option to discourage it as a practice. Add sections for acknowledgements and licenses. Whitespace.