I've been working on a fork of vllm-mlx (OpenAI-compatible LLM server for Apple Silicon) to make it actually usable for coding agents. The upstream project is great but was
missing production-grade tool calling, reasoning separation, and multi-turn performance.
What I added (37 commits):
- Tool calling that works — streaming + non-streaming, supports MiniMax and Hermes/Qwen3 formats. 4/4 accuracy on structured function calling benchmarks.
- Reasoning separation — MiniMax-M2.5 mixes reasoning into its output with no tags. Built a heuristic parser that cleanly separates reasoning from content (0% leak rate, was 60%
with the generic parser).
- Prompt cache for SimpleEngine — persistent KV cache across requests. On 33K-token coding agent contexts: TTFT goes from 28s to 0.3s on cache hit. This is the single biggest
improvement for multi-turn use.
- 1500+ tests — parsers, engine, server, tool calling. The upstream had minimal test coverage.
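To show why the prompt cache is the big win for multi-turn agents, here is a toy sketch of the general idea behind prefix-matched KV reuse. This is not the vllm-mlx implementation, just an illustration: compare the incoming token prefix against the cached conversation and only prefill the uncached suffix.

```python
# Toy sketch of prefix-based prompt caching (NOT the actual vllm-mlx code):
# reuse cached state for the longest shared token prefix, so only the new
# suffix of a multi-turn conversation needs prefill.

def common_prefix_len(a: list[int], b: list[int]) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class PromptCache:
    def __init__(self):
        self.cached_tokens: list[int] = []  # tokens whose KV state we "hold"

    def plan_prefill(self, prompt: list[int]) -> tuple[int, list[int]]:
        """Return (reused token count, tokens that still need prefill)."""
        reused = common_prefix_len(self.cached_tokens, prompt)
        suffix = prompt[reused:]
        self.cached_tokens = list(prompt)  # cache reflects this request
        return reused, suffix

cache = PromptCache()
turn1 = [1, 2, 3, 4]            # first request: nothing cached yet
reused, todo = cache.plan_prefill(turn1)
print(reused, len(todo))        # 0 reused, 4 tokens to prefill

turn2 = [1, 2, 3, 4, 5, 6]      # same conversation, one more turn
reused, todo = cache.plan_prefill(turn2)
print(reused, len(todo))        # 4 reused, only 2 tokens to prefill
```

With a 33K-token agent context, nearly the entire prefix is shared between turns, which is why TTFT can drop from tens of seconds to sub-second on a cache hit.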
Benchmarks (Mac Studio M3 Ultra, 256GB):
Qwen3-Coder-Next-6bit (80B MoE, 3B active):
- Decode: 65 tok/s
- Prefill: 1090-1440 tok/s
- TTFT (cache hit, 33K context): 0.3s
MiniMax-M2.5-4bit (229B MoE):
- Decode: 33-38 tok/s
- Deep reasoning with tool calling
I built this to run OpenClaw locally on my Mac instead of paying for cloud APIs. Qwen3-Coder-Next at 65 tok/s with tool calling is genuinely usable — not a toy demo.
Quick start:
pip install git+https://github.com/raullenchai/vllm-mlx.git
python -m vllm_mlx.server \
--model lmstudio-community/Qwen3-Coder-Next-MLX-6bit \
--tool-call-parser hermes --port 8000
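Since the server is OpenAI-compatible, any standard client can talk to it. A minimal sketch of a tool-calling request payload is below; the `read_file` tool name and schema are made up for illustration and are not part of vllm-mlx.

```python
import json

# Minimal OpenAI-style chat request with one tool attached. The server above
# is OpenAI-compatible, so a payload like this would be POSTed to
# http://localhost:8000/v1/chat/completions. "read_file" is a hypothetical
# example tool, not something shipped with vllm-mlx.
payload = {
    "model": "lmstudio-community/Qwen3-Coder-Next-MLX-6bit",
    "messages": [{"role": "user", "content": "Show me the server config"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the workspace",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
    "stream": True,  # streaming tool calls are supported per the post
}

print(json.dumps(payload, indent=2))
```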
GitHub: https://github.com/raullenchai/vllm-mlx
I built this because I got tired of Claude choking when I pasted 5,000 lines of server logs, or worrying about leaving sensitive environment variables in my chat history forever.
What is it? vnsh (vanish) is a CLI tool and web app that encrypts data client-side and uploads it to host-blind storage (Cloudflare R2). It generates a link where the decryption key lives in the URL hash fragment.
Architecture:
Encryption: AES-256-CBC via WebCrypto API.
Transport: The key never leaves your device (browser or CLI). The server only sees an encrypted blob.
Storage: Cloudflare Workers + R2.
Integration: It has a native MCP (Model Context Protocol) server. If you use Claude Code or Claude Desktop, the agent can "read" these links directly without you pasting the text.
The Stack: TypeScript, Hono, Cloudflare Workers, React (Web), Node (CLI).
It's fully open source. I'm looking for feedback on the crypto implementation and the MCP integration flow.
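The key-in-the-fragment design leans on a standard behavior from RFC 3986: clients strip the hash fragment before sending an HTTP request, so anything after `#` stays on the device. A quick Python sketch of that split (illustrative only, not the vnsh code; the domain and key below are made up):

```python
from urllib.parse import urlsplit

# Hypothetical vnsh-style link: blob id in the path, decryption key in the
# fragment. Browsers and HTTP clients drop the fragment before the request
# goes out, so the server only ever sees the path to the encrypted blob.
link = "https://vnsh.example/p/abc123#k=BASE64_KEY_HERE"

parts = urlsplit(link)
request_target = parts.path      # what the server receives
client_secret = parts.fragment   # what stays in the browser/CLI

print(request_target)   # /p/abc123
print(client_secret)    # k=BASE64_KEY_HERE
```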
Running a long Claude Code session? Need to step away from your desk? Claw lets you monitor and control Claude Code from any device with a browser.
- See what Claude is doing in real time from any screen
- Send quick responses (yes/no/continue) with one tap
- Interrupt with Ctrl+C when things go sideways
- Monitor everything: sessions, windows, panes, git status, system stats
Most crypto users hold custodial assets (e.g. those held on an exchange), which don't have a unique on-chain address. Therefore we need some identity layer to tie together a specific user's transactions for tax purposes. That could be Google sign-in, Coinbase sign-in, or a throwaway email + password.
IoTeX Network | Palo Alto, California | Full Stack/Frontend Engineers | Full-time/Part-time/Intern | https://iotex.io IoTeX is building the auto-scalable and privacy-centric blockchain infrastructure designed and optimized for the Internet of Things (IoT). Full Stack/Frontend engineers are needed to speed up our product development process.
Apply here: https://iotex.io/careers