I've been working on a fork of vllm-mlx (OpenAI-compatible LLM server for Apple Silicon) to make it actually usable for coding agents. The upstream project is great but was
missing production-grade tool calling, reasoning separation, and multi-turn performance.
What I added (37 commits):
- Tool calling that works — streaming + non-streaming, supports MiniMax and Hermes/Qwen3 formats. 4/4 accuracy on structured function calling benchmarks.
- Reasoning separation — MiniMax-M2.5 mixes reasoning into its output with no tags. Built a heuristic parser that cleanly separates reasoning from content (0% leak rate, was 60%
with the generic parser).
- Prompt cache for SimpleEngine — persistent KV cache across requests. On 33K-token coding agent contexts: TTFT goes from 28s to 0.3s on cache hit. This is the single biggest
improvement for multi-turn use.
- 1500+ tests — parsers, engine, server, tool calling. The upstream had minimal test coverage.
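To show why the prompt cache is the big win for multi-turn agents, here is a toy sketch of the general idea behind prefix-matched KV reuse. This is not the vllm-mlx implementation, just an illustration: compare the incoming token prefix against the cached conversation and only prefill the uncached suffix.

```python
# Toy sketch of prefix-based prompt caching (NOT the actual vllm-mlx code):
# reuse cached state for the longest shared token prefix, so only the new
# suffix of a multi-turn conversation needs prefill.

def common_prefix_len(a: list[int], b: list[int]) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class PromptCache:
    def __init__(self):
        self.cached_tokens: list[int] = []  # tokens whose KV state we "hold"

    def plan_prefill(self, prompt: list[int]) -> tuple[int, list[int]]:
        """Return (reused token count, tokens that still need prefill)."""
        reused = common_prefix_len(self.cached_tokens, prompt)
        suffix = prompt[reused:]
        self.cached_tokens = list(prompt)  # cache reflects this request
        return reused, suffix

cache = PromptCache()
turn1 = [1, 2, 3, 4]            # first request: nothing cached yet
reused, todo = cache.plan_prefill(turn1)
print(reused, len(todo))        # 0 reused, 4 tokens to prefill

turn2 = [1, 2, 3, 4, 5, 6]      # same conversation, one more turn
reused, todo = cache.plan_prefill(turn2)
print(reused, len(todo))        # 4 reused, only 2 tokens to prefill
```

With a 33K-token agent context, nearly the entire prefix is shared between turns, which is why TTFT can drop from tens of seconds to sub-second on a cache hit.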
Benchmarks (Mac Studio M3 Ultra, 256GB):
Qwen3-Coder-Next-6bit (80B MoE, 3B active):
- Decode: 65 tok/s
- Prefill: 1090-1440 tok/s
- TTFT (cache hit, 33K context): 0.3s
MiniMax-M2.5-4bit (229B MoE):
- Decode: 33-38 tok/s
- Deep reasoning with tool calling
I built this to run OpenClaw locally on my Mac instead of paying for cloud APIs. Qwen3-Coder-Next at 65 tok/s with tool calling is genuinely usable — not a toy demo.
Quick start:
pip install git+https://github.com/raullenchai/vllm-mlx.git
python -m vllm_mlx.server \
--model lmstudio-community/Qwen3-Coder-Next-MLX-6bit \
--tool-call-parser hermes --port 8000
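Since the server is OpenAI-compatible, any standard client can talk to it. A minimal sketch of a tool-calling request payload is below; the `read_file` tool name and schema are made up for illustration and are not part of vllm-mlx.

```python
import json

# Minimal OpenAI-style chat request with one tool attached. The server above
# is OpenAI-compatible, so a payload like this would be POSTed to
# http://localhost:8000/v1/chat/completions. "read_file" is a hypothetical
# example tool, not something shipped with vllm-mlx.
payload = {
    "model": "lmstudio-community/Qwen3-Coder-Next-MLX-6bit",
    "messages": [{"role": "user", "content": "Show me the server config"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the workspace",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
    "stream": True,  # streaming tool calls are supported per the post
}

print(json.dumps(payload, indent=2))
```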
GitHub: https://github.com/raullenchai/vllm-mlx
I built this because I got tired of Claude choking when I pasted 5,000 lines of server logs, or worrying about leaving sensitive environment variables in my chat history forever.
What is it? vnsh (vanish) is a CLI tool and web app that encrypts data client-side and uploads it to host-blind storage (Cloudflare R2). It generates a link where the decryption key lives in the URL hash fragment.
Architecture:
Encryption: AES-256-CBC via WebCrypto API.
Transport: The key never leaves your device (browser or CLI). The server only sees an encrypted blob.
Storage: Cloudflare Workers + R2.
Integration: It has a native MCP (Model Context Protocol) server. If you use Claude Code or Claude Desktop, the agent can "read" these links directly without you pasting the text.
The Stack: TypeScript, Hono, Cloudflare Workers, React (Web), Node (CLI).
It's fully open source. I'm looking for feedback on the crypto implementation and the MCP integration flow.
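The key-in-the-fragment design leans on a standard behavior from RFC 3986: clients strip the hash fragment before sending an HTTP request, so anything after `#` stays on the device. A quick Python sketch of that split (illustrative only, not the vnsh code; the domain and key below are made up):

```python
from urllib.parse import urlsplit

# Hypothetical vnsh-style link: blob id in the path, decryption key in the
# fragment. Browsers and HTTP clients drop the fragment before the request
# goes out, so the server only ever sees the path to the encrypted blob.
link = "https://vnsh.example/p/abc123#k=BASE64_KEY_HERE"

parts = urlsplit(link)
request_target = parts.path      # what the server receives
client_secret = parts.fragment   # what stays in the browser/CLI

print(request_target)   # /p/abc123
print(client_secret)    # k=BASE64_KEY_HERE
```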
Running a long Claude Code session? Need to step away from your desk? Claw lets you monitor and control Claude Code from any device with a browser.
- See what Claude is doing in real time from any screen
- Send quick responses (yes/no/continue) with one tap
- Interrupt with Ctrl+C when things go sideways
- Monitor everything: sessions, windows, panes, git status, system stats
Most crypto users hold custodial assets (e.g. those held on an exchange), which don't have a unique on-chain address. Therefore we need some identity layer to tie together a specific user's transactions for tax purposes. That could be Google sign-in, Coinbase sign-in, or a throwaway email + password.
IoTeX Network | Palo Alto, California | Full Stack/Frontend Engineers | Full-time/Part-time/Intern | https://iotex.io IoTeX is building the auto-scalable and privacy-centric blockchain infrastructure designed and optimized for the Internet of Things (IoT). Full Stack/Frontend engineers are needed to speed up our product development process.
Apply here: https://iotex.io/careers