What kind of GPU are you running this on? My 3080 takes about 30 seconds per image at 50 passes, so I'm wondering if I'm missing out on some optimizations. It could just be the quality of the Linux NVIDIA drivers.
I'd recommend trying a different fork. Perhaps you're using the official one? I believe that one still "ramps up the system" on every image generation, while other repos do the ramp-up only once.
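Here's a rough sketch of the "ramp up only once" idea, assuming the ramp-up is an expensive setup step such as loading the model. `load_pipeline` and `generate` are hypothetical stand-ins rather than code from any actual fork. The point is that the setup result gets cached, so only the first image pays for it.

```python
import functools
import time


# Hypothetical stand-in for the expensive setup (e.g. loading model weights
# onto the GPU). The sleep simulates that one-time cost.
@functools.lru_cache(maxsize=None)
def load_pipeline(model_name: str) -> dict:
    time.sleep(0.1)
    return {"model": model_name}


def generate(prompt: str, steps: int = 50) -> str:
    # The first call runs the setup; later calls reuse the cached result.
    pipe = load_pipeline("sd-model")
    # Placeholder for the actual sampling loop over `steps` passes.
    return f"{pipe['model']}:{prompt}:{steps}"
```

If you call `generate` twice, `load_pipeline.cache_info()` should report one miss (the initial setup) and one hit (the reuse).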
I'm using 512x768 as the default, but a quick test shows only a marginal difference in speed between the two resolutions. I'll have to give Windows a try to see if it's the driver holding me back. Do you have any tips or resources for upscaling the image afterward?