One tip to speed up how you're finding the closest matches -- OpenAI's embedding vectors are already normalized to unit length, so they all lie on the unit sphere. That means cosine similarity is equal to the plain dot product, and skipping the per-vector normalization might speed things up a bit.
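For instance, a minimal numpy sketch under that assumption (random stand-in vectors normalized to unit length the way OpenAI's embeddings already arrive; the 1536 dims match text-embedding-ada-002):

```python
import numpy as np

# Stand-in corpus: 10k unit-normalized embedding vectors.
# OpenAI's embeddings come back already normalized, so the
# normalization below is only needed for this fake data.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 1536))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

query = rng.normal(size=1536)
query /= np.linalg.norm(query)

# On the unit sphere, dot product == cosine similarity,
# so one matrix-vector product ranks the whole corpus.
scores = corpus @ query
top5 = np.argsort(scores)[::-1][:5]
print(top5, scores[top5])
```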
The attention mechanism in transformer models leans on the same idea - raw dot products as the similarity score. I learned this during a long conversation with ChatGPT the other night; an excerpt of its reply to me:
The dot product between the query and key representations is similar to computing the cosine similarity between two vectors. The cosine similarity is a measure of the similarity between two vectors in a multi-dimensional space, and is defined as the dot product of the vectors normalized by their magnitudes.
The dot product of the query and key representations can be seen as an un-normalized version of the cosine similarity: it is the same computation, just without dividing by the vectors' magnitudes. The result is a scalar value representing the similarity between the two vectors; the larger the scalar, the more similar the vectors.
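To make that relationship concrete, a tiny sketch with made-up 2-D vectors -- once both are scaled to unit length, the bare dot product and the cosine similarity come out to the same number:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

# Cosine similarity: dot product divided by both magnitudes.
cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Normalize to unit length; now the dot product IS the cosine.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)

print(np.isclose(cosine, a_hat @ b_hat))  # True
```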