Show HN: Dataiter – Python classes for data manipulation (github.com/otsaloma)
3 points by otsaloma on Oct 22, 2020 | 1 comment


Hi! Working in data science, I spend most of my time downloading data, combining it, cleaning it, or piping it somewhere. When working with R, I've learned to appreciate dplyr (and the tidyverse) for their API consistency and the pipe operator, and I miss that when working with Python. Base Python and Pandas get the job done, but I often find them unpleasant to use.

So, I created a Python package with classes to handle different kinds of data: currently DataFrame for tabular data, ListOfDicts for possibly hierarchical data of the kind typically returned by JSON APIs, and the latest addition, GeoJSON, for spatial data. The classes provide various basic data manipulation methods.
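
To give a rough idea of the chained, pipe-like style meant here, below is a minimal sketch in plain Python. The class `TinyListOfDicts` and its methods `filter`, `sort` and `pluck` are hypothetical stand-ins made up for this illustration; they are not Dataiter's actual API, and the sample records are invented.

```python
class TinyListOfDicts(list):
    """Hypothetical stand-in for a list-of-dicts class (not Dataiter's API)."""

    def filter(self, predicate):
        # Keep only the items for which predicate(item) is true.
        return type(self)(item for item in self if predicate(item))

    def sort(self, key):
        # Return a new sorted instance instead of sorting in place,
        # so that calls can be chained.
        return type(self)(sorted(self, key=key))

    def pluck(self, key):
        # Extract a single value from every item as a plain list.
        return [item[key] for item in self]


# Invented sample records, shaped like a typical JSON API response.
repos = TinyListOfDicts([
    {"name": "alpha", "stars": 12},
    {"name": "beta", "stars": 3},
    {"name": "gamma", "stars": 40},
])

names = (
    repos
    .filter(lambda item: item["stars"] >= 10)
    .sort(lambda item: -item["stars"])
    .pluck("name")
)
print(names)
```

Each method returns a new object rather than mutating the old one, which is what lets the steps read top to bottom much like a dplyr pipeline.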

My focus here has been on creating a nice API. I don't usually deal with huge datasets, so there's no performance innovation here. The DataFrame class is built on top of NumPy, so it does fast vectorized computation, but is likely to be a bit slower than Pandas.
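
As a rough illustration of what "built on top of NumPy" can look like, here is a minimal sketch of a data frame whose columns are NumPy arrays, so filtering and column arithmetic are vectorized. `TinyFrame`, `filter` and `modify` are hypothetical names chosen for this example and should not be read as Dataiter's actual implementation or API.

```python
import numpy as np


class TinyFrame:
    """Hypothetical sketch: a mapping of column names to NumPy arrays."""

    def __init__(self, **columns):
        # Store every column as a NumPy array.
        self.columns = {name: np.asarray(values) for name, values in columns.items()}

    def __getitem__(self, name):
        return self.columns[name]

    def filter(self, mask):
        # Boolean-mask every column at once; no Python-level row loop.
        mask = np.asarray(mask, dtype=bool)
        return TinyFrame(**{name: col[mask] for name, col in self.columns.items()})

    def modify(self, **new_columns):
        # Return a new frame with columns added or replaced.
        return TinyFrame(**{**self.columns, **new_columns})


# Invented sample data.
df = TinyFrame(x=[1, 2, 3, 4], y=[10, 20, 30, 40])

big = df.filter(df["x"] > 2)                  # vectorized comparison
big = big.modify(ratio=big["y"] / big["x"])   # vectorized division
print(big["ratio"])
```

The per-element work happens inside NumPy, which is where the speed comes from; the thin Python layer on top is only there for the API.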

Happy to hear any feedback and answer any questions!



