Re: data-oriented data structures... random (not XOR)...
I have been running multiple iterations of entire ipv4 space port scan. As of now only scanning top 25 used ports (nmap, masscan have toplists). I wanted to be able to do simple data analysis _very_ efficiently time-wise, being OK to sacrifice memory for that.
If you want (time-wise) O(1) item insert (into SORTED btw), fetch, lookup, and port status check to simply be a matter of bitshifting (similarly with counting), then:
1. Bitfield array to store status of up to 32 ports (1 bit for state (open/closed) => 32 bit bitfield)
2. ...that's it. Each scan result is to be found at
`bitfields[(unsigned int) ipv4_address]`
In C:
```
#include <stdbool.h>

// bitfield for port status, for each IP
struct port_field {
    bool p1:1;
    bool p2:1;
    bool p3:1;
    bool p4:1;
    bool p5:1;
    // in C, gotta write it out - of course we could use a macro to generate this
    ...
    bool p32:1;
};
```
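For comparison, here's a minimal sketch of the same layout using a plain `uint32_t` per address instead of 32 named bit-fields. The names (`port_mask`, `port_set`, `port_is_open`, `ports_open_count`) are made up for illustration, and `port_index` is assumed to be 0..31, i.e. the position in the top-ports list rather than the port number itself:

```c
#include <stdbool.h>
#include <stdint.h>

/* One 32-bit word per IPv4 address; bit i = state of the i-th port
 * in the top-ports list (1 = open, 0 = closed/unknown). */
typedef uint32_t port_mask;

/* Mark the port at port_index (0..31) as open. */
static inline void port_set(port_mask *m, unsigned port_index) {
    *m |= UINT32_C(1) << port_index;
}

/* Check whether the port at port_index (0..31) is open. */
static inline bool port_is_open(port_mask m, unsigned port_index) {
    return (m >> port_index) & UINT32_C(1);
}

/* Count open ports for one address (portable popcount). */
static inline unsigned ports_open_count(port_mask m) {
    unsigned n = 0;
    while (m) {
        m &= m - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

With this variant there's no pointer cast needed when setting or testing a bit, at the cost of losing the named `p1`..`p32` members.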
This will use 16 GiB of memory for the whole mapped space (2^32 addresses x 4 bytes each).
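The allocation might look roughly like this. It's only a sketch: `alloc_ip_space` and `IPV4_COUNT` are hypothetical names, a `uint32_t` per entry stands in for the 4-byte `struct port_field`, and it assumes a 64-bit system that will actually hand out that much memory:

```c
#include <stdint.h>
#include <stdlib.h>

/* Number of IPv4 addresses: 2^32. Needs 64-bit arithmetic. */
#define IPV4_COUNT (UINT64_C(1) << 32)

/* Allocate n_entries zeroed 4-byte slots (all ports start as "closed").
 * Returns NULL on failure or if the byte size doesn't fit in size_t. */
static uint32_t *alloc_ip_space(uint64_t n_entries) {
    if (n_entries > SIZE_MAX / sizeof(uint32_t)) {
        return NULL;
    }
    return calloc((size_t)n_entries, sizeof(uint32_t));
}
```

Calling `alloc_ip_space(IPV4_COUNT)` asks for 2^32 x 4 B = 16 GiB; `calloc` zeroes it, so every port starts out as closed.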
When scanning (or reading in scan results):
```
in_addr_t u32_ip; // unsigned 32 bit int
struct port_field *p_target_bitfield;
uint32_t *p_target_int;
// ... to insert:
// `token` is string with actual ip (e.g. from text file)
// inet_addr() signals a parse error with INADDR_NONE, not 0
if ((u32_ip = inet_addr(token)) == INADDR_NONE) {
    printf("line %lu: IPv4 address not valid: %s\n", line_count, token);
} else {
    u32_ip = ntohl(u32_ip); // host byte order, so indices sort numerically
    p_target_bitfield = &(ip_space[u32_ip]); // unsigned int ipv4 as 'index'
    p_target_int = (uint32_t *) ((void *) p_target_bitfield); // cast bitfield* to uint32_t*
    // set bit at port_index (unsigned shift, so port_index 31 is fine too):
    *p_target_int |= (UINT32_C(1) << port_index);
    // now, port identified by `port_index` can be tested with
    // `(*p_target_int >> port_index) & 1`
}
```
It works - pretty nifty :) I'm sure I could make it much prettier tho.
But it's a kind of 'columnar-bitfieldish' in-memory store with O(1) for everything :)
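On the counting side, a rough sketch of tallying how many hosts have a given port open. `count_hosts_with_port` is a hypothetical helper that takes the array as plain `uint32_t` words; each per-host check is O(1), but the tally itself is of course a single linear pass over whatever range you hand it:

```c
#include <stddef.h>
#include <stdint.h>

/* Count how many entries in masks[0..n) have the bit at port_index set.
 * Walks the array once; each check is a shift and a mask. */
static uint64_t count_hosts_with_port(const uint32_t *masks, size_t n,
                                      unsigned port_index) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (masks[i] >> port_index) & 1u;
    }
    return total;
}
```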
Hm I guess you're right - I'm misusing the term. And yeah! Will experiment with it; what's neat is that it's not that much code to mess around with these basic notions...
Re: 224 << 24 - you're right; so many unusable actual addresses. It's just kind of neat to actually map out whole ipv4 space to memory. But yes lots of it unneeded, I'll see if I can add minimum-computation-possible mapping translation so that everything still stays ~kind of O(1).
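The cheapest version of such a translation I can think of is a single comparison that drops everything from 224.0.0.0 upward. A sketch only: `ip_to_index` and `FIRST_UNSCANNED` are made-up names, and the address is assumed to already be in host byte order:

```c
#include <stdbool.h>
#include <stdint.h>

/* First address of the multicast/reserved range: 224.0.0.0 == 224 << 24. */
#define FIRST_UNSCANNED UINT32_C(0xE0000000)

/* Map a host-byte-order IPv4 address to an array index.
 * Returns false for 224.0.0.0 and above, which get no slot. */
static bool ip_to_index(uint32_t ip_host_order, uint32_t *index_out) {
    if (ip_host_order >= FIRST_UNSCANNED) {
        return false;
    }
    *index_out = ip_host_order; /* identity mapping below the cutoff */
    return true;
}
```

That shrinks the array from 2^32 to 224 << 24 entries (16 GiB down to 14 GiB at 4 bytes each) and stays O(1). Cutting out holes in the middle of the space (e.g. 10.0.0.0/8, 127.0.0.0/8) would need more than one comparison, or a small offset table.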
Thank you for your comments! :)
edit P.S. 25 x 512MiB arrays - actually thank you, I thought of doing something like that at first, but now I forget why I didn't start experimenting with that sort-of-actual-columnar-store from the beginning.. anyway, it's nice to quickly mess around with multiple base data layouts (I'll try that one next, I think); I'd recommend it to anyone wanting to pick up base knowledge on e.g. data layouts for data analysis...
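For reference, a minimal sketch of what one such per-port column could look like: one bit per address, so 2^32 bits = 512 MiB per port. `port_column`, `column_init`, `column_set` and `column_test` are hypothetical names, and callers are assumed to keep `i` below `n_bits`:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One bitmap per port: bit i = "this port is open on address i". */
typedef struct {
    uint64_t *words;  /* n_bits / 64 words, rounded up */
    uint64_t n_bits;
} port_column;

/* Allocate a zeroed bitmap with room for n_bits bits. */
static bool column_init(port_column *c, uint64_t n_bits) {
    uint64_t n_words = n_bits / 64 + (n_bits % 64 != 0);
    if (n_words > SIZE_MAX / sizeof(uint64_t)) {
        return false;
    }
    c->words = calloc((size_t)n_words, sizeof(uint64_t));
    c->n_bits = n_bits;
    return c->words != NULL;
}

/* Mark address i as having this port open. */
static void column_set(port_column *c, uint64_t i) {
    c->words[i / 64] |= UINT64_C(1) << (i % 64);
}

/* Check whether address i has this port open. */
static bool column_test(const port_column *c, uint64_t i) {
    return (c->words[i / 64] >> (i % 64)) & UINT64_C(1);
}
```

An array of 25 of these, each initialised with `UINT64_C(1) << 32` bits, would give the 25 x 512MiB layout; per-port counting then only has to touch that one port's 512 MiB.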