kingkilr's comments | Hacker News

(pyca/cryptography dev here)

As Steve notes, Rust does support s390x. Even prior to shipping Rust code, we never tested or claimed to support s390x.

If there's genuine interest in more people supporting s390x in the open source world, folks will need to do the work to make it possible to build, test, and run CI on it. IBM recently contributed official ppc64le support to pyca/cryptography (by way of GitHub Actions runners, so we could test and build in CI), and they've been responsive on weird platform issues we've hit (e.g., absl not supporting ppc64le on musl: https://github.com/pyca/infra/pull/710#issuecomment-31789057...). That level of commitment is what's really required to make a platform practical; treating "well, it's in C and every platform has a C compiler" as the sum total of support was never realistic.


If you don't mind, how did you get into cryptography development? I've heard many people say not to do it unless you're experienced, but I wonder how one becomes experienced without doing it yourself.


RealPage was DoJ. As was the Google search litigation where DoJ proposed Google divest Chrome.

Which is by way of saying, the FTC and Chair Khan were not responsible for those.


I would strongly implore people not to follow the example this post suggests and write code that relies on this monotonicity.

The reason for this is simple: the documentation doesn't promise this property. Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property. If you decide to depend on it, you're setting yourself up for a bad time when PostgreSQL decides to change their implementation strategy, or you move to a different database.

Further, the stated motivations for this, to slightly simplify testing code, are massively under-motivating. Saving a single line of code can hardly be said to be worth it, but even if it were, this is a problem far better solved by simply writing a function that will both generate the objects and sort them.
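
For example, a tiny helper along these lines (a sketch; `make_account` and the `created_at` field are hypothetical stand-ins) makes the intended ordering explicit instead of leaning on UUID generation order:

    # Hypothetical test helper: build N records and return them already
    # sorted on an explicit key, not on incidental UUID byte order.
    def make_accounts_sorted(n):
        accounts = [make_account() for _ in range(n)]
        return sorted(accounts, key=lambda a: a.created_at)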

As a profession, I strongly feel we need to do a better job orienting ourselves to the reality that our code has a tendency to live for a long time, and we need to optimize not for "how quickly can I type it", but "what will this code cost over its lifetime".


> […] code that relies on this monotonicity. The reason for this is simple: the documentation doesn't promise this property. Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property.

The "RFC for UUIDv7", RFC 9562, explicitly mentions monotonicity in §6.2 ("Monotonicity and Counters"):

    Monotonicity (each subsequent value being greater than the last) is 
    the backbone of time-based sortable UUIDs. Normally, time-based UUIDs 
    from this document will be monotonic due to an embedded timestamp; 
    however, implementations can guarantee additional monotonicity via 
    the concepts covered in this section.
* https://datatracker.ietf.org/doc/html/rfc9562#name-monotonic...

In the UUIDv7 definition (§5.7) it explicitly mentions the technique that Postgres employs for rand_a:

    rand_a:
        12 bits of pseudorandom data to provide uniqueness as per
        Section 6.9 and/or optional constructs to guarantee additional 
        monotonicity as per Section 6.2. Occupies bits 52 through 63 
        (octets 6-7).
* https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-vers...

Note: "optional constructs to guarantee additional monotonicity". Pg makes use of that option.


>explicitly mentions monotonicity

>optional constructs

So it is explicitly mentioned in the RFC as optional, and Pg doesn't state that they guarantee that option. The point still stands: depending on optional behavior is a recipe for failure when the option is no longer taken.


It's mentioned in the RFC as being explicitly monotonic based on the time-based design.

Implementations that need monotonicity beyond the resolution of a timestamp -- like when you allocate 30 UUIDs at one instant in a batch -- can optionally use those additional bits for that purpose.

> Implementations SHOULD employ the following methods for single-node UUID implementations that require batch UUID creation or are otherwise concerned about monotonicity with high-frequency UUID generation.

(And it goes on to recommend the obvious things you'd do: use a counter in those bits when assigning a batch; use more bits of time precision; etc.)
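
(For illustration, a counter in those bits might be allocated roughly like this sketch of the RFC's Method 1: seed an 11-bit counter randomly each tick, then increment per item. Rollover handling is omitted here; a real implementation needs it.)

    import os

    def batch_rand_a(batch_size):
        # Method 1-style sketch: a random 11-bit seed leaves headroom
        # to increment; the mask keeps values within the 12-bit field.
        start = int.from_bytes(os.urandom(2), "big") >> 5
        return [(start + i) & 0xFFF for i in range(batch_size)]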

The comment in PostgreSQL before the implementation makes it clear that they chose the third option for this in the RFC:

     * variant bits. To ensure monotonicity in scenarios of high-
     * frequency UUID generation, we employ the method "Replace
     * Leftmost Random Bits with Increased Clock Precision (Method 3)",
     * described in the RFC. ...


> It's mentioned in the RFC as being explicitly monotonic based the time-based design.

It's explicitly partially monotonic.

Or as other people would call it, "not monotonic".

People are talking past each other based on their use of the word "monotonic".


It's explicitly monotonic, except for apps that have a very fast ID rate, in which case there are recommended approaches ("SHOULD", in RFC terms) to make it work. And PostgreSQL used one of these recommended approaches and documented it.


> It's explicitly monotonic, except for apps that have a very fast ID rate

"might generate two IDs in the same millisecond" is not a very exotic occurrence. It makes a big difference whether the rest is guaranteed or not.

> And PostgreSQL used one of these recommended approaches and documented it.

Well that's the center of the issue, right? OP's interpretation was that PostgreSQL did not document such, and so it shouldn't be relied upon. If it is a documented promise, then things are fine.

But is it actually in the documentation? A source code comment saying it uses a certain method isn't a promise that it will always use that method.


> Well that’s the center of the issue, right? OP’s interpretation was that PostgreSQL did not document such, and so it shouldn’t be relied upon.

And the correct answer is…we don’t know. We have a commit that landed and the explanation of the commit; we don’t have the documentation for the corresponding release of Postgres, because…it doesn’t exist yet. Because monotonicity is an important feature for UUIDv7, it would be very odd if Postgres took the extra effort to use a nanosecond-level time value as the high-order portion of the variant part of the UUID, rather than the minimum millisecond-level time, and then didn’t document that. But any assumption about what will be the documented, reliable-going-forward advertised feature is speculative until the documentation exists and is finalized.

OTOH, it’s perfectly fine to talk about what the implementation allows now, because that kind of thing is important to the decision about what should be documented and committed to going forward.


The point of the extra bits is to allow the application developer to keep monotonicity in the "not very exotic occurrence" scenario. The purpose is to be monotonic. I feel like you are missing the core concept.


I'm not missing anything, the problem is a lot of people using sloppy wording or mixing up the two modes.

This comment thread is about the guaranteed level of monotonicity. Yes, those bits exist. But you can't depend on them from something that only promises "UUIDv7". You need an additional promise that it's configured that way and actually using those bits to maintain monotonicity.


> It's explicitly partially monotonic.

From the RFC:

    Monotonicity (each subsequent value being greater than the last) is 
    the backbone of time-based sortable UUIDs. Normally, time-based UUIDs 
    from this document will be monotonic due to an embedded timestamp
"time-based UUIDs from this document will be monotonic". "will be monotonic".

I'm not sure how much more explicit this can be made.

The intent of UUIDv7 is monotonicity. If an implementation is not monotonic then it's a bug in the implementation.


> So it is explicitly mentioned in the RFC as optional […]

The use of rand_a for extra monotonicity is optional. The monotonicity itself is not optional.

§5.7 states:

    Alternatively, implementations MAY fill the 74 bits, 
    jointly, with a combination of the following subfields, 
    in this order from the most significant bits to the least, 
    to guarantee additional monotonicity within a millisecond:
Guaranteeing additional monotonicity means that there is already a 'base' level of monotonicity, and there are provisions for even more ("additional") levels of it. This 'base level' is why §6.2 states:

    Monotonicity (each subsequent value being greater than the last) is 
    the backbone of time-based sortable UUIDs. Normally, time-based UUIDs 
    from this document will be monotonic due to an embedded timestamp; 
    however, implementations can guarantee additional monotonicity via 
    the concepts covered in this section.
"Backbone of time-based sortable UUIDs"; "additional monotonicity". Additional: adding to what's already there.

* https://datatracker.ietf.org/doc/html/rfc9562


"this monotonicity" that OP suggests people not use is specifically the additional monotonicity.

Or to put it another way: OP is suggesting you don't depend on it being properly monotonic, because the default is that it is only partially monotonic.
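
To see the difference concretely, here is an illustrative Python sketch of the 'base' construction, with fully random rand_a/rand_b and no counter or extra clock precision (not any particular implementation): two values from the same millisecond compare in arbitrary order, while a value from a later millisecond always sorts after both.

    import os

    def naive_uuid7(ms):
        # Only the 48-bit ms prefix is ordered; everything else is random.
        rand = int.from_bytes(os.urandom(10), "big") & ((1 << 76) - 1)
        value = (ms << 80) | (0x7 << 76) | rand
        return (value & ~(0b11 << 62)) | (0b10 << 62)   # force variant bits

    a, b = naive_uuid7(1000), naive_uuid7(1000)  # same tick: a < b only by luck
    c = naive_uuid7(1001)                        # later tick: always sorts after
    assert max(a, b) < c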


> Normally, time-based UUIDs from this document will be monotonic due to an embedded timestamp; however, implementations can guarantee additional monotonicity via the concepts covered in this section.

“Normally, I am at home because I do not have a reason to go out; however, sometimes I am at home because I am sleeping.”

Notice how this statement does not actually mean that I am always at home.


I was recently bitten during a Postgres upgrade by the Postgres team considering it fine for statements like `select 1 group by true` to silently break in Postgres 15. See https://postgrespro.com/list/thread-id/2661353 - and this behavior remains undocumented in https://www.postgresql.org/docs/release/ . It's an absolutely incredible project, and I don't disagree with the decision to classify it as wontfix - but it's an anecdote about not relying on undefined behavior!


The "optional" portion is this part of the spec, not the time part.

> implementations can guarantee additional monotonicity via the concepts covered in this section


The “time part” is actually two different parts: the required millisecond-level ordering and the optional use of part of rand_a (which postgres does) to provide higher-resolution (nanosecond, in the postgres case) time ordering when combined with the required portion.

So, no, the “time part” of the postgres implementation is, in part, one of the options discussed in the spec, not merely the “time part” required in the spec.


Relying on an explicitly documented implementation behavior that the specification explicitly describes as an option is not an issue. Especially if the behavior is only relied on in a test, where the worst outcome is a failed testcase that is easily fixed.

Even if the behavior went away, UUIDs, unlike serials, can always be safely generated directly by the application just as well as they can be generated by the database.

Going straight for that would arguably be the "better" path, and allows mocking PRNG to get sequential IDs.
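
As a sketch of that path (the injectable `clock` and `rng` parameters are the point; everything here is hypothetical, not any library's API):

    import os, time, uuid

    def make_uuid7(clock=time.time_ns, rng=os.urandom):
        # App-side UUIDv7 generation with injectable clock and randomness.
        ms = clock() // 1_000_000
        rand = int.from_bytes(rng(10), "big") & ((1 << 76) - 1)
        value = (ms << 80) | (0x7 << 76) | rand
        value = (value & ~(0b11 << 62)) | (0b10 << 62)  # variant bits
        return uuid.UUID(int=value)

    # In a test: a stepping clock plus fixed randomness yields sequential IDs.
    fake_clock = iter(range(10**6, 10**9, 10**6)).__next__  # +1 ms per call
    ids = [make_uuid7(fake_clock, lambda n: b"\x00" * n) for _ in range(3)]
    assert ids == sorted(ids)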


Software is arbitrary. Any so-called "guarantee" is only as good as the developers and organizations maintaining a piece of software want to make it, regardless of prior statements. At some point, the practical likelihood of a documented-but-not-guaranteed behavior being violated and the willful abandonment of a guarantee start to look very similar... at which point nothing saves you.

Sometimes the best you can do is recognize who you're working with today, know how they work, and be prepared for those people to be different in the future (or of a different mind) and for things to change regardless of expressed guarantees.

....unless we're talking about the laws of physics... ...that's different...


The test should do a set comparison, not an ordered list comparison, if it wants to check that the same 5 accounts were returned by the API. I think it's as simple as that.
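
Something like the following (fixture names are hypothetical) checks "the same accounts came back" without asserting an order the API never promised:

    # Hypothetical test: compare as sets, checking membership rather than order.
    def test_lists_created_accounts(api_client, created_accounts):
        returned = api_client.list_accounts()
        assert {a.id for a in returned} == {a.id for a in created_accounts}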

The blogpost is interesting and I appreciated learning the details of how the UUIDv7 implementation works.


Don’t you think that depends on what you’re guaranteeing in your api? If you’re guaranteeing that your api returns the accounts ordered you need to test for that. But I do agree in general that using a set is the correct move.


The test is a very strange example indeed. Is it testing the backend, the database, or both? If the API was guaranteeing ordered values, then pre-UUIDv7 the backend must have sorted them by other means before returning, making the test identical. If the backend is not guaranteeing order, that shouldn't be tested either.


As a counter-argument, it will inevitably turn into a spec if it becomes widely-used enough.

What was that saying, like: “every behavior of software eventually becomes API”



Yes, that one! Thanks :)


Consider the incentives you're setting up there. An API contract goes both ways: the vendor promises some things and not others to preserve flexibility, and the user has to abide by it to not get broken in the future. If you unilaterally ignore the contract, or even plan to do so in advance, then eventually the kindness and capacity to accommodate such abuse might run out, and they may switch to an adversarial stance. See QUIC, for example, which is a big middle finger to middle boxes.


Sure, there is a risk. But, it all depends on how great and desirable the benefits are.


In enterprise land. In proof-of-concept land, that's not quite true (but it does become true if the concept works).


I agree, optimizing for readability and maintainability is almost always the right choice.


> Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property.

Huh?

If the docs guarantee it, then they guarantee it. Why are you looking for everything to be part of the UUIDv7 RFC?

Failure of logic.


Their next sentence explains. Other databases might not make that guarantee, including future versions of Postgres.


I too am missing the win on this. It is breaking the spec, and it does not seem to offer a significant advantage. In the event that you have a collection of UUIDv7s, you are only ever going to be able to rely on the millisecond precision anyway.


You say it's breaking the spec, but is it?

From https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-versio...:

"UUIDv7 values are created by allocating a Unix timestamp in milliseconds in the most significant 48 bits and filling the remaining 74 bits, excluding the required version and variant bits, with random bits for each new UUIDv7 generated to provide uniqueness as per Section 6.9. Alternatively, implementations MAY fill the 74 bits, jointly, with a combination of the following subfields, in this order from the most significant bits to the least, to guarantee additional monotonicity within a millisecond:

   1.  An OPTIONAL sub-millisecond timestamp fraction (12 bits at
       maximum) as per Section 6.2 (Method 3).

   2.  An OPTIONAL carefully seeded counter as per Section 6.2 (Method 1
       or 2).

   3.  Random data for each new UUIDv7 generated for any remaining
       space."
The referenced "Method 3" is:

"Replace Leftmost Random Bits with Increased Clock Precision (Method 3):

For UUIDv7, which has millisecond timestamp precision, it is possible to use additional clock precision available on the system to substitute for up to 12 random bits immediately following the timestamp. This can provide values that are time ordered with sub-millisecond precision, using however many bits are appropriate in the implementation environment. With this method, the additional time precision bits MUST follow the timestamp as the next available bit in the rand_a field for UUIDv7."


> It is breaking the spec […]

As per a sibling comment, it is not breaking the spec. The comment in the Pg code even cites the spec that says what to do (and is quoted in the post):

     * Generate UUID version 7 per RFC 9562, with the given timestamp.
     *
     * UUID version 7 consists of a Unix timestamp in milliseconds (48
     * bits) and 74 random bits, excluding the required version and
     * variant bits. To ensure monotonicity in scenarios of high-
     * frequency UUID generation, we employ the method "Replace
     * Leftmost Random Bits with Increased Clock Precision (Method 3)",
     * described in the RFC. […]


I don't think most people will heed this warning. I warned people in a programming forum that Python's ordering of dict entries by insertion time was an implementation detail, because it's not guaranteed by any PEP [0]. I could literally write a PEP-compliant Python interpreter, and it could blow up someone's code because they rely on the CPython interpreter's behavior.

[0]: https://mail.python.org/pipermail/python-dev/2017-December/1...


> I warned people in a programming forum that Python ordering of objects by insertion time was a implementation detail, because it's not guaranteed by any PEP

PEPs do not provide a spec for Python, they neither cover the initial base language before the PEP process started, nor were all subsequent language changes made through PEPs. The closest thing Python has to a cross-implementation standard is the Python Language Reference for a particular version, treating as excluded anything explicitly noted as a CPython implementation detail. Dictionaries being insertion-ordered went from a CPython implementation detail in 3.6 to guaranteed language feature in 3.7+.


That definitely was true, and I used to jitter my code a little to deliberately find and break tests that depended on any particular ordering.

It's now explicitly documented to be true, and you can officially rely on it. From https://docs.python.org/3/library/stdtypes.html#dict:

> Changed in version 3.7: Dictionary order is guaranteed to be insertion order.

That link documents the Python language's semantics, not the behavior of any particular interpreter.
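
A quick illustration of the now-guaranteed behavior:

    d = {}
    d["b"] = 1
    d["a"] = 2
    d["c"] = 3
    # Guaranteed by the language since Python 3.7, not just by CPython:
    assert list(d) == ["b", "a", "c"]   # insertion order, not sorted order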


Most code does not live for a long time. Similar to how consumer products are built for planned obsolescence, code is also built with a specific lifespan in mind.

If you spend time making code bulletproof so it can run for like 100 years, you will have wasted a lot of effort for nothing when someone comes along and wipes it clean and replaces it with new code in 2 years. Requirements change, code changes, it’s the nature of business.

Remember: any fool can build a bridge that stands; it takes an engineer to build a bridge that barely stands.


>Most code does not live for a long time.

Sure, and here I am at a third company, doing a cloud migration and changing our default DB from MySQL to SQL Server. The pain is real: the 2-year roadmap is now 5 years longer. All because some dude negotiated a discount on cloud services. And we still develop integrations that talk to systems written for DOS.


What? Okay, so assume that most code doesn't last. That doesn't mean you should purposefully make it brittle for basically no additional benefit. If, as you say, it's about making the most with as little as possible (which is what the bridge analogy usually refers to), then surely adding a single function (to actually enforce the ordering you want) to make your code more robust is one of the best examples of that?


Uh, more people work on 20-year-old codebases than you'd think.


And yet these people are dwarfed by the number of developers crunching out generic line-of-business CRUD apps every day.


Since last year's AMG case in the Supreme Court, the FTC is not authorized to seek monetary relief in these cases.

The FTC can seek monetary relief if this order is violated.


I don't know Brett super well so I can't speak to the rest of his background, but it's not correct that the Obama admin asked him to take over DDS.

DDS's founding head was Chris Lynch, who served in that role until the middle of the Trump administration, when he left government service and that's when Brett got the job.


There ya go. +/- details.

I saw him present on DDS and his backstory at BSidesLV a few years ago and did a bit of non-profit govt<>tech chatter with the team there.

Correct, it was in the middle of the Trump Admin.


While I don't love the proliferation of dependencies, from a risk perspective the raw number of dependencies isn't always the right metric.

Looking at the authors and publishers numbers from https://github.com/rust-secure-code/cargo-supply-chain it's clear a lot of these are maintained by the same set of trusted folks.


This is true today; will it remain true?


You could argue the same thing for many big monolithic C projects though. How many of the original authors/maintainers are left in OpenSSL or the Linux kernel?

My main worry about Rust dependencies is not so much the number, it's that it's still a fairly young ecosystem that hasn't stabilized yet, packages come and go fairly quickly even for relatively basic features. For instance for a long time lazy_static (which is one of the dependencies listed here) was the de-facto standard way of dealing with global data that needed an initializer. Apparently things are changing though, I've seen many people recommend once_cell over it (I haven't had the opportunity to try it yet).

Things like tokio are also moving pretty fast, I wouldn't be surprised if something else took over in the not-so-far future.

It's like that even for basic things: a couple of years ago, for command line parsing in an app, I used "argparse". It did the job. Last week I had to implement argument parsing for a new app; at first I thought about copy/pasting my previous code, but I went on crates.io and noticed that argparse hadn't been updated in 2 years and apparently the "go to" argument parsing lib was now "clap". So I used clap instead. Will it still be used and maintained two years from now? Who knows.


I switched ripgrep to clap 4 years ago. And that was well after clap had already become the popular "go to" solution.

Some parts of the ecosystem are more stable than others. That's true. And it takes work to know which things are stable and which aren't.

And yet, some things just take a longer time to improve. lazy_static has been stable and unchanged for a very long time and it works just fine. You don't need to switch to once_cell if you don't want to. lazy_static isn't going anywhere. The real change here, I think, is that we're hoping to get a portion of once_cell into std so that you don't need a dependency at all for it.

The async ecosystem is definitely moving more quickly because it just hasn't had that much time to stabilize. If you're using async in Rust right now then you're probably an early adopter and you'll want to make sure you're okay with the costs that come with that.


Interesting that I missed clap when I wrote that program a few years ago, then. In my defence, "argparse" is a lot more explicit than "clap" as a name for such a library. Also, argparse's last update was 2 years ago, so there's been quite a bit of overlap.

I guess what I'm saying is that it's another problem with the current package ecosystem: you often end up finding multiple packages purporting to do what you need, and it can be tricky to figure out which one you want. As an example, if you want a simple logging backend, the log crate currently lists 6 possibilities: https://crates.io/crates/log

I picked "simple_logger" basically at random.


> it's another problem with the current package ecosystem: you often end up finding multiple packages purporting to do what you need, and it can be tricky to figure out which one you want

I'm trying to remember the last language I've used where people didn't say that.

Hmm... clojure? Nop.

Javascript? Nop nop nop.

Python? Hahaha I can't even remember all the package managers: virtualenv, venv, pipenv, poetry, ...


Seems like an unavoidable problem unless you buy into a curated ecosystem. Like, yeah, the cost of a decentralized ecosystem is that you have to do your due diligence on which crate to use, if any. (For example, I don't even bother with a log helper crate because it just isn't necessary for simple cases.)


Well, yes, if they're published as part of the same project, as lots of these are. In C/C++ you wouldn't do this, because consuming a library is a pain, so you want to minimise the number of dependencies. In Rust, what would be one library in C often gets broken up into a few that are published together, in order to allow people to depend on only the functionality they need.


It's also worth pointing out that once a version is published to crates.io, it can't be altered, specifically to prevent social engineering attacks. If you're worried about it, that means you can audit the frozen codebase for any given version from a top-level crate down through the dependencies, and once that trust is established, it can't be leveraged for a silent dependency change later on, which can only happen through a version update on the end-user's side.


What if I audit something down a few levels and find it lacking - how do I force update everything to not use the bad version?



I think it's fair to say that this work is quite likely to qualify :-)


Rust has a few interlocking behaviors that provide its memory safety; a few of the most important are:

- The borrow checker enforces mutable XOR shared references.

- The compiler does not allow use of local variables before they're assigned, requires structs to be completely initialized, etc.

- All the built-in data structures perform bounds checks.

- The compiler disallows dereferencing raw pointers except in unsafe blocks.

There are a lot of good things to be said about modern C++, particularly smart pointers. However, it's significantly less resilient to common mistakes than Rust is: https://alexgaynor.net/2019/apr/21/modern-c++-wont-save-us/


Not that I consider the overall sentiment of the linked article wrong, but this...

> Dereferencing a nullptr gives a segfault (which is not a security issue, except in older kernels). Dereferencing a nullopt however, gives you an uninitialized value as a pointer, which can be a serious security issue.

...betrays a complete lack of understanding of what Undefined Behavior is/implies. That's not something you want to see in an article discussing memory safety.


Sure you can.

First, in a philosophical sense: pointers and x86 CPUs are real; ultimately, any safe abstraction must be built on unsafe primitives. The ability and need to do that aren't specific to memory unsafety; we do that all over software engineering.

Second, empirically, my experience has been that the design of these abstractions can be safe, but moreover that the cordoning off of unsafe blocks makes 3p auditing for memory unsafety _much_ easier to do. It can be orders of magnitude faster than reviewing an entire C or C++ codebase.


A TCB (trusted computing base) should be dozens of lines, not thousands. More code means more places for more bugs to hide.

My experience in Safe Haskell was that, if you have to ask each module individually whether it has a safety property, then you've already created too much work for yourself. Instead, require every module to structurally encode the desired invariant.

Or, in fewer words: If you want memory safety, don't have `unsafe` blocks.


I don't have any data on exploitability, but 19 of the last 22 vulnerabilities (since 2018) have C-induced memory unsafety as a cause: https://curl.haxx.se/docs/security.html


Oh thanks, that at least gives some idea of the potential. I see e.g. "HTTP/2 trailer out-of-bounds read" and "SSL out of buffer access"... I guess there might be some candidates.


If you start from when the Morris worm was released into the UNIX world, there will be plenty to choose from.


Quite a bit broader than "libcurl's HTTP/HTTPS handling", though.

From which it sounds like the answer is nonzero, but significantly smaller than "every C bug since 1988".

