
To explain the slightly horrible fact that there are two modes, strict and weak, with strict mode being enabled on a per-file basis: people just couldn't agree on whether the types should be strict or whether they should use PHP's existing coercion rules. (Or even a third, new set of conversion rules, but let's ignore them.)

Having an optional mode per file allows:

* People who want to write code with strict types can do so.

* People who want to write code with weak types can do so.

* People who want to write code without these newfangled types can continue to do so.

All without any issues of compatibility between running code on the current version of PHP and what will be the next one, and without any compatibility issues for libraries that are solely written in one mode, being used in applications that are written in the other mode.
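
For anyone who hasn't seen the syntax, here is a minimal sketch of the per-file switch, assuming PHP 7. The `add()` and `describeCall()` functions are made up for illustration. The mode that applies is the one declared in the file making the call, so the same `add()` called from a file without the `declare` line would coerce `"1"` and `"2"` to ints instead of throwing.

```php
<?php
declare(strict_types=1); // must be the first statement in the file

// Hypothetical function, used only for illustration.
function add(int $a, int $b): int {
    return $a + $b;
}

// Run a call and report either its result or the fact that it raised a TypeError.
function describeCall(callable $call): string {
    try {
        return 'ok: ' . var_export($call(), true);
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

// Ints are accepted in either mode.
echo describeCall(function () { return add(1, 2); }), "\n";

// Numeric strings are rejected here because this file opted into strict mode.
echo describeCall(function () { return add("1", "2"); }), "\n";
```

The first call reports `ok: 3`; the second reports `TypeError`.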

\o/

To be honest, I'm completely in the strict camp - but it's awesome that we've gotten a solution that should allow everyone to use the mode they want, without splitting the PHP ecosystem.



The big benefit is you don't end up with "weakly-typed", "strongly-typed" and "untyped" APIs. There are just "typed" and "untyped" ones, and you can choose the behaviour that suits you best.


> To be honest, I'm completely in the strict camp

Can you explain why? Are you using PHP in a web context? Because everything from the web is a string.

So wouldn't it make the most sense to let the functions using the web data coerce to integer, and just work?

How does putting (int) before the arguments to function help anything?

I actually liked Ze'ev's proposal because it let you be weak while making sure you did not coerce obviously bad data.

Anyway, as a member of the strong camp, can you explain?


> Because everything from the web is a string.

I would start by saying that everything, regardless of domain, is just a stream of bits. Which is completely useless, just like your assertion.

And I know what you meant, but you're also wrong. A JSON object is only a string before being interpreted. An x-www-form-urlencoded body is actually a map in which values can be arrays instead of primitives. Such forms often correspond to domain models with a clear definition.

There's also no such thing as "obviously bad data". All data is good in the proper context, therefore automatic conversions that try to make this distinction do not make sense. I don't necessarily know how PHP behaves, but in another popular language there's a world of difference between "077" and "77". There's also a world of difference between integer, floating point and fixed point and the details are never irrelevant.


> therefore automatic conversions that try to make this distinction do not make sense.

You have to convert it somewhere. I don't see how the caller converting it is any better than the recipient doing it. Having the caller do it seems quite pointless when the recipient is anyway doing it.

Your answer about how everything is bits was quite useless since you completely missed the point. Your input is a string, you have to convert it someplace. Weak mode has the recipient do it. Strict mode you have to do it yourself, and then the recipient double checks.

I see no value in the second option - the actual conversion in both cases is identical.

But Danack disagrees, so I asked him to explain. Your answer was not helpful at all.


@ars, you completely missed my point. Let's go over it again. Tell me how the following things should convert:

     77
     077
     77.0
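
For reference, a rough sketch of what PHP itself does with these three inputs, assuming PHP 7 semantics. The variable names are made up for illustration. It shows that there is more than one defensible answer even inside one language, depending on which conversion you pick.

```php
<?php
// A plain cast reads the string as decimal, so the leading zero is ignored.
$castDecimal = (int) "077";        // 77

// intval() with base 0 guesses the base from the prefix: a leading 0 means octal.
$autoBase = intval("077", 0);      // 63

// A literal written in source code with a leading zero is octal as well.
$octalLiteral = 077;               // 63

// A numeric string with a decimal point becomes a float in arithmetic.
$numeric = "77.0" + 0;             // float(77)

// Casting the same string to int drops the fractional part.
$castFromDecimalString = (int) "77.0"; // 77

var_dump($castDecimal, $autoBase, $octalLiteral, $numeric, $castFromDecimalString);
```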


I GOT your point. I don't care about your point because it is not the question I am asking.

Why are you answering something I did not ask?

I am asking why Danack prefers strict mode. There is absolutely nothing in my question that cares about the specific details of how you convert bits to types, other than that you do.

My question is entirely about WHO does the conversion. NOT about the conversion itself.

(Oh, and the thing about bad-data has a defined meaning that went over your head because you are not familiar with the debate here. In this context bad-data means data loss on conversion. So "1" to 1 is fine, but "1 a" to 1 is not.)


The conversion itself is very relevant because you cannot establish a default conversion that should happen, therefore the conversions should be explicit, answering the WHO. This is why I asked you about what should the conversion produce in those examples.

And also in the conversion from "1.1" to 1.1 there is loss of information, because the two representations are not isomorphic. Care to guess why?
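
One likely reading of that puzzle, sketched in PHP: the float is a binary approximation, so the decimal text and the stored value are not the same thing, and different strings can collapse onto one float. This assumes the usual IEEE 754 doubles that PHP uses on mainstream platforms; the variable names are made up for illustration.

```php
<?php
$parsed = (float) "1.1";

// Printing more digits than usual exposes the binary approximation behind 1.1.
printf("%.20f\n", $parsed);

// Distinct strings map to the same float, so the original text cannot be recovered.
$sameValue = ((float) "1.1" === (float) "1.10");
var_dump($sameValue);

// The classic consequence: decimal arithmetic that looks exact is not.
$sumIsExact = (0.1 + 0.2 === 0.3);
var_dump($sumIsExact);
```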


This is the point that ars is making. He is saying that it has to be a runtime check and/or conversion because it is coming over the wire, as a string, at runtime.

So the question he is asking is, if you have to do the check at runtime anyway, what is the benefit of the type hinting? Isn't it just belt and braces?

It seems like a perfectly legitimate question to me.

And by the way, to those downvoters who don't seem to be able to tell the difference between a comment you disagree with and spam, can you please contribute to the conversation by hitting the reply button or alternatively get lost? Only you're ruining it for the rest of us. Thanks


> Having the caller do it seems quite pointless when the recipient is anyway doing it.

Actually, I disagree. The caller is the only one who has semantic information about what the variable (and hence its value) means. All the callee (recipient) can do is blind cast it. The caller on the other hand can interpret it because it knows the meaning (talking about the developer, not the engine).


> Can you explain why?

Over the years, we have refactored our PHP code base into something much more bug-proof, and a lot of that is because we demand that certain types be passed to functions. Currently we use docblock type annotations (for which the IDE helps us heaps) and type hinting for objects being passed as arguments.

For a given process (e.g. form submission) there will nevertheless be an entry point where strings from the web are passed in. But if you can minimize that area as much as possible, then beyond that one place (a function or class) which understands the mapping from incoming string types to PHP types, you end up with a code base which behaves mostly like a statically typed language.

This has reduced a huge subset of bugs which are caused by unexpected input being passed to a function. Having the language itself tell you when you made an error statically while writing is much better than having to wait until runtime.

A similar argument would stand for why we moved from dynamically created strings sent to the database, towards a database abstraction layer where we pick up syntax errors at time of writing.


> Can you explain why?

Hopefully.

> How does putting (int) before the arguments to function help anything?

It wouldn't. Anyone who is casting from an unknown type to an int by using just `(int)` is doing something wrong in my opinion.

Even in web-based applications there are at least two layers of code: i) One where the types of the values are unknown and they are represented as strings. ii) One where the types of the values are known.

At the boundary between these two layers you should have code that inspects the strings that represent the input values, checks that they are acceptable, and converts them to the desired type. If the input values cannot be converted to the desired type, the code needs to give an error that is specific enough for a computer to understand, as well as provide a human-understandable explanation of why the conversion was not allowed.

The reason why I want strong types is that I never, ever want to blindly cast from one type to another. The decision about how to convert from one type to another, should always be made at a boundary between areas of the application where types are known, and the areas where the types are unknown. I always want to be forced to make that decision in the right place, using code that gives useful errors and messages, rather than having the value coerced into the desired type.

tl;dr I won't use (int) to cast, I will use something like the code below.

cheers Dan

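    // Hypothetical supporting definitions, assumed here so the example is complete.
    const MAX_ORDER_AMOUNT = 100;
    class InvalidOrderAmount extends Exception {}

    function validateOrderAmount($value) : int {
        // Accept only a non-empty string made up entirely of ASCII digits.
        if (!is_string($value) || !preg_match('/\A[0-9]+\z/', $value)) {
            throw new InvalidOrderAmount("Order amount must contain only digits.");
        }

        $value = intval($value);

        if ($value < 1) {
            throw new InvalidOrderAmount("Order amount must be one or more.");
        }

        // Ordering exactly MAX_ORDER_AMOUNT is allowed; anything above it is not.
        if ($value > MAX_ORDER_AMOUNT) {
            throw new InvalidOrderAmount("You can only order ".MAX_ORDER_AMOUNT." at a time.");
        }

        return $value;
    }

    function processOrderRequest() {
        // A missing field becomes null, which the validator rejects.
        $orderAmount = validateOrderAmount($_REQUEST['orderAmount'] ?? null);

        // Yay, our IDE/static code analyzer can tell that $orderAmount is an int if the code reached here.
        // placeOrder() is assumed to be defined elsewhere in the application.
        placeOrder($orderAmount);
    }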


Thank you for the reply! (The rest of the replies to me got seriously derailed....)

So from your code it looks like the only benefit of strict mode is in case you forget to do the validation/conversion it will warn you? I guess that's reasonable. Is there any other benefit?

To me it seems that Ze'ev's proposal would be even better for you - it does the equivalent of the validation and conversion automatically including with an error if it doesn't validate.

You wrote "let's ignore that", but it really seems like the best of all worlds to me. Any idea why it was rejected so badly? Is it because the coercion rules are different from the rest of PHP?


> Is there any other benefit?

Strict types make it easier to reason about code; that's the tl;dr version.

> To me it seems that Ze'ev's proposal would be even better for you - it does the equivalent of the validation and conversion automatically including with an error if it doesn't validate.

Rather than having int 'types' which we can reason about, it has int 'values' which are harder to reason about. Types can be reasoned about just by looking at the code. Values can only be reasoned about when running code. A contrived example:

    function foo(int $bar) { /* ... */ }

    foo(36 / $value);

In strict mode, this would be reported as an error by code analysis.

For the coercive scalar type proposal, this code works - except when it doesn't. This code works when $value = 1, 2, 3, 4 and breaks when $value = 5.
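
A small sketch of why the result depends on the value, assuming PHP 7 with strict mode on. The body of `foo()` and the `tryFoo()` helper are made up for illustration. PHP's `/` operator returns an int when the division is exact and a float otherwise, so the type of the argument is only known once `$value` is.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the foo() in the example above.
function foo(int $bar): int {
    return $bar;
}

// Report the runtime type of 36 / $value and whether foo() accepts it.
function tryFoo(int $value): string {
    $quotient = 36 / $value; // int when the division is exact, float otherwise
    try {
        foo($quotient);
        return gettype($quotient) . ' accepted';
    } catch (TypeError $e) {
        return gettype($quotient) . ' rejected';
    }
}

foreach ([1, 2, 3, 4, 5] as $value) {
    echo $value, ': ', tryFoo($value), "\n";
}
```

The values 1 through 4 report `integer accepted`, while 5 reports `double rejected` because 36 / 5 is the float 7.2. If a whole number is what you actually want, `intdiv(36, $value)` always returns an int.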

This is the fundamental difference; whether conversions between types have to be explicitly done by code, and so any implicit or incorrect conversion can be detected by static code analysis tools, or whether the conversions are done at run time, and so cannot be analyzed fully.

This means most of these errors will be discovered by users on the production servers. Strict mode allows you to eliminate these types of errors.

Yes, this means I need to add a bit of code to do the explicit conversion, but I just don't convert between types that much. Once a value is loaded from a user's request, config file or wherever, it is converted once into the type it needs to be. After that, any further change in type is far more likely to be me making a mistake than an actual need to change the type.

> Any idea why it was rejected so badly? Is it because the coercion rules are different from the rest of PHP?

At least in part it was because the RFC was seen as a way to block strict types; about half of the RFC text is shitting on people's desire for strict types, which did not make people who want strict types very receptive. If it had been brought up 6 months ago, there is a good chance it would have passed, or at least it would have been closer.

Some parts of the proposal were good. Other parts were nuts, and pretty obviously the result of the RFC only being created once the dual mode RFC had been announced and was about to be put to the vote, with a very high chance of passing.

* Good - "7 dogs" no longer being converted to 7 if someone tries to use it as an int.

* Bad - Different mode for internal functions vs userland functions, e.g. "Unlike user-land scalar type hints, internal functions will accept nulls as valid scalars." and other small differences. This is even more nuts than you might realise, as it means that if you extend an internal class and override some of the methods on the class, those methods will behave differently to the non-overridden methods.

* Bad - Subtle and hard to fix BC breaks in conversion which are probably not right anyway, e.g.

    false -> int     # No more conversion from bool
    true -> string   # No more conversion from bool

It is a shame that the discussion became so contentious. It would have been good if the conversion rules could have been tidied up, but all the time and energy had been used up in the not particularly productive discussion.


> Because everything from the web is a string.

Do you consider JSON a string? Would you manipulate it as a string or use a JSON parser?

> How does putting (int) before the arguments to function help anything?

It throws an error in case you receive bad input, and as we all know you will receive bad input.


You get json from the browser?

Of course I manipulate the data - that's exactly what weak mode does, convert the strings into integers. I just don't see how doing the conversion myself manually helps anything.

> It throws an error in case you receive bad input, and as we all know you will receive bad input.

It does no such thing. (int) will simply turn bad input into a zero.
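
A quick sketch of the difference, assuming PHP 7 (variable names are made up for illustration). The cast never complains, whereas a validating conversion such as `filter_var()` with `FILTER_VALIDATE_INT` reports failure by returning `false`.

```php
<?php
// A cast silently produces a number, whatever the input looks like.
$fromGarbage = (int) "abc";     // 0
$fromPartial = (int) "7 dogs";  // 7, the trailing text is dropped

// filter_var() returns the int on success and false when the string is not an integer.
$checkedGarbage = filter_var("abc", FILTER_VALIDATE_INT);
$checkedPartial = filter_var("7 dogs", FILTER_VALIDATE_INT);
$checkedGood = filter_var("42", FILTER_VALIDATE_INT);

var_dump($fromGarbage, $fromPartial, $checkedGarbage, $checkedPartial, $checkedGood);
```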


> You get json from the browser?

Yes.


Really? All the pages you program with forms, and links and whatever are sending you json?

I have no doubt you CAN do it, but most of the time you don't.

And since most of the time you are dealing with strings my question stands: Why do the conversion manually instead of letting the nice new feature do it for you.


You've heard of JavaScript I presume?


Even AJAX sites do not typically send all data back as JSON. Most of the time they use normal form-urlencoded data.

If you did send everything as JSON you would be bypassing everything PHP does with form/url data to make things easier for you (for example arrays). That doesn't seem like a good engineering tradeoff.


Sending structured data from JavaScript to PHP is much easier via JSON, and in PHP it's as easy as calling json_decode() to get back an object or array, depending on your preference.
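
Roughly what that looks like, as a sketch. The `decodeJsonBody()` helper and the payload are made up for illustration. Note that a JSON number arrives as an int rather than a string, which is relevant to the "everything from the web is a string" point.

```php
<?php
// Hypothetical helper: decode a JSON request body into an array, or fail loudly.
function decodeJsonBody(string $body): array {
    $data = json_decode($body, true); // true => associative arrays instead of objects
    if (!is_array($data)) {
        throw new InvalidArgumentException('Request body is not a JSON object or array.');
    }
    return $data;
}

// In a real handler the body would typically come from file_get_contents('php://input').
$payload = decodeJsonBody('{"orderAmount": 3, "user": {"email": "dan@example.com"}}');

var_dump($payload['orderAmount']);    // an int, not the string "3"
var_dump($payload['user']['email']);  // nested values are reachable as nested arrays
```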


PHP accepts nested key-value pairs and frameworks can take advantage of that when accessing the $_REQUEST array. If you're bypassing the functionality that's baked into forms then you're going to have to put it back in at some point or reinvent it yourself.

And then you can't make a request to the server directly unless you format your data as JSON, which is a bit inconvenient, especially if you're debugging a problem.

So if using JSON as a container isn't gaining you some other benefit then it's probably best avoided.


If you live in one framework and always will, then I see your point. But if you want to have the ability to code out native solutions, use other tools where other tools may be more appropriate, or discipline your projects to be a little less biased, then asking PHP to parse JSON is not a huge buy.

That's how we've done it here and the modularization it has provided us has been incredible. PHP's json_encode and json_decode are also quite fast.


I don't really get your reasoning there. Forms are the mechanism for making parameterised requests over HTTP. If REST is your platform then it makes sense to start with them, regardless of any framework you are using.

I'm not saying using JSON as an envelope won't work - it is clearly working for you - just that I wouldn't start there, and I can't see that you couldn't work with the request object directly.


Maybe I'm not understanding our disagreement, but if I am incorrect please help me understand what I am missing.

In your preferred way, data is exchanged over HTTP which uses percent encoding to pass data. I'm stating that a serialized object, in this case JSON, provides more benefit.

Both require overhead to encode and decode, but I believe that serializing your data allows for a more consistent exchange that (provided I am understanding how you post and retrieve data) actually may reduce size, increase the variance of what can be transmitted, and remove specific limitations that HTTP may encounter.

I'm genuinely curious if I am missing something.


First of all, I don't want to oversell this. I'm not overly precious about it and I'm not saying that having started with HTTP as a platform I wouldn't add JSON later.

I'm sure I could find myself in a position where I would want to standardise on JSON as a container for requests/responses, though see my last paragraph about Atom.

That said, my instinctive reaction against using JSON as an envelope is that it adds a layer of abstraction (and potential obfuscation) that I don't see an immediate benefit for. It may hark back to my experiences with SOAP. My mantra is to exploit the existing protocol to its fullest before extending it, and to do the simplest thing before adding complexity.

Let's presume we're still at a basic level of interaction through a website. Treating something as a form with fields such as name="user[email]", name="user[password]", name="user[telephone][mobile]" etc. seems more discoverable to me, as a developer at least.
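
A sketch of how PHP maps bracketed field names like those onto nested arrays. `parse_str()` parses a string the same way PHP parses a query string, so it stands in here for what ends up in `$_GET` or `$_POST`. The query string itself is made up for illustration.

```php
<?php
// Bracketed names become nested arrays, with no extra decoding layer.
$query = 'orderAmount=3&user[email]=dan%40example.com&user[telephone][mobile]=555-0100';
parse_str($query, $fields);

var_dump($fields['user']['email']);               // percent-encoding is already decoded
var_dump($fields['user']['telephone']['mobile']); // two levels of nesting

// Unlike a decoded JSON number, every leaf value is a string.
var_dump($fields['orderAmount']);
```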

For one thing, I know there is no JSON translation layer to go through. For another, I can get a server to generate a form that I can use to test the interface quickly and easily. To do the same with JSON would require me to have some JS intercepting the submit event so that it can convert the contents to JSON before posting. So now I can't use a terminal based browser to do my testing. Which means maybe I can't automate some testing strategy so easily.

If we're talking about a more sophisticated RESTful API, I would probably choose Atom over JSON, because Atom is built on XML and therefore is defined by a schema and can be interpreted by the browser. Specifically, it provides the rel attribute for discoverability. JSON payloads can implement this too, but you have to choose your extension.

In fairness, if I were doing a RESTful API, I'd probably be thinking about being able to implement interfaces for Atom, JSON, and HTML, plus whatever cool new thing is just around the corner.


Thanks for clarifying.

I'll agree that your way definitely provides less abstraction, and my viewpoint doesn't perceive it in that way. That being said, I've made a similar case against ORMs, so I understand your position.



