October 2014 on internals@php

November 28, 2014 | php, internals@, english

This article is also available in French.

You can follow me @pascal_martin, and there is an RSS feed of the posts I write in English.

809 messages were exchanged in October 2014 on PHP’s internals@ mailing list, a bit more than in September.

[Figure: number of mails per month on internals@ over the last three years]

First of all, PHP 5.6 has entered its normal cycle of releases, with a first maintenance version at the beginning of the month.

Nikita Popov has written the RFC: Remove deprecated functionality in PHP 7, which suggests removing from PHP 7 everything that has been flagged as deprecated in the 5.x versions including, for example: ext/ereg, ext/mysql, assignment of new by reference, the functions that manipulate magic_quotes, # comments in php.ini, … Considering the impact the removal of the first two items might have, this RFC might get voted on in three steps (one vote for each of the first two items, and a third one for the remaining ideas).

Kris Craig noted that ext/mysql has been obsolete for a while — and the fact some sites and tutorials are still referencing it should not prevent its removal. Also, Rasmus Lerdorf said most PHP users get their installations and PHP extensions from their distributions and those tend to package PECL extensions as if they were provided directly with PHP.

Johannes Schlüter insisted that, regardless of whether ext/mysql ends up being removed, educating users is important, to ensure they stop using this extension; Derick Rethans added that ext/mysql having a procedural interface is not a good enough reason in itself to justify its removal. Zeev Suraski also reminded us that every time a feature is broken or removed, updating becomes a bit harder for users.

On the other hand, as Pierre Joye answered, if deprecated features never get removed, there is no use in flagging them as deprecated and one could just stop doing that.

In the middle of the month, Zeev Suraski announced the RFC: PHP 7.0 timeline, which aims to release PHP 7 quickly (in about one year), even if it means not getting some features. Things that break compatibility should make it into PHP 7.0 (as BC breaks are only possible in major versions), but other features that don’t cause BC breaks could arrive with PHP 7.1 or 7.2. Waiting for too long might also lead to a version with too many incompatibilities, which would slow its adoption down.

Xinchen Hui answered that he thought one year was a bit long. Others quickly noted it might be better not to go too fast: several ideas could still be taken into account for PHP 7, even if only compatibility-breaking changes must go into the 7.0 version (still, nothing should be forgotten, to avoid having to wait another ten years).

Rasmus Lerdorf also indicated the master branch might require a bit of time to get stabilized — even more on a project with its fair share of legacy code, like PHP, where each change can have consequences that are not always very well anticipated.

Variadic functions having been added to PHP 5.6, Marco Pivetta proposed to flag as deprecated, for PHP 5.7, functions call_user_func, call_user_func_array, func_get_args, func_num_args and func_get_arg — in order to remove them later, for PHP 7. They could, if necessary, be re-implemented in user-space, for the compatibility of applications using them.
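To make the proposal more concrete, here is a minimal sketch of how the PHP 5.6 variadic syntax and argument unpacking can stand in for func_get_args() and call_user_func_array(). The function names sum_legacy, sum_variadic and user_call_func_array are hypothetical and only serve the illustration; the last one is one possible user-space shim, not something taken from the thread.

```php
<?php
// Pre-5.6 style: arguments are collected by hand with func_get_args().
function sum_legacy()
{
    return array_sum(func_get_args());
}

// PHP 5.6 style: the variadic parameter collects the arguments into an array.
function sum_variadic(...$numbers)
{
    return array_sum($numbers);
}

// Hypothetical user-space shim for call_user_func_array(), built on
// argument unpacking (assumes $args is a plain list, without string keys).
function user_call_func_array(callable $callback, array $args)
{
    return $callback(...$args);
}

$values = [1, 2, 3];

// Pre-5.6 style: forwarding an argument list needs call_user_func_array().
$legacy = call_user_func_array('sum_legacy', $values);

// PHP 5.6 style: the list is unpacked directly at the call site.
$modern = sum_variadic(...$values);

// Same call, going through the user-space shim.
$viaShim = user_call_func_array('sum_variadic', $values);
```

All three calls return the same sum, which is the point made in the thread: the new syntax covers these use cases without the older helper functions.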

Andrea Faulds answered that this removal would cause a major BC break for applications, which would no longer work with a basic install of PHP 7, even if it is possible to re-implement these functions in user-space. Removing them would also not bring any real gain.

Still, the idea of removing (now useless) functions from PHP core to move them to user-space also makes sense… even if it probably wouldn’t be great when it comes to performance.

After a new XML remote-debugging protocol was implemented for phpdbg, Pierre Joye noted it could be interesting to check whether it is possible to get closer to the DBGp protocol already used by Xdebug. Joe Watkins explained that the implementation of this new protocol followed discussions with PhpStorm’s developers. Bob Weinand then listed a few differences between the two protocols; could those have been added to DBGp instead?

Stas Malyshev also said that working outside the PHP project for several weeks before merging an entire feature might not be the best way to go for a component that’s integrated in PHP: more discussions could have taken place if those developments had been done in a feature branch of PHP’s repository (well, it has not necessarily been the case in the past — but it goes the same way for PHP as a whole, where changes are not reviewed much). That being said, it would be great if work done here could lead to a unified debugging protocol for PHP.

David Soria Parra then indicated that going through an RFC would have been worthwhile, to think through whether a new protocol was needed and which solution was best suited. Julien Pauli added that having a separate website for phpdbg was odd (it’s not usual for components of PHP) and that its content could be moved to php.net (this seems to be planned). As phpdbg is a part of PHP, it should follow PHP’s processes.

A few days later, following suggestions in the previous thread, Ferenc Kovacs explained that the RMs of PHP 5.6 had decided to revert the not-yet-released changes to phpdbg (remote debugging support): as this component is a part of PHP, its major evolutions might require a stabilisation phase and they should follow the process in place for PHP’s evolutions, which mostly means going through RFCs. Those changes were reverted a few hours later, and will come back in the future.

Andrea Faulds wrote the RFC: Big Integer Support, proposing to set up, at an internal level, two kinds of integers: traditional integers (limited to numbers that can fit in 32 or 64 bits) and integers based on GMP, which would make it possible to manipulate numbers as large as one wants. These two internal types would be exposed to user-space as a single integer type, with no size limit.
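For context, here is a small sketch of the current behaviour this RFC wants to get rid of: when an integer operation overflows, PHP silently switches to a float and precision is lost. The variable names are only illustrative, and the last comparison assumes a 64-bit build.

```php
<?php
// The largest value the native integer type can hold on this build.
$max = PHP_INT_MAX;

// Going past it does not fail: the result silently becomes a float.
$overflowed = $max + 1;

$stillInt = is_int($max);          // true
$nowFloat = is_float($overflowed); // true

// On a 64-bit build, floats that large cannot tell neighbouring values
// apart, so these two different sums compare as equal.
$precisionLost = ($max + 1 === $max + 2);
```

With a single, unbounded integer type as proposed, the two sums would presumably stay distinct integers.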

Stas Malyshev answered adding a new internal type would require all extensions to take it into account — which means many of those would have to be updated.

The RFC was later updated regarding some license issues with GMP. A benchmark has also been published to show this proposal doesn’t have any real impact on PHP’s performance, even if more in-depth tests should be done.

After about two weeks, discussions on this matter stopped without any decision being reached. We’ll see in the next few months if this RFC comes back to the front of the stage.

Joe Watkins wrote the RFC: UString, which introduces a new UString class that would provide features for manipulating Unicode strings. Basically, this UString class would, in a way, take the place of the existing mbstring extension, which is not perfect, with an implementation based on ICU (already used by Intl) that is faster than mbstring’s.

The first answers have been rather positive. Pierre Joye added it would be great if this extension was always enabled, to facilitate its adoption.

Dmitry Stogov noted that, if this RFC doesn’t include using UString objects from the engine or other extensions, it is incomplete and doesn’t answer all needs (for example, Unicode strings couldn’t be used as array keys, as they would be objects and not real strings, unless another idea gets accepted). Of course, as Johannes Schlüter indicated, a better approach would be a real adoption of Unicode, but PHP 6 showed this is not that easy.

In the previous thread, Stas Malyshev announced he had begun working on RFC: Objects as hash keys, which proposes to add a new magic method (like __hash() or __toKey()) that would allow us to use objects as array keys.

Joe Watkins suggested __toScalar() could be an appropriate name, closer to what this method would actually do, without limiting it to the currently discussed context; but this name would not explain the usefulness of this method.

Alexander Lisachenko added that using an interface could also be a good approach, instead of adding a new magic method, even if a magic method would be closer to the PHP way of doing things.

Etienne Kneuss noted this RFC doesn’t actually mean using objects as array keys (it would require a lot of work on re-writing PHP HashTables’ handling), but only simplifies syntax. As such, foreach, key() or array_keys() would not return objects, but only their hashes, as calculated by this new method.
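Here is a rough sketch of what has to be written by hand today, and of the point made by Etienne Kneuss. The Point class and its toKey() method are hypothetical: toKey() is an ordinary method standing in for the magic method discussed in the RFC, which does not exist in PHP.

```php
<?php
// Hypothetical value object used only for this illustration.
class Point
{
    public $x;
    public $y;

    public function __construct($x, $y)
    {
        $this->x = $x;
        $this->y = $y;
    }

    // Ordinary method playing the role the RFC would give to __hash()
    // or __toKey(): it turns the object into a scalar usable as a key.
    public function toKey()
    {
        return $this->x . ',' . $this->y;
    }
}

$labels = [];
$origin = new Point(0, 0);

// Today, the conversion has to be spelled out at every use.
$labels[$origin->toKey()] = 'origin';

// The array only ever stores the computed key, never the object itself.
$keys = array_keys($labels);
```

Under the RFC, the explicit ->toKey() call would disappear from the assignment, but array_keys() would still return the string and not the Point instance.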

Andrea Faulds wrote the RFC: Readonly Properties, based on the fact that it is currently hard to define a property that can be read from outside a class without also making it writable: one has to write getters/setters and/or play with __get/__set (in any case, we have to write code and it’s not really efficient). To solve this, the RFC proposes to add a new readonly keyword, that could be used on class properties, to indicate which ones should be readable from outside that class.
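To give an idea of the boilerplate in question, here is a minimal sketch of the two workarounds mentioned above, a getter and the __get/__set pair. The User class and its id property are hypothetical, and the choice of LogicException is just one possible way of rejecting writes.

```php
<?php
// Hypothetical class: $id must be readable from outside, but not writable.
class User
{
    private $id;

    public function __construct($id)
    {
        $this->id = $id;
    }

    // Workaround 1: an explicit getter.
    public function getId()
    {
        return $this->id;
    }

    // Workaround 2: magic methods, called because $id is not accessible
    // from outside; reads are allowed, writes are rejected.
    public function __get($name)
    {
        if ($name === 'id') {
            return $this->id;
        }
        throw new LogicException("Undefined property: $name");
    }

    public function __set($name, $value)
    {
        throw new LogicException("Property $name is read-only");
    }
}

$user = new User(42);
$viaGetter = $user->getId();
$viaMagic = $user->id;

try {
    $user->id = 7;
    $writeRejected = false;
} catch (LogicException $e) {
    $writeRejected = true;
}
```

Both workarounds behave as expected, but each read-only property costs a method or a branch in __get, which is exactly what the proposed keyword is meant to avoid.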

Rowan Collins noted it would bring user-space objects closer to those exported by some extensions (which can have read-only properties), and that it would indeed be great not to have to write numerous lines of code to get this feature. Others suggested adding more precise access rules, with a syntax like var $callback as rwxrw---x;.

Still, as Jordi Boggiano indicated, another possibility would be to reopen the discussion on RFC: Property Accessors Syntax, which went further while also allowing what’s proposed here (the new RFC only answers a portion of the different needs). Nikita Popov also noted the readonly keyword is not necessarily explicit and, contrary to getter methods, properties can’t be defined in an interface. Maybe one could already reserve the syntax for accessors, while only implementing a small part for now?

In the end, Andrea Faulds withdrew this RFC, as it was a bit confusing and only answered a portion of what was needed.

Kris Craig asked why PHP has $_GET and $_POST super-globals but no $_PUT nor $_DELETE, which could be useful when it comes to developing REST APIs.

Andrea Faulds noted these two variables are poorly named: $_GET corresponds to parameters received from the query string, while $_POST contains data from the request’s body; another way of seeing things is that they correspond to form methods, and not to HTTP primitives. Switching to variables like $_QUERY, $_BODY and $_REQUEST could make sense, but would completely destroy compatibility. On the other hand, Rasmus Lerdorf indicated users know how to use $_GET and $_POST and adding aliases might be going a bit far: of all that might be confusing in PHP, this specific point would be near the bottom of the list.
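As an illustration of what REST developers do today without a $_PUT super-global, here is a minimal sketch that decodes a form-encoded body by hand. The helper name parse_form_body is hypothetical; in a real PUT or DELETE handler, the raw body would typically be read from the php://input stream.

```php
<?php
// Hypothetical helper: decode a form-encoded request body,
// whatever HTTP method was used to send it.
function parse_form_body($rawBody)
{
    $fields = [];
    parse_str($rawBody, $fields);
    return $fields;
}

// In a real handler, the body would come from the request:
// $fields = parse_form_body(file_get_contents('php://input'));

// Here, a literal string is used so the sketch runs on the command line.
$fields = parse_form_body('name=Alice&tags[]=a&tags[]=b');
```

The result has the same shape $_POST would have for the same body, nested arrays included.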

Michael Wallner then said looking at the pecl_http extension and the RFC: Add pecl_http to core could be a good idea.

The discussion was intense, with almost one hundred mails in a few days, but I don’t feel like it led to a decision. In any case, $_GET and $_POST are here to stay, but we might see some other aliases appear in the future…

A few days later, Sherif Ramadan announced he had begun working on RFC: Standardized PHP Http Interface.

Florian Margaine answered that PHP should focus on providing implementations more than interfaces: it is up to user-space to decide what the code should look like. Removing the super-globals ($_GET, $_POST, …) would also break pretty much all existing applications and adoption of PHP 7 would greatly suffer, even if they are not the most perfect interface there could be.

On the other hand, GPC variables are not quite perfect and an interface provided by PHP might unify a bit the way each framework now comes with its own HttpRequest class.

Larry Garfield also indicated there were discussions on that matter on the FIG’s mailing list and that user-space is probably more appropriate for this kind of experimentation.

Andrea Faulds wrote the RFC: Safe Casting Functions, because the existing explicit type-casting operators never fail and never raise any kind of error, which can be dangerous (especially on user-supplied data), as they can return pretty much anything when used on garbage data.

This RFC aims to add three functions that would validate their input instead of blindly casting it:

  • to_int(): would only accept integers, floats containing integer values that can fit in integers, or strings containing a textual representation of integers.
  • to_float(): would only accept floats, integers and strings representing floats.
  • to_string(): would only accept strings, integers, floats and objects that can be cast to strings.

The question of the return value of these functions was quickly raised (typically, is false a suitable value?), as well as the idea of having them throw exceptions, which could depend on an optional parameter.

Stas Malyshev noted some validation rules already exist in the ext/filter extension and that completing them could be more interesting than adding yet another set of rules somewhere else. Using names such as to_int() could also cause some confusion for users, as the behavior would not be the same as (int); a name like lossless_int() might be more appropriate.
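The gap between the two existing mechanisms can be shown in a few lines. This sketch only uses what PHP already provides, the (int) cast and filter_var() with FILTER_VALIDATE_INT; it does not implement the proposed to_int(), whose exact rules were still being discussed.

```php
<?php
// Explicit casts never fail, whatever the input looks like.
$castGarbage = (int) 'abc';   // 0
$castPartial = (int) '12abc'; // 12

// ext/filter validates instead: false means "this is not an integer".
$validGarbage = filter_var('abc', FILTER_VALIDATE_INT);   // false
$validPartial = filter_var('12abc', FILTER_VALIDATE_INT); // false
$validNumber  = filter_var('12', FILTER_VALIDATE_INT);    // 12
```

The cast quietly turns '12abc' into 12, while the filter rejects it, which is the kind of strictness the RFC is after.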

Discussions were not finished at the end of the month, and I’m guessing they will go on next month ;-)

Nikita Popov re-launched the RFC: Exceptions in the engine, which was originally targeting 5.6 — not all errors will be turned into exceptions, but a majority of them could be. As before, answers have been rather positive.

Thomas Gossmann asked whether it would be possible to remove the function keyword from method declarations. Levi Morrison quickly indicated this idea had been rejected before, mostly because it would make searching for definitions and reading code harder, while not bringing much to the table. He is not the only one who thinks that.

Votes have begun on RFC: loop + or control structure. With 4 “yes” and 11 “no” votes, it didn’t pass — Leigh, author of this RFC, explained why he, himself, voted “no”. We’ll see if someone else brings back this idea in the future.

Leigh announced the RFC: 64 bit format codes for pack() and unpack() had passed, with 100% “yes” votes.
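As a quick illustration, here is a sketch using two of the 64-bit codes, assuming the letters defined by the RFC: J for an unsigned 64-bit integer in big-endian byte order and P for the little-endian equivalent. It needs a 64-bit build of PHP and a version that includes this change.

```php
<?php
// 'J': unsigned 64-bit integer, big-endian byte order.
$bigEndian = bin2hex(pack('J', 1));

// 'P': unsigned 64-bit integer, little-endian byte order.
$littleEndian = bin2hex(pack('P', 1));

// unpack() accepts the same codes; 'value' is only the name given
// to the key of the resulting array.
$decoded = unpack('Jvalue', pack('J', 258));
```

Both packed strings are eight bytes long, with the byte holding 1 at opposite ends.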

Changes corresponding to the RFC: Catchable “call to a member function of a non-object”, which passed this summer, have been merged.

Davey Shafik reported that an E_DEPRECATED warning is always raised, with PHP 5.6, if always_populate_raw_post_data is set to anything other than -1, and the default value is 0. It is nonetheless the best possible compromise, with a default configuration value that doesn’t break compatibility, but also warns users this is a feature that will be removed in the future.

Following a question asked by Sebastian Bergmann, Adam Harvey set up the new Supported Versions page on php.net: it shows which versions of PHP are currently supported — and until when.

Stas Malyshev brought the RFC: Filtered unserialize() back on the stage. It aims to change the unserialize() function so that object unserialization can be forbidden, or limited to a set of listed classes.
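To show what such a filter looks like in practice, here is a sketch that assumes the options-array form that PHP 7.0 eventually shipped (an allowed_classes entry passed as second argument); the syntax under discussion at the time of this thread may have differed. The Allowed and Other classes are hypothetical.

```php
<?php
// Two hypothetical classes: only the first one is trusted.
class Allowed
{
    public $v = 1;
}

class Other
{
    public $v = 2;
}

$payload = serialize([new Allowed(), new Other()]);

// Assumption: PHP 7.0+ syntax. Classes that are not listed are not
// instantiated; they come back as __PHP_Incomplete_Class objects.
$result = unserialize($payload, ['allowed_classes' => ['Allowed']]);

$firstIsAllowed = $result[0] instanceof Allowed;
$secondIsBlocked = $result[1] instanceof __PHP_Incomplete_Class;
```

Passing false instead of a list of class names forbids object instantiation altogether.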

Miloslav Hůla wrote the RFC: Access to aliases definition by reflection, suggesting exposing namespace alias definitions (set up with use) to user-space through the Reflection API.

In the middle of the month, Levi Morrison indicated he had a working implementation for RFC: Return Type Declarations and that a few changes had been made to the RFC, which means votes are likely to begin on this RFC soon.

internals@lists.php.net is the mailing-list of the developers of PHP; it is used to discuss the next evolutions of the language and to talk about enhancement suggestions or bug reports.
This mailing-list is public and anyone can subscribe from the page Mailing Lists, or read its archives using HTTP from php.internals or through the news server news://news.php.net/php.internals.
