Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'std.format.read.formattedRead' overloads to return a Tuple with values read #8647

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

iK4tsu
Copy link
Contributor

@iK4tsu iK4tsu commented Dec 10, 2022

This enhancement was mentioned in a Discord discussion as a potentially better alternative to the existing std.file.slurp. Although std.file.slurp does a similar thing, it is restricted to reading from files. This feature could be extended to any input range and replace the currently implemented in std.file since that one "combines multiple things together that would be more useful if they were separate and composable" (quoted from @pbackus).

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @iK4tsu! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8647"

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Dec 10, 2022

It seems the documentation is not style-compliant. How do I document the template and the function correctly? I tried to look for existing examples but couldn't find any.

@pbackus
Copy link
Contributor

pbackus commented Dec 10, 2022

You need to document both the outer template and the inner function.

@iK4tsu iK4tsu force-pushed the feat-formattedRead-tuple-return branch from 5b7cb0c to 1870b04 Compare December 10, 2022 19:07
@iK4tsu
Copy link
Contributor Author

iK4tsu commented Dec 10, 2022

update: autosquashed commits since CI passed

Copy link
Member

@ljmf00 ljmf00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach. I'm basically in favor of this. I think it's worth adding a changelog entry for this. Can you also please add a unittest for a CTFE formattedRead?

@ljmf00 ljmf00 added Needs Changelog A changelog entry needs to be added to /changelog Needs Approval Atila Neves labels Dec 10, 2022
std/format/read.d Outdated Show resolved Hide resolved
import std.typecons : tuple;

auto expected = tuple("hello", 124, 34.5);
auto result = "hello!124:34.5".formattedRead!("%s!%s:%s", string, int, double);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the format string can be checked at compile-time, why would one need to pass in types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format string is not responsible for unformatting into the types the user wants, it's only to know the places in the input string that need to be unformatted and to tell the formatted the restrictions of the characters it's reading. Using %f can still be valid for both float and double for example, and it asserts statically when the types are not format-compatible.

float value;
"123".formattedRead!"%d"(value);
// Error: static assert:  "incompatible format character for floating point argument: %d"

It also asserts on orphan format specifiers or if the number of arguments is higher than the required by the format string. This is useful for generic code when we don't have total control over the argument quantity or the format string. The format string might be generated during compile time causing it to assert if some malformation occurs, or the variadic template arguments might be calculated at compile time asserting for the same reasons.

Copy link
Contributor Author

@iK4tsu iK4tsu Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think I found a bug in checkFormatException for the %f format.

int i;
"123".formattedRead!"%f"(i);
// should static assert but instead throws at run time with:
// std.format.FormatException@/dlang/dmd/linux/bin64/../../src/phobos/std/format/internal/read.d(104): Wrong unformat specifier '%f' for int

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think I found a bug in checkFormatException for the %f format.

int i;
"123".formattedRead!"%f"(i);
// should static assert but instead throws at run time with:
// std.format.FormatException@/dlang/dmd/linux/bin64/../../src/phobos/std/format/internal/read.d(104): Wrong unformat specifier '%f' for int

Could you please file a bug report for this?

@pbackus
Copy link
Contributor

pbackus commented Dec 12, 2022

Just leaving the tuple members default-initialized when there's nothing to read is not good enough, because it gives the user no way to distinguish between a default value read from the input and a default value returned due to incorrectly formatted input. For example:

auto result1 = formattedRead!int("0", "%d");
auto result2 = formattedRead!int("", "%d");
assert(result1 == result2); // no difference

We need some other way of signalling an incomplete read in situations like this.

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Dec 12, 2022

We could extend the type to return the number of arguments read as the first tuple argument. But that would defeat the practicality of this, and kinda force the user to handle the value, even if it means mapping it out.

We also could flip the signature and add an optional out parameter for the number of arguments read. But for that, I think the user might as well just use the already existing overload that receives a tuple since they would have to handle the value separately anyway.

Tuple!(int, float) values;
if (!"1 2.125".formattedRead("%s %s", values) == 2) ...
...

Or perhaps adding a sentinel template argument for that: countReadArgs or something like that. But I think this other way to distinguish these cases should be something that does not take away the practicality of the implementation and should not get in the way of the user/be forced upon them.

@pbackus
Copy link
Contributor

pbackus commented Dec 12, 2022

Another option is to throw an exception—iirc this is what slurp does.

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Dec 12, 2022

Not really what slurp does. It does throw but only when the format is malformed (a format exception) or if there are remaining characters in the current line being parsed.
This throws:

writeFile("deleteme", "123 ");
scope(exit) removeFile("deleteme");
"deleteme".slurp!(int)("%d");
// object.Exception@/dlang/dmd/linux/bin64/../../src/phobos/std/file.d(5191): Trailing characters at the end of line: ` '

This is runs fine:

writeFile("deleteme", "123");
scope(exit) removeFile("deleteme");
assert("deleteme".slurp!(int, string)("%d %s").equal([tuple(123, "")]));

@ljmf00
Copy link
Member

ljmf00 commented Dec 16, 2022

Just leaving the tuple members default-initialized when there's nothing to read is not good enough, because it gives the user no way to distinguish between a default value read from the input and a default value returned due to incorrectly formatted input. For example:

auto result1 = formattedRead!int("0", "%d");
auto result2 = formattedRead!int("", "%d");
assert(result1 == result2); // no difference

We need some other way of signalling an incomplete read in situations like this.

I think the way is to throw an exception, but I would say to also provide an overload to allow the user to specify a tuple to avoid throwing and allocating on these cases.

@pbackus
Copy link
Contributor

pbackus commented Dec 16, 2022

@ljmf00 The existing overloads of formattedRead already accept a tuple as an argument; see the example here: https://github.com/dlang/phobos/blob/v2.101.1/std/format/read.d#L355-L361

@ljmf00
Copy link
Member

ljmf00 commented Dec 17, 2022

@ljmf00 The existing overloads of formattedRead already accept a tuple as an argument; see the example here: https://github.com/dlang/phobos/blob/v2.101.1/std/format/read.d#L355-L361

Ok cool. Then I think if the user wants to use this prettier version, my suggestion is to raise an exception. If the user wants to use it for control flow they have alternatives and exceptions are zero cost if not raised, so it's fine to do it.

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Dec 19, 2022

There's also the possibility of making it an optional choice too. Instead of always raising an exception we can have a flag to disable it (this would match the current slurp behavior).

formattedRead(Flag!"exhaustive" exhaustive = Yes.exhaustive, ...)
{
    ...
    auto argsRead = .formattedRead(...);
    static if (exhaustive) enforce(argsRead == Args.length)
    ...
} 

This would make the transition to deprecate slurp in favor of this easier I think.

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Dec 19, 2022

Another bug I found while playing with formattedRead: when using Tuple and the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:

Tuple!(int, string) tup;
"123".formattedRead!"%d %s"(tup);
// Error: static assert:  "Orphan format specifier: %s"

The non-templated version executes just fine.

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Dec 20, 2022

With only an exception being thrown there's no real way to handle failures. The best I could think of is:

only("123 num", "123")
    .formattedRead!(int, string)("%d %s")
    .map!nullable
    .handle!(<ExceptionName>, RangePrimitive.front, (e, r) => typeof(r.front).init );
    // [Tuple!(int, string)(123, "num"), Nullable.null]

But if the exception also stored info on how many arguments were read and the tuple result then the user could handle it better if they wanted to. There may be a high possibility the user would like to keep the read arguments.

@ljmf00
Copy link
Member

ljmf00 commented Dec 23, 2022

Another bug I found while playing with formattedRead: when using Tuple and the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:

Tuple!(int, string) tup;
"123".formattedRead!"%d %s"(tup);
// Error: static assert:  "Orphan format specifier: %s"

The non-templated version executes just fine.

File a bug report.

@ljmf00
Copy link
Member

ljmf00 commented Dec 23, 2022

@atilaneves we need to decide an acceptable design for this feature. What's your thoughts about the ideas we discussed above?

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Jan 4, 2023

Another bug I found while playing with formattedRead: when using Tuple and the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:

Tuple!(int, string) tup;
"123".formattedRead!"%d %s"(tup);
// Error: static assert:  "Orphan format specifier: %s"

The non-templated version executes just fine.

File a bug report.

done: #8661

@ljmf00 ljmf00 requested a review from atilaneves January 5, 2023 18:50
@atilaneves
Copy link
Contributor

I agree with @pbackus and think that throwing an exception is the right way to deal with missing data.

@ljmf00
Copy link
Member

ljmf00 commented Jan 23, 2023

I agree with @pbackus and think that throwing an exception is the right way to deal with missing data.

I think it makes sense too. So you are in favor of the overall change with the particularity of using an exception, on these cases?

@iK4tsu can you add a changelog entry so we can get this in?

@ljmf00
Copy link
Member

ljmf00 commented Jan 23, 2023

Also, use an exception, as suggested by @atilaneves

@iK4tsu iK4tsu force-pushed the feat-formattedRead-tuple-return branch from 1870b04 to 4489cd8 Compare March 22, 2023 21:17
@iK4tsu
Copy link
Contributor Author

iK4tsu commented Mar 22, 2023

Sorry for the delay on this. Now both functions optionally throw a FormatException if the format range ends early and not all arguments are filled. This optional behavior is activated by default but can be ignored by passing No.exhaustive as an argument. Is this a good compromise, or is it best if the exception is always enforced?

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Mar 22, 2023

can you add a changelog entry so we can get this in?

I will add a changelog entry when the final design gets the green light. That way I don't have to update it every time I change the design here.

@iK4tsu iK4tsu force-pushed the feat-formattedRead-tuple-return branch from 4489cd8 to 71f05e6 Compare March 22, 2023 21:26
@iK4tsu
Copy link
Contributor Author

iK4tsu commented Mar 22, 2023

update: rebased with master

@pbackus
Copy link
Contributor

pbackus commented Mar 22, 2023

What's the use-case for No.exhaustive? If the user wants to handle partial reads, it seems to me that they should use the overload that actually reports how much was read.

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Mar 23, 2023

What's the use-case for No.exhaustive? If the user wants to handle partial reads, it seems to me that they should use the overload that actually reports how much was read.

The main usage is to comply with the current slurp implementation. Currently slurp does nothing if the range ends early and arguments are not filled. By following slurp's implementation, it can be replaced by this version of formattedRead without breaking changes.

The other usage is when the user does not care about the number of args filled at all but wants to chain the call. With the overload, they would ignore the result but they would be forced to declare all default initialized variables beforehand. With the forced exception, they would be forced to use std.exception.handle.

By being optional it still follows the current design. The "error" value can be ignored, and all unfilled arguments remain unchanged, in this case, with their default initializers, while providing a compatible way to replace the alternative/s.

One issue remains IMO if the exception is indeed thrown and caught. The user knows something happened with the format, but not in specific. If it failed due to unfilled arguments, the user won't know it - by the FormatException - nor how many were filled. Should I change the Exception to a more specific one that also stores how many arguments were filled? It could potentially store the tuple with all arguments as well, but that would require it to be a templated exception. Just throwing an exception doesn't really match what can be done with the overload.

@pbackus
Copy link
Contributor

pbackus commented Mar 23, 2023

The main usage is to comply with the current slurp implementation.

IMO the way slurp handles partial reads is a mistake (and arguably a bug), not something we should copy.

The other usage is when the user does not care about the number of args filled at all but wants to chain the call.

Under what circumstances, exactly, would a user call formattedRead without caring whether the data they asked for is actually read?

Should I change the Exception to a more specific one that also stores how many arguments were filled? It could potentially store the tuple with all arguments as well, but that would require it to be a templated exception.

I think the plain FormatException is fine. The purpose of this new overload is to make things more convenient in the common case, not to cover every possible use case. Most of the time, the outcomes you care about are "all arguments read successfully" and "something went wrong." If a user wants more detailed information, they can use the existing overloads.

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Mar 27, 2023

IMO the way slurp handles partial reads is a mistake (and arguably a bug), not something we should copy.

Well, but wouldn't that make it harder to deprecate slurp in favor of this by Phobos' standards?

Under what circumstances, exactly, would a user call formattedRead without caring whether the data they asked for is actually read?

The same situations where users write code like:

int a, b, c;

// don't care about the output
// don't care if the format ends early
formattedRead!someFormat(a, b, c);

// use a b c

or any slurp usage.

But once again, this was more to facilitate the transition from one to another and to keep the existing nonthrowing logic of the other overloads.

@pbackus
Copy link
Contributor

pbackus commented Mar 27, 2023

Well, but wouldn't that make it harder to deprecate slurp in favor of this by Phobos' standards?

As far as I know there is no plan to deprecate or remove slurp.

If you mean, would it make it harder to refactor slurp to use the new formattedRead overloads internally—yes, it would. But I would rather have a good API for future users of the new formattedRead overloads and force slurp to use a workaround than have a good API for slurp and force future users of formattedRead to avoid a footgun (which is what No.exhaustive would be).

The same situations where users write code like:

int a, b, c;

// don't care about the output
// don't care if the format ends early
formattedRead!someFormat(a, b, c);

// use a b c

In other words: situations where users write buggy code because the API makes it easy to do the wrong thing. :)

Since we have the opportunity here to design a new API, I think we should try to avoid making it easy to do the wrong thing—which meas not including a "do the wrong thing" flag.

@iK4tsu
Copy link
Contributor Author

iK4tsu commented Mar 28, 2023

Since we have the opportunity here to design a new API, I think we should try to avoid making it easy to do the wrong thing—which meas not including a "do the wrong thing" flag.

Fair enough. I will change it and create a changelog entry when I have a bit of free time.

Copy link
Member

@ljmf00 ljmf00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Fix the style issues reported by buildkite

@LightBender LightBender added Salvage This PR needs work, but we want to keep it and it can be salvaged. and removed Atila Neves labels Oct 27, 2024
@LightBender
Copy link
Contributor

LightBender commented Oct 27, 2024

@iK4tsu if you can resolve the style issues found by buildkite, we'll put the auto-merge tag on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Approval Needs Changelog A changelog entry needs to be added to /changelog Salvage This PR needs work, but we want to keep it and it can be salvaged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants