Relative Path and x-plat I/O papercuts #64890

Open
joelmartinez opened this issue Feb 7, 2022 · 4 comments
@joelmartinez

On recommendation from @jeffhandley, opening an issue for discussion :) It started with this tweet:
https://twitter.com/joelmartinez/status/1490120089957552132

Is there a good #dotnet library/nuget for path manipulation and file I/O ... one that lets me easily deal with relative paths, the working directory, ensuring directories exist, and works well across platforms?

Feel like I always have to reinvent the wheel when doing that stuff.

Qualifying what I meant, I was referring to a type of application that I've often worked on ... some CLI tool. To be clear, none of these issues have been showstoppers, it's just that every time I build a tool, I inevitably forget all of the challenges that I've already solved before, and end up either rewriting the solutions entirely or copy/pasting from project to project. There's a chance that some of these have been solved by newer APIs that I'm just not familiar with.

I'm going to try to remember and list some of these things below:

  • Many times, I'm working with relative paths that might come from command line arguments, or configuration files:
    • I'll often have a "target directory", which can sometimes be different than the current working directory. It would be useful to be able to set some sort of context so I didn't have to manage my own context and send around that "root" path in my code.

      • Of course, now that I'm writing this out, I got curious and searched docs, and turns out I'd just literally never noticed this API to set the current directory ... that's on me 🙃 That's a useful trick I'll carry forward, but I've found myself having to do things in both target directories and the actual current working directory where the user ran the command. It'd be useful to not have to set some global state.
    • It's of course trivial to use Path.Combine(target, filename), but in some instances there are nested files, and if I'm enumerating all the files in some directory (i.e. Directory.EnumerateFiles (from, "*.*", SearchOption.AllDirectories)), then I have to do string manipulation to cut the from directory out of each path.

    • There's no (easy) way to copy all the contents of a directory to another directory ... look at the example given here in the docs to copy all files in a directory to another directory. Sure it's not rocket science, but that sample is also only copying the contents of a single dir, and not enumerating all the subdirectories. To do that involves either introducing recursive logic, or SearchOption.AllDirectories, but then there's the pathing issues described in the previous bullet point.

    • There's no built-in way (that I'm aware of) to copy async. Lots of digital ink spilled across blogs and forums about how to do this in different ways through the years. At a minimum, a File.CopyAsync would be awesome to have. There's Stream.CopyToAsync that I thought was promising, but my first attempt to integrate this into my async-heavy program led me to issues where the stream would already be closed by the time the code happened to be invoked (I mixed yield return with async, I think was my problem). Bit of a pit-of-failure rather than pit-of-success here.

    • Once you start copying these relative and nested files, then you inevitably run into the fact that the file won't be created if the directory doesn't already exist. I wish there was at least an overload that created the directory, but I always end up writing some extension method like:

      /// <returns>The directory path</returns>
      public static string EnsureDirectory(this string dir)
      {
          if (!Directory.Exists (dir))
              Directory.CreateDirectory (dir);
      
          return dir;
      }
      
      /// <returns>The directory path</returns>
      public static string EnsureFileDirectory(this string value)
      {
          string dir = Path.GetDirectoryName(value);
          return dir.EnsureDirectory ();
      }
    • ~ can be confusing for users, because on shells like powershell and bash, passing in a path on the CLI will automatically expand that; sweet, that's great. But users will often then think to add something like ~/dev/blah in a config file thinking that will do the same thing, which it obviously won't. So I've written code to look for ~ at the beginning of a path, and replace it with something like Environment.GetFolderPath. In other cases, I've wanted ~ to be a shorthand for "the application root", or "the project root" ... so this one feels like a common-enough thing that there could be built-in support somewhere.

    • Talking about settings-provided paths ... users can be pretty inconsistent. Sometimes they'll put relative paths that start with Path.DirectorySeparatorChar in the config file, which is problematic because this code Path.Combine ("some", "relative", "/path") returns /path 😱 So I often have to carefully scrub that user input so it plays nice with Path.Combine.

  • x-plat stuff ... honestly it's been a hot minute since I've had to directly deal with some of these issues, but that might just be because some of my recent work was mostly just me and I haven't strongly exercised it anywhere but on this mac, so I'm guessing I'll probably have some of these issues if I were more thorough.
    • I remember having issues with the mixing of / and \ ... I usually end up writing some extension method that just .Replace the opposite separator char with Path.DirectorySeparatorChar, but I kind of wish I didn't have to worry about that and could just mix them all over the place since I'm usually combining paths that could be from different sources, ie. provided by a user on windows, but executed on a build agent on linux.
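
For the enumeration bullet above (cutting the from directory out of each enumerated path), a minimal sketch using the existing Path.GetRelativePath API may help. The class and method names here are hypothetical and only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the BCL.
public static class RelativeEnumeration
{
    // Yields every file under 'root' (recursively) as a path relative to 'root',
    // so no manual string slicing is needed.
    public static IEnumerable<string> EnumerateRelativeFiles(string root)
    {
        foreach (string file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            // Path.GetRelativePath(relativeTo, path) is available since .NET Core 2.0.
            yield return Path.GetRelativePath(root, file);
        }
    }
}
```

Each yielded value can then be passed to Path.Combine(target, relative) to get the matching path under another directory.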

Honestly ... I know that I've had other x-plat papercuts, but like I said above, it's been a while and I'm having trouble remembering them. All I know is that I always think to myself, "I wish I didn't have to account for this" when my intent always feels relatively simple ("copy these files from here to there", for example).

Happy to chat more, or answer any questions anyone might have for me around these scenarios. Thanks!

Thanks @timheuer @jeffhandley and @khalidabuhakmeh for engaging on twitter :D

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.IO untriaged New issue has not been triaged by the area owner labels Feb 7, 2022
@ghost

ghost commented Feb 7, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@jozkee jozkee added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Mar 16, 2022
@adamsitnik
Member

There's no (easy) way to copy all the contents of a directory to another directory

We are soon (7.0) going to introduce Directory.Copy APIs: #60903 (#62375)

Once you start copying these relative and nested files,

Would that still be needed with Directory.Copy?
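
For reference, a recursive copy can be sketched with APIs that already exist today. This is a minimal illustration under stated assumptions, not the proposed Directory.Copy API: the names DirectoryCopySketch and CopyDirectory are hypothetical, and it does not handle symbolic links or a destination nested inside the source.

```csharp
using System;
using System.IO;

// Hypothetical helper, not the Directory.Copy API discussed in this thread.
public static class DirectoryCopySketch
{
    public static void CopyDirectory(string sourceDir, string destinationDir, bool overwrite = false)
    {
        // CreateDirectory creates all missing parents and is a no-op if the directory exists.
        Directory.CreateDirectory(destinationDir);

        // Recreate the directory tree first so that empty directories are preserved.
        foreach (string dir in Directory.EnumerateDirectories(sourceDir, "*", SearchOption.AllDirectories))
        {
            Directory.CreateDirectory(Path.Combine(destinationDir, Path.GetRelativePath(sourceDir, dir)));
        }

        // Copy each file to the same relative location under the destination.
        foreach (string file in Directory.EnumerateFiles(sourceDir, "*", SearchOption.AllDirectories))
        {
            string target = Path.Combine(destinationDir, Path.GetRelativePath(sourceDir, file));
            File.Copy(file, target, overwrite);
        }
    }
}
```

Because the directory tree is created before any file is copied, the "file won't be created if the directory doesn't already exist" problem does not come up in this sketch.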

There's no built-in way (that I'm aware of) to copy async.

FileStream implements CopyToAsync:

static async Task CopyToAsync(string source, string destination)
{
    // bufferSize: 0 disables FileStream's own buffering; the last argument (true) opens the file for async I/O.
    using FileStream src = new (source, FileMode.Open, FileAccess.Read, FileShare.None, 0, true);
    using FileStream dst = new (destination, new FileStreamOptions()
    {
        Access = FileAccess.Write,
        Share = FileShare.None,
        Mode = FileMode.Create,
        BufferSize = 0,
        Options = FileOptions.Asynchronous,
        // Tell the OS the final size up front so it can allocate the space once.
        PreallocationSize = src.Length
    });
    await src.CopyToAsync(dst);
}

but I agree that we could introduce File.CopyToAsync to make it easier for the end users.

if (!Directory.Exists (dir)) Directory.CreateDirectory (dir);

The existence check is redundant; Directory.CreateDirectory already takes care of that (Unix, Windows).
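
With that in mind, the extension methods from the issue could be reduced to something like the sketch below. Returning an empty string for a bare file name such as "file.txt" is an assumption made here for illustration; the original code does not say what should happen in that case.

```csharp
using System;
using System.IO;

public static class DirectoryExtensions
{
    /// <returns>The directory path</returns>
    public static string EnsureDirectory(this string dir)
    {
        // No Exists check needed: CreateDirectory does nothing if the directory is already there.
        Directory.CreateDirectory(dir);
        return dir;
    }

    /// <returns>The directory path, or an empty string if the path has no directory part</returns>
    public static string EnsureFileDirectory(this string filePath)
    {
        // GetDirectoryName returns null for a root path and "" for a bare file name.
        var dir = Path.GetDirectoryName(filePath);
        if (string.IsNullOrEmpty(dir))
            return string.Empty; // assumption: nothing to create in this case

        Directory.CreateDirectory(dir);
        return dir;
    }
}
```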

@jozkee
Member

jozkee commented Mar 18, 2022

@joelmartinez thanks for taking the time to explain your pain points when using our APIs. I would like to add my own thoughts to @adamsitnik's comment.
Some of the points you make already have existing issues, while others are ideas that have not been discussed yet. I think this issue can be broken down into smaller ones that can be tackled more easily. I will try to do that before closing this one.

I'll often have a "target directory", which can sometimes be different than the current working directory. It would be useful to be able to set some sort of context so I didn't have to manage my own context and send around that "root" path in my code.

I've found myself having to do things in both target directories and the actual current working directory where the user ran the command. It'd be useful to not have to set some global state.

There are many ways you can deal with this. The simplest one is probably the one you mentioned, using Path.Combine().
Nevertheless, we could provide a type that fakes a "temp working directory", handles all path manipulation for you, and provides the relevant IO operations; see this sketch class: https://gist.github.com/Jozkee/c80c841eb6231a713094c8d6a85b0e72
As an alternative to that, we could add extension methods to DirectoryInfo to achieve the same.
I don't know if this is something worth adding, but it is at least worth discussing, so we should create an issue with the API proposal.

Once you start copying these relative and nested files, then you inevitably run into the fact that the file won't be created if the directory doesn't already exist. I wish there was at least an overload that created the directory, but I always end up writing some extension method like:

We can address that in the same API proposed above by adding a method, say CreateFile(bool createDirectory), that creates the directory, if it does not exist, before creating the file.

~ can be confusing for users, because on shells like powershell and bash, passing in a path on the CLI will automatically expand that; sweet, that's great. But users will often then think to add something like ~/dev/blah in a config file thinking that will do the same thing, which it obviously won't. So I've written code to look for ~ at the beginning of a path, and replace it with something like Environment.GetFolderPath. In other cases, I've wanted ~ to be a shorthand for "the application root", or "the project root" ... so this one feels like a common-enough thing that there could be built-in support somewhere.

I doubt we could implement built-in support for expanding the tilde character in .NET by default, since it would represent a huge breaking change. It is doable, though, and it could be something that you opt in to via an env. var. or something similar.
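
In the meantime, the opt-in behavior can live in application code. The following is a minimal sketch of the approach described in the issue (look for ~ at the start of a path and replace it using Environment.GetFolderPath). The names TildeExpansion and ExpandHome are hypothetical, and the shell's "~user" form is deliberately left untouched.

```csharp
using System;
using System.IO;

// Hypothetical helper, not a BCL API.
public static class TildeExpansion
{
    public static string ExpandHome(string path)
    {
        if (string.IsNullOrEmpty(path) || path[0] != '~')
            return path;

        string home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
        if (path.Length == 1)
            return home; // just "~"

        // Only expand "~" followed by a separator; leave "~user/..." and "~name" alone.
        if (path[1] != Path.DirectorySeparatorChar && path[1] != Path.AltDirectorySeparatorChar)
            return path;

        // Trim the separators after "~" so Path.Combine does not treat the rest as rooted.
        string rest = path.Substring(1).TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
        return Path.Combine(home, rest);
    }
}
```

The same shape would work for "application root" or "project root" semantics by swapping the value assigned to home.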

Talking about settings-provided paths ... users can be pretty inconsistent, sometimes they'll put relative paths that start with Path.DirectorySeparatorChar in the config file, which is problematic because this code Path.Combine ("some", "relative", "/path") returns /path 😱 So I often have to carefully scrub that user input so it plays nice with Path.Combine

Would it be better if Path.Combine threw (or indicated that something was wrong) in such a case? If so, you could argue for an API (say bool Path.TryCombine(...)) that is strict when combining paths.
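
To make the two options concrete, here is a rough sketch of a strict combine and a lenient one, written as user code on top of the existing Path.IsPathRooted and Path.Combine. Path.TryCombine does not exist in .NET; StrictPath, TryCombine and CombineLenient are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helpers; Path.TryCombine is not an existing .NET API.
public static class StrictPath
{
    // Strict: refuses to combine when 'relative' is rooted, instead of silently dropping 'basePath'.
    public static bool TryCombine(string basePath, string relative, out string result)
    {
        if (Path.IsPathRooted(relative))
        {
            result = string.Empty;
            return false;
        }

        result = Path.Combine(basePath, relative);
        return true;
    }

    // Lenient: strips leading separators so "/path" is treated as relative to 'basePath'.
    // Note: this does not strip a Windows drive prefix such as "C:\".
    public static string CombineLenient(string basePath, string relative)
    {
        string trimmed = relative.TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
        return Path.Combine(basePath, trimmed);
    }
}
```

Which of the two is appropriate depends on whether a leading separator in a config file should be treated as a user error or as a harmless habit.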

I remember having issues with the mixing of / and \ ... I usually end up writing some extension method that just .Replace the opposite separator char with Path.DirectorySeparatorChar, but I kind of wish I didn't have to worry about that and could just mix them all over the place since I'm usually combining paths that could be from different sources, ie. provided by a user on windows, but executed on a build agent on linux.

Similar-ish issues:
#28263
#25011

I would like to think that the problem is when you use Windows paths (\) on Unix and not the other way around. Changing that would be a breaking change on Unix, so it may need to be thoroughly discussed and made opt-in.
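
The extension method mentioned in the issue could look roughly like the sketch below. SeparatorNormalization and NormalizeSeparators are hypothetical names. Note the caveat that makes this unsuitable as a default: on Unix a backslash is a legal file name character, so this rewrite changes the meaning of such names.

```csharp
using System;
using System.IO;

// Hypothetical helper, not a BCL API.
public static class SeparatorNormalization
{
    // Rewrites both '/' and '\' to the current platform's separator.
    // Caveat: on Unix, '\' can legitimately appear inside a file name.
    public static string NormalizeSeparators(string path)
    {
        return path
            .Replace('\\', Path.DirectorySeparatorChar)
            .Replace('/', Path.DirectorySeparatorChar);
    }
}
```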

@adamsitnik adamsitnik removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 18, 2022
@adamsitnik adamsitnik added this to the Future milestone Mar 18, 2022
@Symbai

Symbai commented Nov 30, 2023

  • There's no built-in way (that I'm aware of) to copy async.

I was going to make an API suggestion for this, because we already HAVE a File.WriteAllBytesAsync, which calls an internal WriteToFileAsync method. Because copy is either an instant move or "write all bytes to the target and only when completed delete the source file", I don't really see why File.CopyAsync cannot just use WriteToFileAsync in the latter case.

What am I missing?!
