
Commit f608d4d

hyperlink: rejigger how hyperlinks work
This essentially takes the work done in BurntSushi#2483 and does a bit of a facelift. A brief summary:

* We reduce the hyperlink API we expose to just the format, a configuration and an environment.
* We move buffer management into a hyperlink-specific interpolator.
* We expand the documentation on --hyperlink-format.
* We rewrite the hyperlink format parser to be a simple state machine with support for escaping '{{' and '}}'.
* We remove the 'gethostname' dependency and instead insist on the caller to provide the hostname. (So grep-printer doesn't get it itself, but the application will.) Similarly for the WSL prefix.
* Probably some other things.

Overall, the general structure of BurntSushi#2483 was kept. The biggest change is probably requiring the caller to pass in things like a hostname instead of having the crate do it. I did this for a couple reasons:

1. I feel uncomfortable with code deep inside the printing logic reaching out into the environment to assume responsibility for retrieving the hostname. This feels more like an application-level responsibility. Arguably, path canonicalization falls into this same bucket, but it is more difficult to rip that out. (And we can do it in the future in a backwards compatible fashion I think.)
2. I wanted to permit end users to tell ripgrep about their system's hostname in their own way, e.g., by running a custom executable. I want this because I know at least for my own use cases, I sometimes log into systems using an SSH hostname that is distinct from the system's actual hostname (usually because the system is shared in some way or changing its hostname is not allowed/practical).

I think that's about it.

Closes BurntSushi#665, Closes BurntSushi#2483
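For readers unfamiliar with the approach, here is a minimal sketch of what a state-machine parser for a format string with `{{` and `}}` escapes could look like. It is not the parser from this commit: the `Part` and `State` types, the `parse` function and the error messages are all hypothetical names chosen for illustration, and variable names are not validated.

```rust
/// One piece of a parsed format string (hypothetical type).
#[derive(Debug, PartialEq)]
enum Part {
    /// Literal text copied through unchanged.
    Text(String),
    /// A `{name}` variable to be substituted later.
    Var(String),
}

/// States of the hand-rolled parser (hypothetical type).
enum State {
    /// Reading literal text.
    Text,
    /// Just saw a single '{'; the next char decides escape vs. variable.
    OpenBrace,
    /// Inside `{name`, collecting the variable name.
    Var,
    /// Just saw a single '}' in literal text; only '}}' is allowed.
    CloseBrace,
}

fn parse(format: &str) -> Result<Vec<Part>, String> {
    let mut parts = Vec::new();
    let mut text = String::new();
    let mut name = String::new();
    let mut state = State::Text;
    for ch in format.chars() {
        state = match (state, ch) {
            (State::Text, '{') => State::OpenBrace,
            (State::Text, '}') => State::CloseBrace,
            (State::Text, c) => {
                text.push(c);
                State::Text
            }
            // '{{' is an escaped literal '{'.
            (State::OpenBrace, '{') => {
                text.push('{');
                State::Text
            }
            (State::OpenBrace, '}') => {
                return Err("empty variable name".to_string())
            }
            // Start of a variable: flush any pending literal text first.
            (State::OpenBrace, c) => {
                if !text.is_empty() {
                    parts.push(Part::Text(std::mem::take(&mut text)));
                }
                name.push(c);
                State::Var
            }
            (State::Var, '}') => {
                parts.push(Part::Var(std::mem::take(&mut name)));
                State::Text
            }
            (State::Var, '{') => {
                return Err("unexpected '{' inside variable".to_string())
            }
            (State::Var, c) => {
                name.push(c);
                State::Var
            }
            // '}}' is an escaped literal '}'.
            (State::CloseBrace, '}') => {
                text.push('}');
                State::Text
            }
            (State::CloseBrace, _) => {
                return Err("unescaped '}'".to_string())
            }
        };
    }
    match state {
        State::Text => {}
        State::OpenBrace | State::Var => {
            return Err("unclosed variable".to_string())
        }
        State::CloseBrace => return Err("unescaped '}'".to_string()),
    }
    if !text.is_empty() {
        parts.push(Part::Text(text));
    }
    Ok(parts)
}

fn main() {
    println!("{:?}", parse("file://{host}{path}"));
    println!("{:?}", parse("literal {{braces}}"));
}
```

Keeping the escape handling in explicit states means every character is visited exactly once and each malformed input maps to a specific error.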
1 parent 23e2113 commit f608d4d


12 files changed: +1267 -764 lines changed

Cargo.lock

+1 -68
Some generated files are not rendered by default.

complete/_rg

+1
@@ -305,6 +305,7 @@ _rg() {
     '--debug[show debug messages]'
     '--field-context-separator[set string to delimit fields in context lines]'
     '--field-match-separator[set string to delimit fields in matching lines]'
+    '--hostname-bin=[executable for getting system hostname]:hostname executable:_command_names -e'
     '--hyperlink-format=[specify pattern for hyperlinks]:pattern'
     '--trace[show more verbose debug messages]'
     '--dfa-size-limit=[specify upper size limit of generated DFA]:DFA size (bytes)'

crates/core/app.rs

+82 -7
@@ -580,6 +580,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
     flag_glob_case_insensitive(&mut args);
     flag_heading(&mut args);
     flag_hidden(&mut args);
+    flag_hostname_bin(&mut args);
     flag_hyperlink_format(&mut args);
     flag_iglob(&mut args);
     flag_ignore_case(&mut args);
@@ -1495,19 +1496,93 @@ This flag can be disabled with --no-hidden.
     args.push(arg);
 }
 
+fn flag_hostname_bin(args: &mut Vec<RGArg>) {
+    const SHORT: &str = "Run a program to get this system's hostname.";
+    const LONG: &str = long!(
+        "\
+This flag controls how ripgrep determines this system's hostname. The flag's
+value should correspond to an executable (either a path or something that can
+be found via your system's *PATH* environment variable). When set, ripgrep will
+run this executable, with no arguments, and treat its output (with leading and
+trailing whitespace stripped) as your system's hostname.
+
+When not set (the default, or the empty string), ripgrep will try to
+automatically detect your system's hostname. On Unix, this corresponds
+to calling *gethostname*. On Windows, this corresponds to calling
+*GetComputerNameExW* to fetch the system's \"physical DNS hostname.\"
+
+ripgrep uses your system's hostname for producing hyperlinks.
+"
+    );
+    let arg =
+        RGArg::flag("hostname-bin", "COMMAND").help(SHORT).long_help(LONG);
+    args.push(arg);
+}
+
 fn flag_hyperlink_format(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Set the format of hyperlinks to match results.";
     const LONG: &str = long!(
         "\
-Set the format of hyperlinks to match results. This defines a pattern which
-can contain the following placeholders: {file}, {line}, {column}, and {host}.
-An empty pattern or 'none' disables hyperlinks.
+Set the format of hyperlinks to match results. Hyperlinks make certain elements
+of ripgrep's output, such as file paths, clickable. This generally only works
+in terminal emulators that support OSC-8 hyperlinks. For example, the format
+*file://{host}{path}* will emit an RFC 8089 hyperlink.
+
+The following variables are available in the format string:
+
+*{path}*: Required. This is replaced with a path to a matching file. The
+path is guaranteed to be absolute and percent encoded such that it is valid to
+put into a URI. Note that a path is guaranteed to start with a */*.
+
+*{host}*: Optional. This is replaced with your system's hostname. On Unix,
+this corresponds to calling *gethostname*. On Windows, this corresponds to
+calling *GetComputerNameExW* to fetch the system's \"physical DNS hostname.\"
+Alternatively, if --hostname-bin was provided, then the hostname returned from
+the output of that program will be returned. If no hostname could be found,
+then this variable is replaced with the empty string.
+
+*{line}*: Optional. If appropriate, this is replaced with the line number of
+a match. If no line number is available (for example, if --no-line-number was
+given), then it is automatically replaced with the value *1*.
+
+*{column}*: Optional, but requires the presence of **{line}**. If appropriate,
+this is replaced with the column number of a match. If no column number is
+available (for example, if --no-column was given), then it is automatically
+replaced with the value *1*.
+
+*{wslprefix}*: Optional. This is a special value that is set to
+*wsl$/WSL_DISTRO_NAME*, where *WSL_DISTRO_NAME* corresponds to the value of
+the equivalent environment variable. If the system is not Unix or if the
+*WSL_DISTRO_NAME* environment variable is not set, then this is replaced with
+the empty string.
+
+Alternatively, a format string may correspond to one of the following
+aliases: default, file, grep+, kitty, macvim, none, subl, textmate, vscode,
+vscode-insiders, vscodium.
+
+A format string may be empty. An empty format string is equivalent to the
+*none* alias. In this case, hyperlinks will be disabled.
+
+At present, the default format when ripgrep detects a tty on stdout on all
+systems is *default*. This is an alias that expands to *file://{host}{path}* on
+Unix and *file://{path}* on Windows. When stdout is not a tty, then the default
+format behaves as if it were *none*. That is, hyperlinks are disabled.
+
+Note that hyperlinks are only written when colors are enabled. To write
+hyperlinks without colors, you'll need to configure ripgrep to not colorize
+anything without actually disabling all ANSI escape codes completely:
+
+    --colors 'path:none' --colors 'line:none' --colors 'column:none' --colors 'match:none'
 
-The {file} placeholder is required, and will be replaced with the absolute
-file path with a few adjustments: The leading '/' on Unix is removed,
-and '\\' is replaced with '/' on Windows.
+ripgrep works this way because it treats the *--color=(never|always|auto)* flag
+as a proxy for whether ANSI escape codes should be used at all. This means
+that environment variables like *NO_COLOR=1* and *TERM=dumb* not only disable
+colors, but hyperlinks as well. Similarly, colors and hyperlinks are disabled
+when ripgrep is not writing to a tty. (Unless one forces the issue by setting
+*--color=always*.)
 
-As an example, the default pattern on Unix systems is: 'file://{host}/{file}'
+For more information on hyperlinks in terminal emulators, see:
+https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda
 "
     );
     let arg =
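To make the documented variables concrete, the following is a rough sketch of how a format string might be filled in and wrapped in an OSC 8 escape sequence. It is an illustration only, not the interpolator added by this commit: `Values`, `interpolate` and `osc8` are hypothetical names, the substitution is a naive string replace that ignores `{{`/`}}` escaping and `{wslprefix}`, and the path is assumed to be already absolute and percent encoded.

```rust
/// Hypothetical per-match values used to fill in a format string.
struct Values<'a> {
    host: Option<&'a str>,
    /// Assumed to be absolute and already percent encoded.
    path: &'a str,
    line: Option<u64>,
    column: Option<u64>,
}

/// Naively substitutes the documented variables. A missing host becomes the
/// empty string; a missing line or column becomes 1, as the help text says.
fn interpolate(format: &str, v: &Values) -> String {
    format
        .replace("{host}", v.host.unwrap_or(""))
        .replace("{path}", v.path)
        .replace("{line}", &v.line.unwrap_or(1).to_string())
        .replace("{column}", &v.column.unwrap_or(1).to_string())
}

/// Wraps `label` in an OSC 8 hyperlink pointing at `uri`.
/// Layout: ESC ] 8 ; ; URI ST label ESC ] 8 ; ; ST, where ST is ESC \.
fn osc8(uri: &str, label: &str) -> String {
    format!("\x1b]8;;{uri}\x1b\\{label}\x1b]8;;\x1b\\")
}

fn main() {
    let v = Values {
        host: Some("myhost"),
        path: "/home/user/src/main.rs",
        line: Some(42),
        column: None,
    };
    let uri = interpolate("file://{host}{path}", &v);
    // In a terminal that supports OSC 8, the path below is clickable.
    println!("{}", osc8(&uri, "src/main.rs"));
}
```

Terminals without OSC 8 support typically ignore the escape sequence and print only the label.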

crates/core/args.rs

+122 -11
@@ -18,9 +18,9 @@ use grep::pcre2::{
     RegexMatcherBuilder as PCRE2RegexMatcherBuilder,
 };
 use grep::printer::{
-    default_color_specs, ColorSpecs, HyperlinkPattern, JSONBuilder,
-    PathPrinter, PathPrinterBuilder, Standard, StandardBuilder, Stats,
-    Summary, SummaryBuilder, SummaryKind, JSON,
+    default_color_specs, ColorSpecs, HyperlinkConfig, HyperlinkEnvironment,
+    HyperlinkFormat, JSONBuilder, PathPrinter, PathPrinterBuilder, Standard,
+    StandardBuilder, Stats, Summary, SummaryBuilder, SummaryKind, JSON,
 };
 use grep::regex::{
     RegexMatcher as RustRegexMatcher,
@@ -236,7 +236,7 @@ impl Args {
         let mut builder = PathPrinterBuilder::new();
         builder
             .color_specs(self.matches().color_specs()?)
-            .hyperlink_pattern(self.matches().hyperlink_pattern()?)
+            .hyperlink(self.matches().hyperlink_config()?)
             .separator(self.matches().path_separator()?)
             .terminator(self.matches().path_terminator().unwrap_or(b'\n'));
         Ok(builder.build(wtr))
@@ -774,7 +774,7 @@ impl ArgMatches {
         let mut builder = StandardBuilder::new();
         builder
             .color_specs(self.color_specs()?)
-            .hyperlink_pattern(self.hyperlink_pattern()?)
+            .hyperlink(self.hyperlink_config()?)
             .stats(self.stats())
             .heading(self.heading())
             .path(self.with_filename(paths))
@@ -814,7 +814,7 @@ impl ArgMatches {
         builder
             .kind(self.summary_kind().expect("summary format"))
             .color_specs(self.color_specs()?)
-            .hyperlink_pattern(self.hyperlink_pattern()?)
+            .hyperlink(self.hyperlink_config()?)
             .stats(self.stats())
             .path(self.with_filename(paths))
             .max_matches(self.max_count()?)
@@ -1126,11 +1126,21 @@ impl ArgMatches {
     /// for the current system is used if the value is not set.
     ///
     /// If an invalid pattern is provided, then an error is returned.
-    fn hyperlink_pattern(&self) -> Result<HyperlinkPattern> {
-        Ok(match self.value_of_lossy("hyperlink-format") {
-            Some(pattern) => HyperlinkPattern::from_str(&pattern)?,
-            None => HyperlinkPattern::default_file_scheme(),
-        })
+    fn hyperlink_config(&self) -> Result<HyperlinkConfig> {
+        let mut env = HyperlinkEnvironment::new();
+        env.host(hostname(self.value_of_os("hostname-bin")))
+            .wsl_prefix(wsl_prefix());
+        let fmt = match self.value_of_lossy("hyperlink-format") {
+            None => HyperlinkFormat::from_str("default").unwrap(),
+            Some(format) => match HyperlinkFormat::from_str(&format) {
+                Ok(format) => format,
+                Err(err) => {
+                    let msg = format!("invalid hyperlink format: {err}");
+                    return Err(msg.into());
+                }
+            },
+        };
+        Ok(HyperlinkConfig::new(env, fmt))
     }
 
     /// Returns true if ignore files should be processed case insensitively.
@@ -1838,6 +1848,107 @@ fn current_dir() -> Result<PathBuf> {
     .into())
 }
 
+/// Retrieves the hostname that ripgrep should use wherever a hostname is
+/// required. Currently, that's just in the hyperlink format.
+///
+/// This works by first running the given binary program (if present and with
+/// no arguments) to get the hostname after trimming leading and trailing
+/// whitespace. If that fails for any reason, then it falls back to getting
+/// the hostname via platform specific means (e.g., `gethostname` on Unix).
+///
+/// The purpose of `bin` is to make it possible for end users to override how
+/// ripgrep determines the hostname.
+fn hostname(bin: Option<&OsStr>) -> Option<String> {
+    let Some(bin) = bin else { return platform_hostname() };
+    let bin = match grep::cli::resolve_binary(bin) {
+        Ok(bin) => bin,
+        Err(err) => {
+            log::debug!(
+                "failed to run command '{bin:?}' to get hostname \
+                 (falling back to platform hostname): {err}",
+            );
+            return platform_hostname();
+        }
+    };
+    let mut cmd = process::Command::new(&bin);
+    cmd.stdin(process::Stdio::null());
+    let rdr = match grep::cli::CommandReader::new(&mut cmd) {
+        Ok(rdr) => rdr,
+        Err(err) => {
+            log::debug!(
+                "failed to spawn command '{bin:?}' to get \
+                 hostname (falling back to platform hostname): {err}",
+            );
+            return platform_hostname();
+        }
+    };
+    let out = match io::read_to_string(rdr) {
+        Ok(out) => out,
+        Err(err) => {
+            log::debug!(
+                "failed to read output from command '{bin:?}' to get \
+                 hostname (falling back to platform hostname): {err}",
+            );
+            return platform_hostname();
+        }
+    };
+    let hostname = out.trim();
+    if hostname.is_empty() {
+        log::debug!(
+            "output from command '{bin:?}' is empty after trimming \
+             leading and trailing whitespace (falling back to \
+             platform hostname)",
+        );
+        return platform_hostname();
+    }
+    Some(hostname.to_string())
+}
+
+/// Attempts to get the hostname by using platform specific routines. For
+/// example, this will do `gethostname` on Unix and `GetComputerNameExW` on
+/// Windows.
+fn platform_hostname() -> Option<String> {
+    let hostname_os = match grep::cli::hostname() {
+        Ok(x) => x,
+        Err(err) => {
+            log::debug!("could not get hostname: {}", err);
+            return None;
+        }
+    };
+    let Some(hostname) = hostname_os.to_str() else {
+        log::debug!(
+            "got hostname {:?}, but it's not valid UTF-8",
+            hostname_os
+        );
+        return None;
+    };
+    Some(hostname.to_string())
+}
+
+/// Returns a value that is meant to fill in the `{wslprefix}` variable for
+/// a user given hyperlink format. A WSL prefix is a share/network like thing
+/// that is meant to permit Windows applications to open files stored within
+/// a WSL drive.
+///
+/// If a WSL distro name is unavailable, not valid UTF-8 or this isn't running
+/// in a Unix environment, then this returns None.
+///
+/// See: <https://learn.microsoft.com/en-us/windows/wsl/filesystems>
+fn wsl_prefix() -> Option<String> {
+    if !cfg!(unix) {
+        return None;
+    }
+    let distro_os = env::var_os("WSL_DISTRO_NAME")?;
+    let Some(distro) = distro_os.to_str() else {
+        log::debug!(
+            "found WSL_DISTRO_NAME={:?}, but value is not UTF-8",
+            distro_os
+        );
+        return None;
+    };
+    Some(format!("wsl$/{distro}"))
+}
+
 /// Tries to assign a timestamp to every `Subject` in the vector to help with
 /// sorting Subjects by time.
 fn load_timestamps<G>(
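The "run a command, trim its output, fall back on any failure" flow of `hostname` above can be approximated with the standard library alone. The sketch below is an assumption-laden stand-in, not the code from this commit: it uses `std::process::Command::output` instead of `grep::cli::resolve_binary` and `grep::cli::CommandReader`, it additionally treats a non-zero exit status as a failure, and `clean`, `hostname_from_command` and `hostname_or` are hypothetical names. The platform lookup is passed in as a closure so that the example needs no external crate.

```rust
use std::process::{Command, Stdio};

/// Trims surrounding whitespace; an empty result counts as "no hostname".
fn clean(raw: &str) -> Option<String> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Runs `bin` with no arguments and returns its trimmed stdout, or `None`
/// if it cannot be spawned, exits unsuccessfully or prints invalid UTF-8.
fn hostname_from_command(bin: &str) -> Option<String> {
    let output = Command::new(bin)
        .stdin(Stdio::null())
        .stderr(Stdio::null())
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    clean(&String::from_utf8(output.stdout).ok()?)
}

/// Prefers the user-supplied command and otherwise calls `fallback`, which
/// stands in for a platform-specific lookup such as `gethostname`.
fn hostname_or(
    bin: Option<&str>,
    fallback: impl FnOnce() -> Option<String>,
) -> Option<String> {
    bin.and_then(hostname_from_command).or_else(fallback)
}

fn main() {
    // "hostname" is only an example command; it may not exist everywhere,
    // in which case the placeholder fallback value is used.
    let host = hostname_or(Some("hostname"), || Some("localhost".to_string()));
    println!("{:?}", host);
}
```

Passing the fallback as a closure mirrors the commit's stated goal of keeping environment access at the application level rather than inside the printing code.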

crates/printer/Cargo.toml

+1 -1
@@ -21,9 +21,9 @@ serde = ["dep:base64", "dep:serde", "dep:serde_json"]
 [dependencies]
 base64 = { version = "0.21.4", optional = true }
 bstr = "1.6.2"
-gethostname = "0.4.3"
 grep-matcher = { version = "0.1.6", path = "../matcher" }
 grep-searcher = { version = "0.1.11", path = "../searcher" }
+log = "0.4.5"
 termcolor = "1.3.0"
 serde = { version = "1.0.188", optional = true, features = ["derive"] }
 serde_json = { version = "1.0.107", optional = true }
