Skip to content

Commit 516679e

Browse files
aitapMichaelChirico
andcommitted
Drop direct use of NAMED and LEVELS (#6420)
* Drop direct use of NAMED Since data.table now depends on R >= 3.3, the backports are no longer needed. Moreover, MAYBE_SHARED is currently a function, while MAYBE_REFERENCED expands to !NO_REFERENCES (which is a function). In debugging output, show MAYBE_REFERENCED (NAMED > 0) instead of NAMED. * Almost drop direct use of LEVELS getCharCE appeared in R-2.7, making it possible to check for strings _marked_ as UTF-8 or Latin-1. There is no marking as ASCII, so fixing IS_ASCII will have to wait for R >= 4.5. * NEWS entry for NAMED, LEVELS reduction * fine-tune NEWS --------- Co-authored-by: Michael Chirico <chiricom@google.com>
1 parent 9bf9dd3 commit 516679e

File tree

4 files changed

+9
-17
lines changed

4 files changed

+9
-17
lines changed

NEWS.md

+2
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,8 @@
66

77
1. Fixed a typo in the NEWS for the last release -- that's version 1.16.0, not 1.6.0; apologies. Thanks @r2evans for flagging, [#6443](https://github.com/Rdatatable/data.table/issues/6443).
88

9+
2. Continued work to remove non-API C functions, [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks Ivan Krylov for the PR and for writing a clear and concise guide about the R API: https://aitap.codeberg.page/R-api/.
10+
911
# data.table [v1.16.0](https://github.com/Rdatatable/data.table/milestone/30) (25 August 2024)
1012

1113
## BREAKING CHANGES

src/assign.c

+3-3
Original file line numberDiff line numberDiff line change
@@ -545,12 +545,12 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values)
545545
(TYPEOF(values)!=VECSXP && i>0) // assigning the same values to a second column. Have to ensure a copy #2540
546546
) {
547547
if (verbose) {
548-
Rprintf(_("RHS for item %d has been duplicated because NAMED==%d MAYBE_SHARED==%d, but then is being plonked. length(values)==%d; length(cols)==%d)\n"),
549-
i+1, NAMED(thisvalue), MAYBE_SHARED(thisvalue), length(values), length(cols));
548+
Rprintf(_("RHS for item %d has been duplicated because MAYBE_REFERENCED==%d MAYBE_SHARED==%d, but then is being plonked. length(values)==%d; length(cols)==%d)\n"),
549+
i+1, MAYBE_REFERENCED(thisvalue), MAYBE_SHARED(thisvalue), length(values), length(cols));
550550
}
551551
thisvalue = copyAsPlain(thisvalue); // PROTECT not needed as assigned as element to protected list below.
552552
} else {
553-
if (verbose) Rprintf(_("Direct plonk of unnamed RHS, no copy. NAMED==%d, MAYBE_SHARED==%d\n"), NAMED(thisvalue), MAYBE_SHARED(thisvalue)); // e.g. DT[,a:=as.character(a)] as tested by 754.5
553+
if (verbose) Rprintf(_("Direct plonk of unnamed RHS, no copy. MAYBE_REFERENCED==%d, MAYBE_SHARED==%d\n"), MAYBE_REFERENCED(thisvalue), MAYBE_SHARED(thisvalue)); // e.g. DT[,a:=as.character(a)] as tested by 754.5
554554
}
555555
SET_VECTOR_ELT(dt, coln, thisvalue); // plonk new column in as it's already the correct length
556556
setAttrib(thisvalue, R_NamesSymbol, R_NilValue); // clear names such as DT[,a:=mapvector[a]]

src/bmerge.c

-3
Original file line numberDiff line numberDiff line change
@@ -17,9 +17,6 @@ Differences over standard binary search (e.g. bsearch in stdlib.h) :
1717
o non equi joins (no != yet) since 1.9.8
1818
*/
1919

20-
#define ENC_KNOWN(x) (LEVELS(x) & 12)
21-
// 12 = LATIN1_MASK (1<<2) | UTF8_MASK (1<<3) // Would use these definitions from Defn.h, but that appears to be private to R. Hence 12.
22-
2320
#define EQ 1
2421
#define LE 2
2522
#define LT 3

src/data.table.h

+4-11
Original file line numberDiff line numberDiff line change
@@ -33,9 +33,10 @@
3333
// #include <signal.h> // the debugging machinery + breakpoint aidee
3434
// raise(SIGINT);
3535

36-
#define IS_UTF8(x) (LEVELS(x) & 8)
37-
#define IS_ASCII(x) (LEVELS(x) & 64)
38-
#define IS_LATIN(x) (LEVELS(x) & 4)
36+
/* we mean the encoding bits, not CE_NATIVE in a UTF-8 locale */
37+
#define IS_UTF8(x) (getCharCE(x) == CE_UTF8)
38+
#define IS_LATIN(x) (getCharCE(x) == CE_LATIN1)
39+
#define IS_ASCII(x) (LEVELS(x) & 64) // API expected in R >= 4.5
3940
#define IS_TRUE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]==TRUE)
4041
#define IS_FALSE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]==FALSE)
4142
#define IS_TRUE_OR_FALSE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]!=NA_LOGICAL)
@@ -60,14 +61,6 @@
6061
// for use with CPLXSXP, no macro provided by R internals
6162
#define ISNAN_COMPLEX(x) (ISNAN((x).r) || ISNAN((x).i)) // TRUE if either real or imaginary component is NA or NaN
6263

63-
// Backport macros added to R in 2017 so we don't need to update dependency from R 3.0.0
64-
#ifndef MAYBE_SHARED
65-
# define MAYBE_SHARED(x) (NAMED(x) > 1)
66-
#endif
67-
#ifndef MAYBE_REFERENCED
68-
# define MAYBE_REFERENCED(x) ( NAMED(x) > 0 )
69-
#endif
70-
7164
// If we find a non-ASCII, non-NA, non-UTF8 encoding, we try to convert it to UTF8. That is, marked non-ascii/non-UTF8 encodings will
7265
// always be checked in UTF8 locale. This seems to be the best fix Arun could think of to put the encoding issues to rest.
7366
// Since the if-statement will fail with the first condition check in "normal" ASCII cases, there shouldn't be huge penalty issues in

0 commit comments

Comments
 (0)