# Proposal: add a marker for strictly non-turing-complete functions #7890

## Comments
This is an interesting idea, and it sounds like it would solve real-world use cases to me. That being said, I feel like the surface area at which this interacts with the rest of the language is quite minimal. The one effect mentioned in the proposal would be maybe allowing operator overloading with these stricter rules; I think it's fair to regard that as a separate, subsequent proposal. (Technically this proposal is a sort-of-superset of …)
I honestly see no reason why it couldn't be an external tool, apart from stuff like safe operator overloading. But I do see reasons why this, tool or language extension, could benefit from being integrated into Zig as a standard thing. One way or another, people would have to write such complexity annotations for the stdlib and more. And if Zig or that external tool doesn't come with ready-to-use definitions for the stdlib, I see two things happening: …

If this feature were integrated as a language extension, stdlib support is pretty much guaranteed from version 1.0 onwards. And if it's an external tool, but an official one shipping with Zig builds, then development of the tool is more likely to be kept in sync with stdlib changes and additions as well. Plus, if it's added as an external tool first, people could try it out, find pain points, find more opportunities, and then maybe it'll later still be more tightly integrated into the language, depending on the gained experience.
Ahem. You can't count from one to ten in Zig without using a `while`. It would be a nice proposal in a language that provides more high-level constructs with specific constraints. And to be really useful, the stricter (and much more difficult) version of the proposal with time complexity guarantees would be needed, IMO. Knowing that a function is finite is a rather loose upper bound. Factoring numbers by trial division is finite.
Extension to the big-O-like performance-indication system proposed here: the ability to use, as a function in the cost formula, the complexity assigned to a particular other operation, so as to handle different implementations (or implementations with complicated formulae) of an operator or …
@zzyxyzz Oh, damn, I forgot that …

Indeed. I see the bare `finite` …
In the little idealistic world inside my head, every programming language would be powerful enough to run basic proofs over software, no matter whether it's a systems programming language or some small scripting language. And I think that in a language that can already tell you how much memory a specific function call would consume, having the ability to express time and possibly even memory costs would fit in nicely.

@vkcz Yeah, that does sound useful.
@zzyxyzz One can do it with a `for` loop, given the range is known at comptime:

```zig
for ([_]u0{0} ** 10) |_, i| {
    // use i + 1
}
```
@Evrey Your hope of doing this at wide scale is in vain, because loop termination is not trivially provable (halting problem). Further, the verification scales quadratically with code size, so one needs special tooling like Cogent (which works only for limited complexity).
## Proposal

As the title says, this is a proposal for a new type of function, one which is guaranteed to not be Turing-complete (TC). I think this would be a highly useful addition to Zig, especially for professional software development in fields with very strict coding rules. But let's first see what y'all think before fleshing anything out.
Syntax-wise this would just appear as an extra keyword. This keyword could be anything: `finite`, `pure`, `nontc`, … I'll use `finite` in this proposal, but that's far from a final decision.

## Why?
A non-TC function is 100% guaranteed to finish computation in finite time. In other words: It never hangs. There are many situations where certain guarantees about worst-case code runtimes are desirable, such as…
Being non-TC further opens the doors for a hypothetical later addition of even stricter annotations:

- A `nopanic` marker that guarantees that no code path down that function is able to trigger a panic. Though really it should be a marker that rules out all `noreturn` calls down the path. Extra tricky when it comes to reading or writing memory, as segfaults could happen, which are potentially `noreturn`.
- The `finite` annotation can then be expanded to further restrict the worst-case time complexity. After all, exponential time complexity in a non-TC function may still effectively be the same as just hanging.

## Syntax Examples
Writing a guaranteed non-TC function would be as simple as this:
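(The original snippet is a minimal sketch; the placement of the hypothetical `finite` keyword before `fn` is my assumption, everything else is ordinary Zig.)

```zig
finite fn sum(values: []const u32) u32 {
    var total: u32 = 0;
    for (values) |v| {
        total +%= v; // wrapping add, so no overflow panic
    }
    return total;
}
```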
Assuming `nopanic` and time complexity annotations to be added later, it all may look like this:
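(A sketch, assuming the bound may refer to a slice parameter by name, one of the open questions further below.)

```zig
nopanic finite(values) fn sum(values: []const u32) u32 {
    var total: u32 = 0;
    for (values) |v| {
        total +%= v; // wrapping add: bounded loop, no panic
    }
    return total;
}
```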
## How?

I can see two ways of implementing the TC checks. One is simple, very strict, and is also a non-TC problem to solve. The other way, well, I'll come back to that later.
In computer science, there are a bunch of minimal toy programming languages for experimentation, among them `GOTO`, `WHILE`, and `LOOP`. Of these, `LOOP` is the most powerful one that is non-TC: its control structures are limited to counted loops, function calls, and branches. This will form the basis for the simple implementation.

## The Simple Strategy
The rules for a `finite fn` are as follows:

- A `finite fn` can only call other `finite fn`s.
- A `finite fn` cannot have any recursion with the simple non-TC check.
- The control structures allowed in a `finite fn` are:
  - `if`, it's just `O(1)`.
  - `switch`, as it's best case `O(1)`, average case `O(log(n))`, and worst case `O(n)`, depending on what code the compiler generates. All finite.
  - `for`, which is equivalent to `LOOP`: a counted loop with `O(n)` time complexity. All variants are allowed: labelled `for`, normal `for`, `for else`, `inline for`.
  - `break`, as it just cuts `for` loops short.
  - `continue`, as it cannot increase the runtime of a `for` loop. It just cuts an iteration of a loop body short.
  - `defer` and `errdefer`, also just `O(1)`.
  - `catch` and `try`, which are just glorified `if`s.
  - `suspend` and `resume`. They are pretty much calls and returns with a bit of magic attached.

In other words, the only disallowed control structures are `while` and calls to non-`finite` functions. `while` is capable of creating infinite loops, just as unbounded recursion is. Recursion is ruled out, as checking recursion bounds is the same problem to solve as checking bounds on a `while` loop.
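To make these rules concrete, here is a sketch of what the simple checker would and wouldn't accept (the `finite` keyword is hypothetical, the bodies are ordinary Zig):

```zig
finite fn sumFirst(values: []const u32, limit: usize) u32 {
    var total: u32 = 0;
    for (values) |v, i| {       // counted loop, O(n): allowed
        if (i >= limit) break;  // `break` only shortens the loop: allowed
        total +%= v;            // wrapping add, cannot panic
    }
    return total;
}

// Rejected by the simple check:
// finite fn spin() void {
//     while (true) {}          // `while` may loop forever
// }
```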
## The Complex Strategy

To allow `while` loops and recursion, we could in theory add a theorem prover and run it for a maximum amount of time, i.e. we could try solving the halting problem. This could be combined with the already existing `@setEvalBranchQuota` setting.

## Complexity Bounds
I already annotated the non-TC control structures with their time complexities. So in theory it shouldn't be too difficult to later add syntax for complexity bounds that the compiler checks against your code. What this syntax may look like is open for debate, however.
- `finite(a * b)` — quadratic-ish time, based on e.g. two slices
- `finite(min(a, b))` — linear time, e.g. a slice equality check
- `finite(a + b)` — linear time, e.g. walking over two slices in sequence or in lock step
- Given a slice `a` as input, should we take `a`-the-bound to mean the length of `a`-the-slice?
- For an integer input `x`, it's just `x` in the time complexity bounds. `finite(x.a + k)`, for example.
- Is calling functions inside `finite` rules okay? Or do we just offer pseudo-functions for stuff like `min` and `sqrt` and `log`? I think pseudo will do, but calling functions for e.g. querying container lengths would be neat.
- `finite(_)` — linear time, unknown bound, e.g. for a `strlen` impl that goes on until e.g. the calculated length overflows or the code segfaults
- `finite(n * m)`
- `finite(sqrt(n))`
- `finite(linear)`
- `finite(const)`
- `finite(log)`
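As a sketch of how such an annotation could read in practice (hypothetical syntax, assuming a slice parameter's name stands for its length in the bound):

```zig
const std = @import("std");

// Linear in the shorter slice, matching the slice-equality example above.
finite(min(a, b)) fn startsTheSame(a: []const u8, b: []const u8) bool {
    const n = std.math.min(a.len, b.len);
    for (a[0..n]) |c, i| {
        if (c != b[i]) return false;
    }
    return true;
}
```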
Of course there's also the possibility to not just specify time complexity, but also memory complexity. Just as a function that may hang on certain user inputs is undesirable, so is a function that OOMs the process because of questionable user inputs. However, figuring out rules for computing memory complexity is yet another beast to tackle, way out of scope for this proposal.
It may be favourable to go for the generic keywords for generic complexities, possibly composable as pseudo-functions like `log(log)`. This could make it much easier to compare the computed complexity of a function against the annotated one. It's allowed to write a function body that does the same as or better than the given annotation, but doing worse is not allowed and triggers a compile error.

## Interoperability
Any `fn` can call any `finite fn`. This also means that when assigning function pointers to variables, any `finite fn` can be assigned to any `fn` pointer variable, as long as the type signatures match. No explicit type casting needed. However, the opposite is not true.
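A sketch of the coercion rules (the `finite` marker in the function type is hypothetical):

```zig
finite fn double(x: u32) u32 {
    return x *% 2;
}

fn opaqueOp(x: u32) u32 {
    return x +% 1;
}

// Fine: a `finite fn` is strictly more constrained than a plain `fn`.
var op: fn (u32) u32 = double;

// Compile error: nothing guarantees `opaqueOp` terminates.
// var fop: finite fn (u32) u32 = opaqueOp;
```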
## Other Potential Applications

Zig preaches the »no surprise control flow« rule. That's the reason why operators aren't overloadable. I'm not a fan of that: I often write linear algebra code, and doing so without operators is tedious and hard to read. However, I understand the reasoning behind forbidden overloads and agree in general.

I think a `finite fn` would open up the possibility of a good operator overloading compromise, potentially even a `nopanic finite fn`. Given that, when you see operators put to use in some linear algebra code, you may no longer know whether or not a function call may happen. But at least you can be comfortable that the operator code will finish quickly and that it won't crash your program. The rules can be arbitrarily harsh, e.g. `finite(linear) fn operator_add(…) …`, which would allow loop-based scalar code for vectors, or `O(1)` SIMD code.
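For instance (the `operator_add` hook and the `Vec3` type are made up for illustration):

```zig
const Vec3 = struct { e: [3]f32 };

// Bounded, non-panicking overload: loop-based scalar code, O(n).
nopanic finite(linear) fn operator_add(a: Vec3, b: Vec3) Vec3 {
    var out: Vec3 = undefined;
    for (a.e) |x, i| {
        out.e[i] = x + b.e[i];
    }
    return out;
}
```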
## Downsides

Of course that's a bit to add to the Zig compiler. Once a `finite fn` is encountered, it has to walk down the AST looking for disallowed control flow structures, worst case throwing a theorem prover at the AST. Theorem prover or not, this will increase compile times.

Furthermore, to really make this useful at all, `finite` annotations have to be added all over the stdlib. This will likely take way more time than implementing the new syntax and may even be the biggest deal-breaker.

Zig also added async functions lately, in a way that strictly avoids »coloured functions«. However, this non-TC proposal has the very purpose of adding »coloured functions«. Though it shall be noted that the magical rules only apply when a non-TC function wants to call another function. A normal function can still call whatever it wants to, however it wants to.
## Upsides
I will, however, highlight the potential gains again. Adding `finite` and maybe even `nopanic` will greatly help users in developing robust and secure software. It'll make accidental Turing-completeness a thing of the past and thus greatly reduce vectors for DoS attacks.

Together with time complexity bounds, designing realtime systems in Zig will be easier than ever. This would not only be interesting for hard realtime systems such as space-probe operating systems, but also for the development of soft realtime systems, such as video game engines.

`finite` can also be utilised as a hint for `comptime` evaluation, where the compiler may just blindly evaluate e.g. all worst-case quadratic time functions, independently of whatever `@setEvalBranchQuota` says.

If, much later, we add a theorem prover to also allow complicated non-TC checks permitting recursion or `while`, this opens the door for the addition of contracts: `requires`, `ensures`, and `invariant` annotations that must be `finite` expressions and would be checked and enforced at comptime, with zero runtime assertions generated. Comptime contracts would allow for a lot more code optimisations and even more robust Zig code.
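A sketch, assuming contract clauses attach to the declaration like the other markers (`result` naming the return value is also made up):

```zig
requires(i < buf.len)
ensures(result == buf[i])
nopanic finite fn at(buf: []const u8, i: usize) u8 {
    return buf[i]; // the `requires` clause rules out the bounds panic
}
```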
## Open Questions

### External Functions
When declaring external functions, it may be useful to be able to add `finite` rules, although the compiler is not able to check whether these rules are actually true. One example case would be adding something like `finite(linear)` to `strlen`. In general this would also be useful for annotating intrinsics, like SIMD intrinsics. (Did I mention that adding rules to the stdlib will be a potentially massive amount of work?)

How should external annotations be added, though? Convert headers once and add them in manually? Or allow aliasing function decls with such annotations, so you can have converted C headers in one file and time complexity annotations in another? A sketch of the `strlen` case is below.
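Assuming the annotation can simply be attached to an `extern` declaration and is then trusted rather than checked:

```zig
// Trusted, unchecked bound on a C import.
extern "c" finite(linear) fn strlen(s: [*:0]const u8) usize;
```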
### Memory Complexity
As mentioned, memory complexity bounds would be neat. But this needs additional rules for the compiler to figure out.
### Mixed Time Complexity
Sometimes in high-perf code you see functions that dynamically branch between two algorithms that solve the same problem. The reason for that is simple. Imagine you have one branch solving the problem in `O(log(n))`, and the other one solving it in `O(exp(n))`. These two graphs have an intersection point. Before that point `exp(n)` actually does better, and after it `log(n)` obviously outperforms `exp(n)`. Such functions try to guess the intersection point and branch there.

How would one annotate this function now?
As stated above, if a function claims to be `finite(log)`, then implementing it in a way that performs worse results in a compiler error. So if we were to mark this branching function as `finite`, it would have to claim being `finite(exp)`, unless we figure out some syntax that is able to express this branching. The big-O notation can express differences between worst, best, and average case complexities. We could add this, too, possibly like this:
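(A sketch of such a field-based annotation, reconstructed from the `max_t`/`min_m` naming mentioned right below; the helper functions and the cutoff are made up.)

```zig
// `solveExp` and `solveLog` are two implementations of the same problem.
finite(min_t: log(n), max_t: exp(n)) fn f(n: u64) u64 {
    if (n < cutoff) {
        return solveExp(n); // cheaper below the intersection point
    }
    return solveLog(n);
}
```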
This also answers the question of how one would annotate memory complexities: add extra fields `m`, `min_m`, etc. A simple `finite(x)` would then be an alias for `finite(max_t: x)`.

### Call Cost Annotations
The `finite` rules are simple expressions and can thus be evaluated for some input. Data comes in, a number comes out. We could utilise this to create a new `@` intrinsic that performs this function selection for us.

For example, we could rewrite the function `f` above as follows:
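(A sketch with a made-up builtin name, since the proposal leaves the intrinsic unspecified; it would evaluate each candidate's cost formula for the given `n` and call the cheapest.)

```zig
finite(min_t: log(n), max_t: exp(n)) fn f(n: u64) u64 {
    return @callCheapest(.{ solveExp, solveLog }, .{n});
}
```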
Now, this approach is not perfect. For example, two functions with the same on-paper complexity may perform very differently, simply because one does e.g. more data moves than the other. Whether and how we could tackle this, e.g. by allowing scaling constants that on paper get crossed out, is also an open question.

Of course, all this assumes going for such complexity annotations in the first place, as opposed to doing just the simplest first step: a bare `finite fn` of unspecified complexity.