-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generated output differs depending on order of input files #952
Comments
This is the mitigation that I've been using, but I want to emphasize that it's still far from optimal. A 2-line change to one file can cause thousands of lines of changes to generated dart bindings in totally unrelated files, just because some internal structure in the compiler changed. Diffs like this: ///
// Generated code. Do not modify.
// source: aaa/bbb.proto
//
// @dart = 2.12
// ignore_for_file: annotate_overrides,camel_case_types,constant_identifier_names,directives_ordering,library_prefixes,non_constant_identifier_names,prefer_final_fields,return_of_invalid_type,unnecessary_const,unnecessary_import,unnecessary_this,unused_import,unused_shown_name
import 'dart:core' as $core;
import 'package:protobuf/protobuf.dart' as $pb;
-import 'xxx.pb.dart' as $62;
+import 'xxx.pb.dart' as $63;
import '../yyy/zzz.pb.dart' as $1; It would be ideal to see the |
This is trivial to fix (just one line to sort arguments), but I'm unable to reproduce this in a unit test. So far I have this: Unit test// Copyright (c) 2013, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
import 'package:protoc_plugin/indenting_writer.dart';
import 'package:protoc_plugin/protoc.dart';
import 'package:protoc_plugin/src/generated/descriptor.pb.dart';
import 'package:protoc_plugin/src/generated/plugin.pb.dart';
import 'package:protoc_plugin/src/linker.dart';
import 'package:protoc_plugin/src/options.dart';
import 'package:protoc_plugin/src/output_config.dart';
import 'package:test/test.dart';
import 'dart:io';
void main() {
test(
'CodeGenerator generates imports in the same order regardless of argument order (#952)',
() {
// message A {}
final messageA = DescriptorProto()..name = 'A';
final fileA =
FileDescriptorProto()
..name = 'a.proto'
..messageType.add(messageA)
..syntax = 'proto3';
// message B {}
final messageB = DescriptorProto()..name = 'B';
final fileB =
FileDescriptorProto()
..name = 'b.proto'
..messageType.add(messageB)
..syntax = 'proto3';
// message HasA {
// A a = 1;
// }
final messageHasA =
DescriptorProto()
..name = 'HasA'
..field.add(
FieldDescriptorProto()
..name = 'a'
..number = 1
..type = FieldDescriptorProto_Type.TYPE_MESSAGE
..typeName = '.A'
..jsonName = 'a',
);
final fileHasA =
FileDescriptorProto()
..name = 'has_a.proto'
..dependency.addAll(['a.proto'])
..messageType.add(messageHasA)
..syntax = 'proto3';
// message HasB {
// B b = 1;
// }
final messageHasB =
DescriptorProto()
..name = 'HasB'
..field.add(
FieldDescriptorProto()
..name = 'b'
..number = 1
..type = FieldDescriptorProto_Type.TYPE_MESSAGE
..typeName = '.B'
..jsonName = 'b',
);
final fileHasB =
FileDescriptorProto()
..name = 'has_b.proto'
..dependency.addAll(['b.proto'])
..messageType.add(messageHasB)
..syntax = 'proto3';
// message HasAB {
// A a = 1;
// B b = 2;
// }
final messageHasAB =
DescriptorProto()
..name = 'HasAB'
..field.addAll([
FieldDescriptorProto()
..name = 'a'
..number = 1
..type = FieldDescriptorProto_Type.TYPE_MESSAGE
..typeName = '.A'
..jsonName = 'a',
FieldDescriptorProto()
..name = 'b'
..number = 2
..type = FieldDescriptorProto_Type.TYPE_MESSAGE
..typeName = '.B'
..jsonName = 'b',
]);
final fileHasAB =
FileDescriptorProto()
..name = 'has_ab.proto'
..dependency.addAll(['a.proto', 'b.proto'])
..messageType.add(messageHasAB)
..syntax = 'proto3';
final request1 =
CodeGeneratorRequest()
..protoFile.addAll([fileA, fileHasA, fileB, fileHasB, fileHasAB])
..fileToGenerate.addAll([
'has_a.proto',
'has_b.proto',
'has_ab.proto',
]);
File('request1.txt').writeAsStringSync(request1.toString());
final request2 =
CodeGeneratorRequest()
..protoFile.addAll([fileB, fileHasB, fileA, fileHasA, fileHasAB])
..fileToGenerate.addAll([
'has_b.proto',
'has_a.proto',
'has_ab.proto',
]);
File('request2.txt').writeAsStringSync(request2.toString());
final generatedFiles1 = generateFiles(request1);
final hasAB1 = generatedFiles1['has_ab.pb.dart']!;
final imports1 = importLines(
hasAB1,
).where((s) => s.contains('a.pb.dart') || s.contains('b.pb.dart'));
print(imports1);
final generatedFiles2 = generateFiles(request2);
final hasAB2 = generatedFiles2['has_ab.pb.dart']!;
final imports2 = importLines(
hasAB2,
).where((s) => s.contains('a.pb.dart') || s.contains('b.pb.dart'));
print(imports2);
},
);
}
Map<String, CodeGeneratorResponse_File> generateFiles(
CodeGeneratorRequest request,
) {
final response = CodeGeneratorResponse();
final options = parseGenerationOptions(request, response)!;
final generators = <FileGenerator>[];
for (final file in request.protoFile) {
generators.add(FileGenerator(file, options));
}
link(options, generators);
final files = <String, CodeGeneratorResponse_File>{};
for (final gen in generators) {
final name = gen.descriptor.name;
if (request.fileToGenerate.contains(name)) {
for (final genFile in gen.generateFiles(
const DefaultOutputConfiguration(),
)) {
assert(!files.containsKey(genFile.name));
files[genFile.name] = genFile;
}
}
}
return files;
}
List<String> importLines(CodeGeneratorResponse_File file) =>
file.content.split('\n').where((s) => s.startsWith('import')).toList(); This should run the code generator with the same input as the original repro in this issue, but somehow the imports are always in the same order in generated code. I've run out of time for now, I'll have a look later. In the meantime if anyone knows how to test this the fix is just this: diff --git a/protoc_plugin/lib/src/code_generator.dart b/protoc_plugin/lib/src/code_generator.dart
index 9f324c6..340c439 100644
--- a/protoc_plugin/lib/src/code_generator.dart
+++ b/protoc_plugin/lib/src/code_generator.dart
@@ -105,6 +105,9 @@ class CodeGenerator {
request.mergeFromCodedBufferReader(reader, extensions);
reader.checkLastTagWas(0);
+ request.protoFile
+ .sort((desc1, desc2) => desc1.name.compareTo(desc2.name));
+
final response = CodeGeneratorResponse();
// Parse the options in the request. Return the errors if any. |
Thanks for looking into it. It looks like this is effectively sorting the list of command-line arguments, right? If so, perhaps please read my comment above. This mitigation would make the output deterministic, but still means that if I change one line in one file I can have thousands of lines of diffs in unrelated files because some globally shared indices have changed. It would be ideal for the indices to be local to each file.
Curious. I'll meet you halfway with a self-contained (Dockerized) version of my initial report that hopefully you can at least reproduce: You can clone the Gist and run
|
Here's a self-contained example to demonstrate what I mean about outputs changing even when their inputs didn't change, even when the file names are always passed in sorted order on the command line: https://gist.github.com/wchargin/ae49e838b5537eb391af306d8a42c03c (Run as above: on a machine with Docker, clone the Gist and run Salient contents: // has_bc.proto
syntax = "proto3";
import "element_b.proto";
import "element_c.proto";
message HasBC {
ElementB b = 1;
ElementC c = 2;
} // anomaly.proto, version 1
syntax = "proto3";
message Anomaly {
} // anomaly.proto, version 2
syntax = "proto3";
import "element_a.proto";
message Anomaly {
ElementA a = 1;
} Suppose that these files are compiled as
But when
This means that making a two-line change to --- out1/has_bc.pb.dart 2025-02-25 15:56:17.613794005 +0000
+++ out2/has_bc.pb.dart 2025-02-25 15:56:17.686794005 +0000
@@ -13,13 +13,13 @@
import 'package:protobuf/protobuf.dart' as $pb;
-import 'element_b.pb.dart' as $0;
-import 'element_c.pb.dart' as $1;
+import 'element_b.pb.dart' as $1;
+import 'element_c.pb.dart' as $2;
class HasBC extends $pb.GeneratedMessage {
... And that's why I think that just sorting input filenames doesn't really suffice to fix the underlying issue. |
We should confirm (I haven't implemented this so I don't know) but my understanding is each generated file gets a unique index. So I agree that sorting imports withing libraries and giving them local numbers would make sense as it would generate the same code in more cases.
We don't need docker or anything that's more complicated than a Dart unit test, we should be able to test this as a unit test. I'll investigate why my test above doesn't reproduce this. Maybe some internal map or set iteration order is different in my test vs. the command-line invocation of |
I also tried to dump I checked maps and sets in protoc_plugin, they're either The only other source of non-determinism I can think of is object identities in combination of splay tree maps and sets, but our So I'm out of ideas as to what may be causing this non-determinism when the same compilation is done in a unit test vs. via |
If you pass multiple file names on the command line to
protoc
while usingprotoc-gen-dart
, the contents of the generated files may depend on the order in which you listed the inputs.This is problematic because it makes it harder to run automated checks like "test that the generated outputs are up to date" with simple invocations like
find . -name '*.proto' -exec protoc ... {} +
. This can be mitigated by sorting the file list, but (a) it took me a while to figure out what was going on here and come up with that solution, and (b) it should be the duty of the plugin to produce deterministic output.I also generate bindings for C++, Go, Python, and TypeScript, both with and without gRPC support, and none of those other language bindings exhibits this issue.
Repro
Construct a few protos that depend on each other:
Compile them in two different orders:
Actual behavior
Note that the generated code differs between the two versions:
Expected behavior
The output should be bitwise identical.
Version info
The text was updated successfully, but these errors were encountered: