forked from coreos/fleet
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a share-based scheduling heuristic #1
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
e87998b
to
3fb314e
Compare
Sounds good to me. |
… timeout As etcd's keysAPI already supports HeaderTimeoutPerRequest, fleet should not handle request timeout on its own, but just pass on the timeout value to the etcd client. For that, set timeout only in etcd.Config.HeaderTimeoutPerRequest, and get rid of internal timeout handling from both EtcdRegistry and LeaseManager. Also remove EtcdRegistry.ctx(), and just call context.Background() directly, as suggested by Jon. Suggested-by: Jonathan Boulle <jonathanboulle@gmail.com> Partially resolves coreos#1397
Support truthy values in boolean unit options(true, yes, on, 1, t). Originally written by wuqixuan <wuqixuan@huawei.com> Supersedes coreos#1271 Fixes: coreos#1108
Fleetctl {stop,unload} has been always printing out somewhat ambiguous messages like below: Unit app.service loaded on 0eb1b9b6.../coreos1 For better usability, print out a better result message, either about {stopp,unload}ing successfully, or failing to do. Partially resolves: coreos#1432
As the unit state cache is going to be obsolete, functional test does not need to run go test with unit state cache enabled. So let's remove the second run for now.
…alues fleetctl: support truthy values in boolean unit options
registry: use etcd.Config.HeaderTimeoutPerRequest instead of internal timeout
tests: test fleetd with extra configuration options
…al-message fleetctl: print out result message explicitly
- rework option parsing - move away from (most) global variables (cAPI), keep currentCommand for now... - we need to pass c (the cli context) and cAPI (the client API) around Signed-off-by: Olaf Buddenhagen <olaf@endocode.com>
Add github.com/spf13/cobra and github.com/spf13/pflag.
Let's roll back to the original global variable cAPI, to keep the entire code as simple as possible. As this is a single command-line tool, it would be harmless to simply use a global variable.
Use Cobra (github.com/spf13/cobra) instead of cli (github.com/codegangsta/cli), for better cmdline user interface. * Create a wrapper runWrapper() to be used for cobra, to wrap around a normal run*() function into a prototype for cobra.Command.Run(). It also sets a global variable cAPI for running a normal command. * remove unnecessary code for codegangsta/cli from fleetctl.go. Suggested-by: Jonathan Boulle <jonathanboulle@gmail.com> Fixes: coreos#1453 Supersedes coreos#1570
As every fleetctl command uses cobra instead of codegangsta/cli, we need to update unit tests as well.
* Do not compare output of help messages, to avoid occasional failures. As cobra sometimes prints out a different help message, it causes functional test TestClientHelpFlag to fail. * Run "fleetctl version" instead of "fleetctl --version". * Run "fleetctl help" instead of "fleetctl" to get help message.
Since CoreOS alpha (1000), coreos-cloudinit doesn't seem to watch on /media/configdrive/openstack/latest/user_data. That's why functional tests haven't been running, as systemd-nspawn container runs without any configurations. Fix it by changing into /var/lib/coreos-install/user_data. Fixes: coreos#1532
…fxtests functional: fix path to user_data to /var/lib/coreos-install
So far ReplaceMember has been checking for error, by looking up an explicit string "Success" in stderr from "systemctl -M <ID> poweroff". However, with systemd v229 or newer, systemctl poweroff does not print out "Success" even if poweroff succeeded. This causes TestKnownHostsVerification to always return error like: --- FAIL: TestKnownHostsVerification (2.66s) client_test.go:60: Failed replacing machine: poweroff failed: Fix it by changing the systemctl call to "machinectl poweroff <ID>", which should return error correctly. This would be better in the long run, than checking for messages in stderr. Fixes: coreos#1586
…ror-fxtests functional: poweroff machine using machinectl instead of systemctl
TestSingleNodeConnectivityLoss has been so far retrieving is-enabled status by calling "systemctl list-unit-files UNITS". It was expected to get "enabled" for each unit. This has worked until systemd v227. However, that's not the case any more since systemd v228, precisely since systemd/systemd@0ec0dea ("install: follow unit file symlinks in /usr, but not /etc when looking for [Install] data"). Since then, systemctl list-unit-files returns "linked-runtime" status for each fleet unit, which is actually reasonable behavior for units that are not installed persistently under /etc/systemd. With systemd v228 or newer, however, TestSingleNodeConnectivityLoss in fleet functional tests cannot make use of "systemctl list-unit-files" any more. And it would not be optimal to simply change enabled to linked-runtime, because then the test couldn't work with systemd < v228. So let's work around it, by calling "systemctl is-enabled UNIT" instead. That command has returned correct status "linked-runtime", no matter whether systemd >= 228 or not. Fixes: coreos#1588
run-in-qemu should use double hyphens for long command-line options for consistency with other tools.
Allow run-in-qemu to run with a given custom base image, e.g. coreos_developer_qemu_image.img which was built by CoreOS SDK. $ ./run-in-qemu -c alpha -i ~/.qemu/coreos_developer_qemu_image.img
…fxtests functional: allow run-in-qemu to run based on a custom base image
As systemd v230 or newer requires /etc/machine-id on a nspawn container, we need to write the container's own machine ID into /etc/machine-id. Otherwise, functional tests don't start at all like: --- FAIL: TestKnownHostsVerification (10.80s) client_test.go:36: unable to detect machine PID Fixes: coreos#1594 See also systemd/systemd#3014 See also rkt/rkt#2440
…-fxtests functional: write machine ID into nspawn container
As CoreOS requires PAM configurations under /etc/pam.d starting from developer version of CoreOS 104x, PAM configurations on the host need to copied into the nspawn container. Otherwise, every sudo command fails with messages like: --- FAIL: TestShutdown (1.99s) shutdown_test.go:43: exit status 1 Note that copying /etc/pam.d is necessary only for such setups with sys-auth/pambase installed. Though it should be also fine for systems where /etc/pam.d is empty, because then it should automatically fall back to /usr/lib64/pam.d, which belongs to sys-libs/pam. Fixes: coreos#1591 See also https://coreos.com/blog/security-brief-coreos-linux-alpha-remote-ssh-issue.html
3fb314e
to
01d7f7a
Compare
…e-header scripts,golang: update schema-generator, bump golang to 1.7.1
So far it hasn't been possible to do "go get github.com/coreos/fleet" because there was no main.go in the top source directory. Create main.go, a simple wrapper for fleetd to make it "go-getable".
Now that fleetd/fleetd.go doesn't have main(), we need to make the default build script run go build on the top source directory, instead of ./fleetd/. Then it will end up building fleetd, as main() in main.go imports fleetd.Main() by default.
…table fleetd,main: make github.com/coreos/fleet go-getable
Apparently rkt doesn't support a single dash in its option any more. For example, "rkt -volume" doesn't work any more. Use double dash instead: "rkt --volume" Also --insecure-skip-verify doesn't exist. Use --insecure-options=image instead.
Documentation: fix unavailable rkt options in examples
To avoid confusion with ./build, we'd better have build-aci under ./scripts directory.
Various changes on build-aci. * Use acbuild command for building the ACI image. * Remove unnecessary docker command. Actually we could build ACI image without relying on docker, just like etcd does. * Remove unnecessary manifest definition, as it's automatically generated by acbuild. * Improve the cleanup path using acbuildEnd(). * Allow TCP ports 2379,4001 for etcd How to test, for example: $ ./scripts/build-aci v0.12.0-amd64 $ sudo rkt run --insecure-options=image \ --stage1-name=coreos.com/rkt/stage1-fly:1.14.0 --inherit-env \ --volume etc-fleet,kind=host,source=/etc/fleet \ --volume machine-id,kind=host,source=/etc/machine-id \ --volume dbus-socket,kind=host,source=/run/dbus/system_bus_socket \ --volume fleet-units,kind=host,source=/run/fleet/units \ ./fleetd-v0.12.0-amd64.aci
engine: remove vestigial trigger code
To be as idiomatic as possible, do so: * make tryWaitForSystemdActiveState() return error instead of int * pass getBlockAttempts(cCmd) to tryWaitForSystemdActiveState() to handle such cases of --no-block or --block-attempts. * make assertSystemdActiveState(), waitForSystemdActiveState() return just error, not with 'found', as it's unnecessary. * remove unnecessary checking against conn == nil. * compare strings with '!=' instead of strings.Compare() != 0
Now that tryWaitForSystemdActiveState() returns error instead of int, tryWaitForUnitStates() should also return error instead of int, to be as idiomatic as possible.
It's not appropriate to check systemd active states with DBUS connection, as that will return correct states only on the same machine. So we should introduce a check using cAPI.UnitStates(), to check states on other machines.
As it's basically not efficient to call cAPI.UnitStates() for each unit, let's call it only once outside goroutines for processing units, i.e. calling in waitForSystemdActiveState(), not in getSingleUnitState(). To do that, we need to share the result apiStates across all units. Also make use of goroutines in waitForSystemdActiveState(), instead of sequential iteration over each unit. Note that we still need to keep calling UnitStates() when assertSystemdActiveState() failed, because in the next attempt the old apiState is not necessarily going to be valid. Ideally in that case, we should be able to fetch the state only for a single unit. However, we cannot do that for now, because cAPI.UnitState() is not available. In the future we would need to implement cAPI.UnitState() and all dependendent parts all over the tree in fleet, e.g. schema, etcdRegistry, rpcRegistry, etc. to replace UnitStates() in this place with the new method UnitState(). In practice, calling UnitStates() here is not as badly inefficient as it looks, because it will be anyway rarely called only when the assertion failed.
…temd-states fleetctl: check systemd active state via client API
scripts: build ACI image using acbuild
When marshal() returns a non-nil error 'err', which shadows the pre-defined 'err', we should explicitly return the second error returned from marshal(). Otherwise it will always return nil. To fix other potential bugs, let's just return err explicitly. This was discovered by running go tool vet: $ go tool vet --shadow ./registry registry/job.go:316: declaration of "err" shadows declaration at registry/job.go:315
In matchLocalFileAndUnit(), if os.Stat() returns nil and getUnitFromFile() returns non-nil error, then in the end matchLocalFileAndUnit() will return (false, nil). That's not the expected result. It should have returned the second error that came from getUnitFromFile(). To avoid further potential bugs, let's follow the idiomatic coding style: first check err != nil, and fail-fast. This issue was discoverd by running go tools vet.
…vars registry,fleetctl: fix bugs regarding shadowed error variables
Fix weblink to build-aci to https://github.com/coreos/fleet/blob/master/scripts/build-aci. This was a side-effect of moving the script to the scripts directory.
…-aci Documentation: fix a broken weblink to scripts/build-aci
Run TestRegistryMuxUnitManagement() only when user's euid is 0 and systemd is avilable at run-time. Such a distinction is necessary for avoiding test failures, especially when unit tests run on travis CI, where systemd is not available. With those checks, we can re-enable error handling after calling systemd.NewSystemdUnitManager().
On Linux syscall.Setuid() is not supported in go. So running unit tests with root privileges fails when calling Setuid. Let's skip the test on Linux, if euid is 0.
…r-systemd registry/rpc: re-enable error handling after NewSystemdUnitManage
…the shares of the machine
01d7f7a
to
524b896
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Based on coreos#945