Conversation
cli/config/configuration.toml.in
Outdated
[factory]
# VM templating support. Once enabled, new VMs are created from template
# using vm cloning. They will share the same initial kernel memory by mapping
# it readonly. When disabled, new VMs are created from scrash.
- Can you explain why users would want to use this feature (to save memory, speed up boot, etc)?
- s/scrash/scratch/.
Good point! I'll add some text.
Also, we either need a new `enable_debug=` option for this section, or a comment explaining that the runtime `enable_debug` option affects factory logging?
@bergwolf @jodh-intel I think it will be useful to add the security implications of this approach as well here so that users are aware of the side-effects of enabling this option.
@jodh-intel @amshinde The purpose of the PR is to add vm factory functionality. The CLI-side change is the minimum needed to use the vm factory. I can add more CLI changes after the PR is merged.
cli/config/configuration.toml.in
Outdated
#enable_template = false

# VM caching support. When set to larger than 0, number of cache empty VMs
# will maintained alive. New VMs are created by just using the cached VMs.
- Can you explain the impact of setting this? For example, if I set this to `5`, does that mean the first container I start after a fresh boot will result in 6 instances of qemu being started (which will take longer than normal), but subsequently created containers will be fast (since they can make use of the running instances)?
- Also, what happens if those instances are running and `qemu` gets upgraded "underneath" them? Presumably, we should recommend all instances are stopped before upgrading.
- How do you stop these instances (we need to document this)?
@@ -1126,6 +1126,10 @@ type Memory struct {
	// MaxMem is the maximum amount of memory that can be made available
	// to the guest through e.g. hot pluggable memory.
	MaxMem string
There is no corresponding change to `Gopkg.toml`.
The govmm PR is not merged yet. I'll update vendor properly once that one gets merged.
@bergwolf could you please go to the govmm repo and address the comments. I'd like to see it merged before we move forward with this PR.
} else if config.Knobs.FileBackedMem == true {
	if config.Memory.Size != "" && config.Memory.Path != "" {
		dimmName := "dimm1"
		objMemParam := "memory-backend-ram,id=" + dimmName + ",size=" + config.Memory.Size + ",mem-path=" + config.Memory.Path
Has this been tested with `qemu-lite` and `qemu` v1.11 vanilla?
func New(count uint, b base.FactoryBase) base.FactoryBase {
	if count < 1 {
		return b
Should `count == 0` result in a log warning?
It's actually OK to have count == 0 in which case we still return the base factory.
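For illustration only, a self-contained sketch of that guard with a hypothetical warning added (all type names below are stand-ins, not the PR's actual code, which silently falls back to the base factory when count is 0):

```go
package main

import (
	"fmt"
	"log"
)

// FactoryBase stands in for the PR's base.FactoryBase interface.
type FactoryBase interface {
	GetBaseVM() (string, error)
	CloseFactory()
}

// direct stands in for the "create from scratch" base factory.
type direct struct{}

func (direct) GetBaseVM() (string, error) { return "fresh-vm", nil }
func (direct) CloseFactory()              {}

// cache stands in for the caching wrapper; the real one keeps a pool of
// pre-created, paused VMs alive.
type cache struct {
	FactoryBase
	count uint
}

// New wraps a base factory with a VM cache. count == 0 means "no caching":
// the caller simply gets the base factory back (optionally with a warning).
func New(count uint, b FactoryBase) FactoryBase {
	if count < 1 {
		log.Println("vm cache count is 0, using the base factory directly")
		return b
	}
	return &cache{FactoryBase: b, count: count}
}

func main() {
	f := New(0, direct{})
	fmt.Println(f.GetBaseVM()) // fresh-vm <nil>
}
```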
for {
	vm, err := b.GetBaseVM()
	if err != nil {
		//glog.Errorf("cache factory get error when allocate vm: %v", err)
Commented out code here and below.
virtcontainers/factory/factory.go
Outdated
func (f *FactoryConfig) validate() error {
	if f.Cache == 0 {
		return fmt.Errorf("invalid factory cache number %d", f.Cache)
Is it worth displaying the entire `FactoryConfig` to help debugging here?
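A minimal sketch of that suggestion, using a stand-in struct and `%+v` so the whole config lands in the error message:

```go
package main

import "fmt"

// FactoryConfig stands in for the real virtcontainers type.
type FactoryConfig struct {
	Template bool
	Cache    uint
}

// validate mirrors the guard quoted above, but prints the entire config so
// a bad configuration is easy to spot in logs.
func (f *FactoryConfig) validate() error {
	if f.Cache == 0 {
		return fmt.Errorf("invalid factory config %+v: cache number must be > 0", *f)
	}
	return nil
}

func main() {
	cfg := FactoryConfig{Template: true}
	fmt.Println(cfg.validate()) // invalid factory config {Template:true Cache:0}: cache number must be > 0
}
```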
virtcontainers/vm.go
Outdated
}

func (v *VM) assignSandbox(s *Sandbox) error {
	// TODO: add vm symlinks
Incomplete?
Removing wip and the do-not-merge tag. The PR is working and ready for review now. @sboeuf @jodh-intel @egernst @WeiZhang555 @laijs PTAL. In the meantime, I'll fix up CI and post perf numbers.
PSS Measurement: Memory inside container:
Script to run: I've changed kata config to create 128MB guests by default. w/o vm templating:
w/ vm templating:
So vm templating helped save about 9GB for 100 guests, which is around 90MB for each 128MB guest. It's clear that vm templating can be very useful for density workloads. @egernst ^^
Wow, this looks like an amazing optimization. And it's also quite a huge patch, I'll need some time to read it~
PSS Measurement: Memory inside container:
Memory savings look good :-) I presume this measurement was taken with KSM turned off. I have a feeling that some of those savings could also be found with KSM, but I suspect:
So, those memory savings look good to me :-)
@grahamwhaley Yes, these numbers were taken with KSM off. And yes, one advantage of vm templating over KSM is that it does not consume extra CPU resources. However, it does not share the host-side qemu memory either, same as KSM. All the sharing is done at the guest memory level, mostly covering the kernel, initramfs, kata-agent and other process memory like systemd if that is enabled.
I like the whole idea, and the structure to my eyes looks right.
I've added some general feedback, but we need some core VM and VC devs to review this...
@@ -1126,6 +1126,10 @@ type Memory struct {
	// MaxMem is the maximum amount of memory that can be made available
	// to the guest through e.g. hot pluggable memory.
	MaxMem string
In the patch title, `hack/vendor` made me think:
- that this was a patch in the https://github.com/kata-containers/runtime/tree/master/virtcontainers/hack directory
- and reminded me that we should have changed that subdir name from `hack` when we moved over VC - and that no PRs should probably contain the word `hack` ;-)

Can you drop the `hack` from the subject pls?
That commit will be reworked once the govmm PR gets merged. The `hack` in the commit subject is to remind myself of the rework ;)
cli/config/configuration.toml.in
Outdated
# using vm cloning. They will share the same initial kernel memory by mapping
# it readonly. When disabled, new VMs are created from scrash.
#
#enable_template = false
As this is commented out, and I presume the default is therefore false/off, you may as well make the commented out line:
#enable_template = true
as that is the most likely line to be added or uncommented I think.
Good point!
var factorySubCmds = []cli.Command{
	initFactoryCommand,
	destroyFactoryCommand,
Will it be possible to have some sort of 'status' command as well - to show how many runtimes are shared, or, even more importantly once we have the cache/pool, how many are cached?
It is possible, but it needs corresponding virtcontainers-side changes to support it. At this point, I would defer further improvements to future PRs since this one is already pretty big.
virtcontainers/qemu.go
Outdated
	return q.hotplugAddMemory(slot, memMB)
}

func (q *qemu) hotplugAddMemory(slot uint32, amount int) error {
Just a thought - is this tied to the vmtemplating PR, or could it (if need be, to simplify the PRs) be pulled out into a separate PR?
Also, /cc @devimc as this feels like it might overlap or interact with some of his work?
VM templating needs cpu and memory hotplug support so I added them together.
I understand this is needed, but this is something that is not related to the whole refactoring, and I think you should submit this as a separate PR on top of which you should rebase this current PR.
This will help to split this huge PR and people will be more motivated in reviewing it ;)
It is not about refactoring. It is a dependency of the vm factory functionality, and vm factory is in fact the only consumer of the internal hotplug APIs ATM. I thought reviewers would lose the whole picture if I split it out.
As for reviewing, I thought smaller commits in a large PR would actually help. But it seems you are right that people would rather review it as a whole than look into each commit, and the PR is too huge to be reviewed as a whole (hence I got no comments on the core idea/API/refactoring part of the PR for a week :( ). I'll split some dependency commits into another PR to see if that helps.
@bergwolf Thanks :)
And don't worry, I'm gonna spend some time today reviewing this PR; this is a huge piece of work and I need to be focused to review it properly.
Also, don't hesitate to ping people that should review this PR. Usually if you don't ask, they will not spend the time on such big PRs 😃
Agreed. I plan to re-review this tomorrow AM too fwiw.
I've been planning to review this for a whole week, in the end I just started this work a few minutes ago 😆
Shame on myself :-)
virtcontainers/factory.go
Outdated
@@ -6,6 +6,6 @@
package virtcontainers

type Factory interface {
	GetVM(hypervisorType HypervisorType, hypervisorConfig HypervisorConfig) (*VM, error)
Can this whole commit be folded into the one that added factory.go in the first place? Seems this fixes something effectively 'missing' from that prior commit?
@@ -359,7 +359,7 @@ func newQemuHypervisorConfig(h hypervisor) (vc.HypervisorConfig, error) {
}
Ideally I would have liked to have seen the static check (lint, cyclo) fixes in (merged into) the original commits. Not a biggie, just noting.
Done.
PSS Measurement: Memory inside container:
Just had a brief look into the code (quite a lot of code... great job!).
Will give feedback later~
More comments later...
virtcontainers/qemu.go
Outdated
@@ -62,7 +60,9 @@ type qemu struct {

const qmpCapErrMsg = "Failed to negoatiate QMP capabilities"

const defaultConsole = "console.sock"
const qmpSocket = "qmp.sock"
I am fine with using only one socket, but you increased the number of characters in the socket name now. `controlSocket` and `monitorSocket` were shorter.
The main point is that there is no need for two qmp sockets. If socket file name length is a concern, I can just use `qmp` instead. But we've already shortened the socket parent path and `qmp.sock` survives the `BuildSocketPath` check, so I'd prefer making the naming clear such that a socket actually ends with a `.sock` suffix.
Yes, fair enough. We have shortened the parent path, and I agree I prefer a socket name ending with `.sock`.
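Background for the length concern, as a sketch: Linux UNIX domain socket paths must fit into sockaddr_un.sun_path (108 bytes), so a check along the following lines rejects over-long paths. The function name and example path below are hypothetical; the runtime's real check is the `BuildSocketPath` mentioned above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// 108-byte sun_path on Linux, minus the trailing NUL.
const maxSocketPathLen = 107

// buildSocketPath is a hypothetical helper mirroring the kind of check the
// discussion refers to.
func buildSocketPath(elems ...string) (string, error) {
	p := filepath.Join(elems...)
	if len(p) > maxSocketPathLen {
		return "", fmt.Errorf("socket path %q longer than %d bytes", p, maxSocketPathLen)
	}
	return p, nil
}

func main() {
	p, err := buildSocketPath("/run/vc/0123456789abcdef", "qmp.sock")
	fmt.Println(p, err)
}
```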
}

// SetLogger implements the VC function of the same name.
func (impl *VCImpl) SetLogger(logger logrus.FieldLogger) {
	SetLogger(logger)
}

// SetFactory implements the VC function of the same name.
func (impl *VCImpl) SetFactory(factory Factory) {
You need the mock implementation to implement this new API.
The mock implementation has `SetFactory`.
kataLog.WithField("factory", factoryConfig).Info("load vm factory")
f, err := vf.NewFactory(factoryConfig, true)
if err != nil {
	kataLog.WithError(err).Info("load vm factory failed")
I think it's worth adding a comment here explaining that if vm factory support fails, the system will "degrade" automatically to using the non-factory codepath.
In fact, I think we might want to add another variable to the config file to avoid this graceful degradation as some users only want to use VM factories:
[factory]
# If set, fail if VM factory support is not available.
## strict=true
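A self-contained sketch of the behaviour being discussed: the current PR logs and degrades to the non-factory path, while a hypothetical strict option would turn the failure into a hard error. All names below are stand-ins, not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// factoryConfig stands in for the runtime's factory settings.
type factoryConfig struct {
	Template bool
	Strict   bool // hypothetical knob suggested in the review
}

// newFactory stands in for vf.NewFactory; here it always fails so the
// fallback path is exercised.
func newFactory(cfg factoryConfig) (interface{}, error) {
	return nil, errors.New("template VM not found")
}

func loadFactory(cfg factoryConfig) error {
	f, err := newFactory(cfg)
	if err != nil {
		if cfg.Strict {
			// Hypothetical strict mode: refuse to run without a factory.
			return fmt.Errorf("vm factory required but unavailable: %w", err)
		}
		// Behaviour in the PR: log and fall back to creating VMs without
		// a factory ("graceful degradation").
		log.Printf("load vm factory failed: %v, continuing without it", err)
		return nil
	}
	_ = f // the real CLI would hand the factory to virtcontainers here
	return nil
}

func main() {
	fmt.Println(loadFactory(factoryConfig{Template: true, Strict: true}))
}
```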
Yes, agreed. But again, I'd like to push all these UX changes to later PRs. This PR is mostly to add core vm factory functionalities.
f, err := vf.NewFactory(factoryConfig, true)
if err != nil {
	kataLog.WithError(err).Error("load vm factory failed")
	// ignore error
Why?
It happens when we destroy a factory that does not exist in the first place.
	return err
}
flags := uintptr(syscall.MS_NOSUID | syscall.MS_NODEV)
opts := fmt.Sprintf("size=%dM", t.config.HypervisorConfig.DefaultMemSz+8)
Why `+8`?
An empirical value: a vm device state file is smaller than 8MB, usually about 3MB, and we need to align it by 8.
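In other words, the tmpfs backing the template has to hold the guest memory image plus the saved device state, and 8MB is used as an empirical, aligned upper bound for that state file. A tiny sketch of the sizing:

```go
package main

import "fmt"

// Empirical headroom for the saved device state (~3MB in practice),
// rounded up to an 8MB alignment; see the discussion above.
const templateStateHeadroomMB = 8

func templateTmpfsOpts(defaultMemSzMB uint32) string {
	return fmt.Sprintf("size=%dM", defaultMemSzMB+templateStateHeadroomMB)
}

func main() {
	fmt.Println(templateTmpfsOpts(128)) // size=136M for a 128MB guest
}
```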
	fmt.Println("vm factory initialized")
} else {
	kataLog.Error("vm factory is not enabled")
	fmt.Println("vm factory is not enabled")
twice?
kataLog does not print to stdout, but I think we should tell the user whether the command succeeded. Any suggestion on how to do this properly with kataLog?
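One possible pattern (a sketch, not what the PR does) is a small helper that writes the result both to the structured log and to stdout for the person running the command; the helper name and logger setup below are assumptions:

```go
package main

import (
	"fmt"

	"github.com/sirupsen/logrus"
)

// kataLog stands in for the CLI's logrus-based logger, which goes to the
// system log rather than the terminal.
var kataLog = logrus.WithField("source", "runtime")

// announce is a hypothetical helper: one call, two audiences.
func announce(msg string) {
	kataLog.Info(msg) // for operators reading the logs
	fmt.Println(msg)  // for the user who just ran the CLI command
}

func main() {
	announce("vm factory initialized")
}
```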
		f.CloseFactory()
	}
}
fmt.Println("vm factory destroyed")
Why not use `kataLog`?
kataLog does not print to stdout.
@bergwolf The PR looks pretty good, but I don't think we need to introduce a new abstraction layer with this new VM interface. The factory should call directly into the hypervisor, there is no reason to add an extra layer here. And about the sandbox decoupling, it's not needed since we want the complexity to be hidden behind the sandbox API.
We can do the proper parallelization from Frakti by using `CreateSandbox()` with the appropriate factory, and hotplug later by creating new containers. Each container will be responsible for hotplugging the resources that we need.
That being said, the same way we're going to introduce `sandbox.AddStorage()`, we could introduce `sandbox.AddResources()` if really needed.
@@ -253,15 +250,6 @@ func (q *qemuArchBase) memoryTopology(memoryMb, hostMemoryMb uint64) govmmQemu.M
	return memory
}

func (q *qemuArchBase) append9PVolumes(devices []govmmQemu.Device, volumes []Volume) []govmmQemu.Device {
The removal of this function is not part of the decoupling between sandbox and hypervisor, is it?
Should be part of its own commit I think.
Yes, agreed.
virtcontainers/vm.go
Outdated

package virtcontainers

type VM struct {
I don't really get the need for such an additional abstraction layer. Could you explain the rationale behind this? My understanding so far is that it actually duplicates the `hypervisor` interface and includes the `agent` handler, meaning it looks a lot like what sandbox is doing.
virtcontainers/qemu.go
Outdated
@@ -340,8 +346,7 @@ func (q *qemu) createSandbox(sandboxConfig SandboxConfig) error {
	// bridge gets the first available PCI address i.e bridgePCIStartAddr
	devices = q.arch.appendBridges(devices, q.state.Bridges)

	devices = q.arch.append9PVolumes(devices, sandboxConfig.Volumes)
@bergwolf Can you add in the commit message that you are removing this method and the reason for doing this.
virtcontainers/factory.go
Outdated

package virtcontainers

type Factory interface {
@bergwolf Can you add documentation comments for `Factory` here? It will be useful to include the purpose of `Factory` and `FactoryBase` in the commit message.
@sboeuf That would push the vm config comparison and CPU/memory hotplug down to each of the factory implementations. `factory/factory.go` is not just a broker but also handles vm config validation/comparison and possible CPU/memory hotplug. It's better to do this at the factory level instead of doing the same work in each of the factory implementations.
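For readers skimming the thread, a rough sketch of that split with stand-in types (the real interfaces live in virtcontainers and virtcontainers/factory/base and use vc.VMConfig / *vc.VM):

```go
package sketch

// VMConfig and VM stand in for the virtcontainers types.
type VMConfig struct {
	VCPUs uint32
	MemMB uint32
}

type VM struct {
	Config VMConfig
}

// FactoryBase is what the direct, template and cache factories implement:
// it only knows how to hand out a paused VM built from one fixed config.
type FactoryBase interface {
	Config() VMConfig
	GetBaseVM() (*VM, error)
	CloseFactory()
}

// Factory is the consumer-facing interface. Its implementation wraps a
// FactoryBase, validates/compares the requested config against the base
// config, and hotplugs the CPU/memory difference before returning the VM.
type Factory interface {
	GetVM(config VMConfig) (*VM, error)
	CloseFactory()
}
```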
@bergwolf I had missed this. This makes sense, and it'd be great to explain this explicitly in the code with a comment. Otherwise, it's not obvious what the difference is between the `Factory` and `FactoryBase` interfaces.
virtcontainers/factory/base/base.go
Outdated

import vc "github.com/kata-containers/runtime/virtcontainers"

type FactoryBase interface {
Same as above, can you add documentation comments for `FactoryBase` and its purpose?
//
// cache implements base vm factory on top of other base vm factory.

package cache
@bergwolf Can you add a description of the `cache` and `template` factories? I have no idea what these are and how they differ from each other. I know you are planning to add the implementation later, but comments about the general flow, giving an idea of what is stored and what is constructed at runtime for each factory type, will be useful for a general understanding.
It's in the PR description.
@@ -100,6 +101,25 @@ func create(containerID, bundlePath, console, pidFilePath string, detach bool,
	return err
}

if runtimeConfig.FactoryConfig.Template {
@bergwolf Can we handle this in virtcontainers `CreateSandbox` itself rather than doing this in the CLI?
All these implementation details about the factory could be handled by virtcontainers itself, keeping the API simple.
The kata CLI is the one reading the configuration files. And for frakti, unlike the kata CLI, we would initialize the vm factory only once, instead of doing it for every `CreateSandbox`. That is the reason the factory config is not pushed into `CreateSandbox`.
	},
}

var initFactoryCommand = cli.Command{
Are these commands intended to be called by `frakti`? I am trying to understand the use case here.
No, not at all. frakti does not call any of the kata CLI commands. The CLI commands are the minimum needed for the kata CLI to use the vm factory, and they also help with testing.
@@ -103,3 +116,13 @@ func (t *template) createFromTemplateVM() (*vc.VM, error) {

	return vc.NewVM(config)
}

func (t *template) checkTemplateVM() error {
This needs some comments on what you are doing here and what those files mean.
Will do.
virtcontainers/factory/factory.go
Outdated
}

type factory struct {
	base base.FactoryBase
}

func NewFactory(config FactoryConfig) (vc.Factory, error) {
func NewFactory(config FactoryConfig, fetchOnly bool) (vc.Factory, error)
Please add comments for the `fetchOnly` flag here.
Will do.
@sboeuf The VM layer is needed because the vm factory needs to work on a vm guest abstraction without sandbox association. The VM layer sits on top of the hypervisor layer because the hypervisor is just a set of driver interfaces with no in-memory abstraction. We could add an abstraction data structure in hypervisor.go, but it would just look like the VM layer and would not be as clean as a separate VM layer. With the VM layer, we have a proper layering:
The sandbox API cannot cover the vm factory. For one thing, the vm factory needs to be initialized before any sandbox is created. For another, unlike the kata CLI, frakti would initialize the factory only once and use it as long as the daemon is running. So it needs to be a separate API that is not coupled with the sandbox APIs.
I agree that it makes sense to let each container hotplug what it needs. But doing hotplug with the vm factory still allows the factory to prepare VMs with different sizes. I would like to keep that ability. WDYT?
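A condensed, stand-in sketch of the flow described above: the factory hands out a paused base VM and the requested size is reached by hotplugging the delta, rather than booting a fresh guest and waiting for the agent. The types and numbers below are illustrative only.

```go
package main

import "fmt"

type vmConfig struct{ vcpus, memMB uint32 }

type vm struct{ cfg vmConfig }

// getBaseVM stands in for FactoryBase.GetBaseVM: it always returns the same
// small, pre-created (template or cached) VM.
func getBaseVM() *vm { return &vm{cfg: vmConfig{vcpus: 1, memMB: 128}} }

// getVM stands in for Factory.GetVM: grow the base VM via hotplug instead of
// creating a new one, which is the performance argument made above.
func getVM(want vmConfig) *vm {
	v := getBaseVM()
	if want.vcpus > v.cfg.vcpus {
		v.cfg.vcpus = want.vcpus // CPU hotplug
	}
	if want.memMB > v.cfg.memMB {
		v.cfg.memMB = want.memMB // memory hotplug
	}
	return v
}

func main() {
	fmt.Printf("%+v\n", getVM(vmConfig{vcpus: 4, memMB: 2048}))
}
```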
Config() vc.VMConfig

// GetBaseVM returns a paused VM created by the base factory.
GetBaseVM() (*vc.VM, error)
The main advantage of a cache is of course speed. However, we need to be able to identify when the cache should be invalidated, for example when a new rootfs image containing a security fix is released.
As such, I wonder if we need to add a timestamp into `vc.VM` that shows when the cache item was created (not when it was used). That will give us a basic way to look for "stale images" I think?
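A sketch of that suggestion (the field and check are hypothetical, not part of this PR): record when the cached VM was created so a consumer can drop anything older than the last component upgrade.

```go
package main

import (
	"fmt"
	"time"
)

// vmEntry stands in for a cached vc.VM.
type vmEntry struct {
	id        string
	createdAt time.Time // creation time, not last-used time
}

// staleSince reports whether the cached VM predates the given upgrade time.
func (v *vmEntry) staleSince(upgrade time.Time) bool {
	return v.createdAt.Before(upgrade)
}

func main() {
	entry := vmEntry{id: "vm-0", createdAt: time.Now().Add(-2 * time.Hour)}
	lastUpgrade := time.Now().Add(-1 * time.Hour)
	fmt.Println(entry.staleSince(lastUpgrade)) // true: should be discarded
}
```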
Generally, I think the system upgrade best practice will need to include destroying the vm factory and re-initializing it after the upgrade. This would apply to all vm factory users, including the kata CLI and frakti. For example in frakti, we would destroy the vm factory whenever the daemon quits and always initialize it upon startup.
You are right that it is possible to handle this automatically inside the vm factory, but it can be very tricky and error prone since an upgrade can happen at any time. Given that frakti is the only user of the cache vm factory, I would prefer to push the burden up to frakti, where it is already properly handled.
virtcontainers/qemu.go
Outdated
	}
}

err = q.qmpMonitorCh.qmp.ExecSetMigrateArguments(q.qmpMonitorCh.ctx, fmt.Sprintf("exec:cat>%s", q.config.DevicesStatePath))
Are there any issues around creating the image with "version X" of qemu and then trying to use it with "version Y" of qemu? I'm thinking about build support as well (`qemu-lite` compared with `qemu-vanilla` compared with distro-packaged `qemu`).
In such a case, the qmp command might fail, and I would classify it as a configuration mistake... It's just like the version mismatch when you upgrade kata-runtime but do not correspondingly upgrade kata-agent and end up with certain grpc failures.
I clearly understand the need that you have here, and this means you need access to the hypervisor interface so that you can run a VM and stop it to save its state, but I thought it would be better to make sure that we can do that through the hypervisor interface itself. And because you have removed the dependency on the sandbox from the hypervisor interface, you should be able to do that.
Ok, but if you need different sizes of VMs, why don't you simply create them with the expected amount directly? IMO the hotplug should be reserved for the containers since they are the variable in the equation (we don't know what the CRI layer will ask for in advance).
Besides accessing the hypervisor interface, we also need an in-memory data structure to represent the guest without a sandbox, so that we can create it (
For performance considerations, it is much slower to create a new VM and wait for the agent to be up than to simply hotplug the requested CPU/memory.
That's because you assume you have to wait for the agent to be up inside the VM when you're creating it from the factory. I don't think we need this, and as explained above, this is more something that should be checked in the context of the sandbox.
@amshinde @devimc @grahamwhaley @egernst it'd be nice if you could take a look ;)
It is not used and we actually cannot append multiple 9pfs volumes to a guest. Signed-off-by: Peng Tao <bergwolf@gmail.com>
A hypervisor implementation does not need to depend on a sandbox structure. Decouple them in preparation for vm factory. Signed-off-by: Peng Tao <bergwolf@gmail.com>
The boolean return value is not necessary. Signed-off-by: Peng Tao <bergwolf@gmail.com>
There are a few changes we need on kata agent to introduce vm factory support: 1. decouple agent creation from sandbox config 2. setup agent without creating a sandbox 3. expose vm storage path and share mount point Signed-off-by: Peng Tao <bergwolf@gmail.com>
1. support qemu migration save operation 2. setup vm templating parameters per hypervisor config 3. create vm storage path when it does not exist. This can happen when an empty guest is created without a sandbox. Signed-off-by: Peng Tao <bergwolf@gmail.com>
As representation of a guest without actual sandbox attached to it. This prepares for vm factory support. Signed-off-by: Peng Tao <bergwolf@gmail.com>
Add vm factory support per design in the VM Factory plugin section. The vm factory controls how a new vm is created: 1. direct: vm is created directly 2. template: vm is created via vm template. A template vm is pre-created and saved. Later vm is just a clone of the template vm so that they readonly share a portion of initial memory (including kernel, initramfs and the kata agent). CPU and memory are hot plugged when necessary. 3. cache: vm is created via vm caches. A set of cached vm are pre-created and maintained alive. New vms are created by just picking a cached vm. CPU and memory are hot plugged when necessary. Fixes: kata-containers#303 Signed-off-by: Peng Tao <bergwolf@gmail.com>
Add SetFactory to allow virtcontainers consumers to set a vm factory. And use it to create new VMs whenever the factory is set. Signed-off-by: Peng Tao <bergwolf@gmail.com>
Add enable_template option to the config file. When it is set, enable the vm template factory. cache factory cannot be used by kata cli directly because it requires a running daemon to maintain the cache VMs. `kata-runtime factory init` would initialize the vm factory and `kata-runtime factory destroy` would destroy the vm factory. When configured, a vm factory is loaded before creating new sandboxes. Signed-off-by: Peng Tao <bergwolf@gmail.com>
For one thing, it is not used by any kata components. For another thing, it breaks vm factory hypervisor config check. Signed-off-by: Peng Tao <bergwolf@gmail.com>
Add UTs to all factory components. Signed-off-by: Peng Tao <bergwolf@gmail.com>
PSS Measurement: Memory inside container:
Build failed (third-party-check pipeline) integration testing with
This is now blocked by kata-containers/tests/issues/513 and kata-containers/tests/issues/506. Restarting CI did not help... I know it all passed yesterday, but codecov was not happy so I had to update again.
recheck
Build succeeded (third-party-check pipeline).
@kata-containers/runtime PTAL, it is important to review this PR so that we can move forward with it. It's been around for a long time, and @bergwolf has reworked it many times. PTAL!
LGTM.
It seems that codecov/patch is not reliable. See kata-containers/tests#508
Hi @bergwolf - I know this is an old and merged item (the base VM templating/factory is merged already). I was wondering, are you planning at some point to run some metrics tests and maybe publish the vm factory gains for density and boot? I think having these numbers available for Kata would be a great thing. Maybe we should open an issue to track this, and then we might be able to collaborate on some sort of blog etc.?
agent: do not quit on grpc serve errors
The PR adds vm factory support to kata runtime per design in the VM Factory plugin section. The vm factory controls how a new vm is created:
To support vm factory, virtcontainers is refactored so that:
A new kata CLI subcommand `kata-runtime factory` is added. It is used to manage vm factory init and destroy. Only the direct and template factories are supported by the kata CLI, because the cache factory requires a running daemon to maintain the cache vm channels. OTOH, all factories are available at the kata API level.
VMs created by the vm factory still lack network support because right now network devices are configured before the vm is created. We need network hotplug support (#287) to enable network configuration after the vm is created. That said, this patch works otherwise and is large enough, so I'd like to get it merged and continue working on the network support when #287 is done.
is added. It is used to manage vm factory init and destroy. Only direct and template factory are supported by kata CLI, because the cache factory requires a running daemon to maintain the cache vm channels. OTOH, all factories are available to use at the kata API level.VMs created by vm factory still lack network support because right now network devices are configured before vm is created. We need network hotplug support (#287) to enable network configuration after vm is created. That said, this patch works otherwise and is large enough. So I'd like to get it merged and continue working on the network support when #287 is done.