Tuesday, August 26, 2014

ZFS support in libvirt

An upcoming release of libvirt, 1.2.8, expected in early September, will include initial support for managing ZFS volumes.

That means it's possible to run VMs that use ZFS volumes as disks, and to control those volumes through the libvirt API. The currently supported operations are:

  • list volumes in a pool
  • create and delete volumes
  • upload and download volumes

It's not possible to create and delete pools yet; I hope to implement that in the next release.

Defining a pool

Assume we have some pools and want to use one of them in libvirt:

# zpool list
NAME       SIZE  ALLOC   FREE   FRAG  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
filepool  1,98G  56,5K  1,98G     0%         -     0%  1.00x  ONLINE  -
test       186G  7,81G   178G     0%         -     4%  1.00x  ONLINE  -

Let's take filepool and define it in libvirt. This can be done with the following virsh command:

virsh # pool-define-as --name zfsfilepool --source-name filepool --type zfs
Pool zfsfilepool defined

virsh #  pool-start zfsfilepool
Pool zfsfilepool started

virsh # pool-info zfsfilepool
Name:           zfsfilepool
UUID:           5d1a33a9-d8b5-43d8-bebe-c585e9450176
State:          running
Persistent:     yes
Autostart:      no
Capacity:       1,98 GiB
Allocation:     56,50 KiB
Available:      1,98 GiB

virsh # 

As you can see, we specify the pool type, its source name (as it appears in the zpool list output), and the name it will have in libvirt. We also need to start the pool using the pool-start command.
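
Alternatively, the same pool can be defined from an XML document using pool-define. A minimal sketch, equivalent to the command above (the element names follow the libvirt storage pool XML format):

    <pool type='zfs'>
      <name>zfsfilepool</name>
      <source>
        <name>filepool</name>
      </source>
    </pool>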

Managing volumes

Let's create a couple of volumes in our new pool.

virsh # vol-create-as --pool zfsfilepool --name vol1 --capacity 1G
Vol vol1 created

virsh # vol-create-as --pool zfsfilepool --name vol2 --capacity 700M
Vol vol2 created

virsh # vol-list zfsfilepool
 Name                 Path                                    
------------------------------------------------------------------------------
 vol1                 /dev/zvol/filepool/vol1                 
 vol2                 /dev/zvol/filepool/vol2                 

virsh #

Dropping a volume is also easy:

virsh # vol-delete --pool zfsfilepool vol2
Vol vol2 deleted

Uploading and downloading data

Let's upload an image to our new volume:

virsh # vol-upload --pool zfsfilepool --vol vol1 --file /home/novel/FreeBSD-10.0-RELEASE-amd64-memstick.img 

... and download it:

virsh # vol-download --pool zfsfilepool --vol vol1 --file /home/novel/zfsfilepool_vol1.img

Note: if you compare e.g. the MD5 checksums of these files, they will not match, because the downloaded file has the same size as the volume, i.e. it is padded with trailing zeros. If you trim the trailing zeros, the checksums will be the same:

$ md5 FreeBSD-10.0-RELEASE-amd64-memstick.img zfsfilepool_vol1.img 
MD5 (FreeBSD-10.0-RELEASE-amd64-memstick.img) = e8e7cbd41b80457957bd7981452ecf5c
MD5 (zfsfilepool_vol1.img) = a77c3b434b01a57ec091826f81ebbb97
$ truncate -r FreeBSD-10.0-RELEASE-amd64-memstick.img zfsfilepool_vol1.img
$  md5 FreeBSD-10.0-RELEASE-amd64-memstick.img zfsfilepool_vol1.img             
MD5 (FreeBSD-10.0-RELEASE-amd64-memstick.img) = e8e7cbd41b80457957bd7981452ecf5c
MD5 (zfsfilepool_vol1.img) = e8e7cbd41b80457957bd7981452ecf5c
$

Booting a VM from volume

Finally, we get to the most important part. To use a volume as a disk device for a VM, the 'devices' section of the domain XML should be updated with something like this:

    <disk type='volume' device='disk'>
      <source pool='zfsfilepool' volume='vol1'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
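
If the domain is already defined, the snippet can be attached without editing the full XML, using virsh attach-device. A sketch, where the domain name fbsd10 and the snippet path are placeholders:

virsh # attach-device fbsd10 /path/to/vol1-disk.xml --persistent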

A few notes

Note #1: this code is just a few weeks old, so quite likely there are some rough edges. Feel free to report to novel%freebsd.org if you spot any problems.

Note #2: this code is FreeBSD-only for now. However, it should not be hard to make it work on Linux with zfsonlinux.org. Its developers were kind enough to add some useful missing flags to some of the CLI tools; however, these changes are not yet available in any released version. There are a few more minor differences between ZFS on Linux and FreeBSD, but they should not be hard to address. I plan to get to this as soon as a ZFS on Linux release with the necessary flags is available. However, if you are interested and ready to help with testing, feel free to poke me so it can be done sooner.


Sunday, March 30, 2014

Using Jenkins libvirt-slave-plugin with bhyve

I've played with libvirt-slave-plugin today to make it work with libvirt/bhyve, and decided to document my steps in case they are useful to somebody.

libvirt-slave-plugin

Assuming you already have Jenkins up and running, installation of libvirt-slave-plugin goes as follows. Since we need a slightly modified version, we have to build it ourselves. I've made a fork that contains the required modification; it can be cloned like this:

git clone -b bhyve git@github.com:jenkinsci/libvirt-slave-plugin.git

The only change I made is adding a single line with the 'BHYVE' hypervisor type; you can find the pull request here. Once it's merged, this step will no longer be required.

Getting back to the build: you'll need Maven, which can be installed from ports:

cd /usr/ports/devel/maven2 && make install clean

When it's installed, go back to the plugin we cloned and do:

mvn package -DskipTests=true

When done, log in to the Jenkins web interface and go to Manage Jenkins -> Manage Plugins -> Advanced -> Upload Plugin. It will ask for a path to the plugin: that's target/libvirt-slave.hpi in our plugin directory.

After the plugin is installed, go to Manage Jenkins -> Configure System -> Add new cloud. There you'll need to specify the BHYVE hypervisor type and configure credentials so Jenkins can reach your libvirtd over SSH. There's a handy 'Test Connection' button you can use to verify your configuration.

Once done with that, we can go to Manage Jenkins -> Manage Nodes -> New Node and choose the 'libvirt' node type. Then you'll need to choose a libvirt domain to use for the node. From there, node configuration is pretty straightforward, except, perhaps, for the IP address of the slave. To find the IP address, look up the domain's MAC address (just run virsh dumpxml and you'll find it there) and then find the corresponding entry in the dnsmasq default.leases file, as sketched below.
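
A sketch of that lookup, assuming a domain named jenkins-slave and the usual lease file location (the domain name, the MAC, and the path are placeholders; adjust to your setup):

virsh dumpxml jenkins-slave | grep 'mac address'
grep -i '52:54:00:xx:xx:xx' /var/lib/libvirt/dnsmasq/default.leases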

Guest Preparation

The only thing the guest OS needs is a JDK. I preferred to install the java/openjdk7 package, but I had to configure networking first. My VMs use bridged networking on virbr0, so the NAT configuration in /etc/pf.conf looks like this:

ext_if="re0"
int_if="virbr0"

virt_net="192.168.122.0/24"

scrub all

nat on $ext_if from $virt_net to any -> ($ext_if)
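
For this configuration to take effect, pf must be enabled and the host must forward packets. A sketch of the relevant /etc/rc.conf lines, assuming neither is set yet:

pf_enable="YES"        # load and enable pf at boot
gateway_enable="YES"   # enable IP forwarding for the guest traffic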

Now OpenJDK can be installed from the guest using:

pkg install java/openjdk7

Finally, find the node in the node management menu and press the 'Launch slave agent' button. It should be ready for builds now.

PS: It might be useful to sync the clocks on both the guest and the host using ntpdate.

PPS: The libvirt version should be at least 1.2.2.


Saturday, March 15, 2014

Bhyve in libvirt

I continue my work on improving libvirt's FreeBSD support, and I have some good news: the recent libvirt release, 1.2.2, is the first version to include bhyve support!

Currently the driver is at an early stage: it lacks a number of features and doesn't provide much flexibility yet, just the basics at this point. Rather than give a detailed description, I'll point you to the documentation: Libvirt: Bhyve driver. There you'll find a sample domain XML that covers all the features the driver currently supports.
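
To give a quick impression, here is a minimal sketch of what such a domain looks like (the image path and bridge name are placeholders; see the driver page for the authoritative sample):

    <domain type='bhyve'>
        <name>bhyve</name>
        <memory>219136</memory>
        <vcpu>1</vcpu>
        <os>
            <type>hvm</type>
        </os>
        <devices>
            <disk type='file'>
                <driver name='file' type='raw'/>
                <source file='/path/to/bhyve_freebsd.img'/>
                <target dev='hda' bus='sata'/>
            </disk>
            <interface type='bridge'>
                <model type='virtio'/>
                <source bridge='virbr0'/>
            </interface>
        </devices>
    </domain>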

TODO list

While there are lots and lots of things to be done, there are some specific ones I'm focusing on:

  • Console support through nmdm(4). This is a very important feature for debugging and checking what's going on in the guest.
  • Domain autostart support. There's a patch, kindly provided by David Shane Holden, that just needs review and testing.
  • More flexible slot id allocation / device configuration.

Qemu/FreeBSD status

As a side note, here's an update on what's changed since my previous blog post about the qemu libvirt driver on FreeBSD:

  • Proper TAP interface cleanup
  • CPU affinity configuration support; check http://libvirt.org/formatdomain.html#elementsCPUAllocation for details (a brief example follows after this list)
  • virsh console now works if you run it from a FreeBSD host and connect to libvirtd on Linux
  • Node status support (such as virsh nodecpustats and virsh nodememstats)
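
As an illustration of the CPU affinity item, the relevant bit of domain XML looks something like this (a sketch; the cpuset value and vCPU count are placeholders):

    <vcpu placement='static' cpuset='0-1'>2</vcpu>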

Some of these are available in already released versions; others are only in git.


Tuesday, January 21, 2014

lc-tools 0.7.1.1: bugfix release

I have released a bugfix version of lc-tools: 0.7.1.1.

It contains just two improvements:

  • Support for printing RAM for GoGrid nodes using the %(ram)s format string, submitted by Kirill Popov
  • Format string decoding using 'string-escape', submitted by Stanislav

As usual, the new version is available on PyPI:

easy_install lctools

I decided to stop uploading distfiles to GitHub: most people use PyPI anyway, so I doubt anybody needs them there. Please let me know if I'm wrong.


Monday, November 11, 2013

Running Qemu VMs on FreeBSD

I'm slowly working on making libvirt more functional on FreeBSD, and thanks to the help of many other people who did very valuable reviews of my patches and contributed other portability fixes, libvirt on FreeBSD is now capable of running Qemu VMs, with some limitations. Here's a quick howto, along with the limitations that exist at this point.

Building libvirt

As I'm playing with the codebase, it's more convenient for me to use a git checkout directly instead of the port. Here's how to build it (the port should work as well, but I haven't tested it).

Configure command looks this way:

CFLAGS="-I/usr/local/include" LDFLAGS="-L/usr/local/lib" ./configure \
                                           --without-polkit \
                                           --disable-werror

CFLAGS and LDFLAGS are needed to help it find the yajl includes and libraries. We disable polkit because we don't want to spend extra time configuring it, and we disable treating warnings as errors because headers from third-party libraries could trigger warnings.

Once it configures successfully, it's the traditional gmake && gmake install.

Preparing a VM

There's nothing special about VM preparation: just create a Qemu VM like you usually do, configure it for your needs, and you're almost ready to go.

There are some things recommended to do though:

  • Make the virtio drivers available in your guest (see vtnet(4), virtio_balloon(4), and virtio_blk(4) for details); a loader.conf sketch follows after this list
  • Configure the network interface to use DHCP by adding this line to /etc/rc.conf: ifconfig_vtnet0="DHCP"
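
If the guest kernel doesn't have virtio built in (recent FreeBSD GENERIC kernels may already include it), the drivers can be loaded from /boot/loader.conf. A sketch, assuming the stock module names:

virtio_load="YES"
virtio_pci_load="YES"
virtio_blk_load="YES"
if_vtnet_load="YES"
virtio_balloon_load="YES"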

Defining and running a domain

Before starting libvirtd, make sure you have the bridge(4) and tap(4) modules loaded or built into the kernel.
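
One way to check and load them, assuming the stock module names if_bridge and if_tap:

kldload if_bridge if_tap       # fails harmlessly if already loaded or built in
kldstat | grep -E 'if_bridge|if_tap'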

Then start libvirtd and connect to it using virsh:

virsh -c "qemu:///system"

Now we need to create an XML file with the domain definition. Here's the one I use: https://gist.github.com/novel/7399465

You might notice we're using virtio for both the network device and the disk drive. You'll have to modify the path to the image location and adjust the memory/CPU amounts for your needs.
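
In case the gist is unavailable, here is a minimal sketch of such a domain (the emulator and image paths are placeholders; adjust arch, memory, and vcpu for your setup):

    <domain type='qemu'>
        <name>qemu</name>
        <memory unit='MiB'>512</memory>
        <vcpu>1</vcpu>
        <os>
            <type arch='x86_64'>hvm</type>
        </os>
        <devices>
            <emulator>/usr/local/bin/qemu-system-x86_64</emulator>
            <disk type='file' device='disk'>
                <driver name='qemu' type='raw'/>
                <source file='/path/to/freebsd.img'/>
                <target dev='vda' bus='virtio'/>
            </disk>
            <interface type='network'>
                <source network='default'/>
                <model type='virtio'/>
            </interface>
            <graphics type='vnc' autoport='yes'/>
        </devices>
    </domain>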

Once you're done with that, execute 'define /path/to/domain.xml' in virsh, and a domain named 'qemu' (unless you changed the name) will be created. You can start it with the 'start qemu' command in virsh.

To see what's going on, we can use VNC. To figure out which VNC display our VM is available at, execute 'vncdisplay qemu'; it should print something like:

127.0.0.1:0

Now we can connect using a VNC client. For example, if you use net/vnc, the proper command is:

vncviewer 127.0.0.1:0

If everything goes well, you'll get access to your VM's console.

Run ifconfig to check whether the vtnet0 interface got an IP address:

vtnet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
 ether 52:54:00:ae:4c:37
 inet 192.168.122.117 netmask 0xffffff00 broadcast 192.168.122.255
 media: Ethernet 1000baseT <full-duplex>
 status: active

And on the host you'll have:

  • a bridge(4) device, virbr0
  • a tap(4) device, vnet0, which corresponds to our VM and is a member of the virbr0 bridge

It's possible to connect from the guest to the host and vice versa. Unfortunately, it's not possible to give the VM internet access without manual configuration. Normally libvirt should handle this itself, but its bridge driver needs some work to do that on FreeBSD; that's on my todo list.

Apart from the bridge stuff, there are a lot of other limitations on FreeBSD currently; firewalling for the bridge driver and nwfilter are probably the most important right now. It's not easy to come up with a good way to implement them: for example, one has to choose which firewall package to use, pf and ipfw being the main candidates.

I'll cover my thoughts and concerns in a separate post.

Feel free to poke me if you have any questions/thoughts.

Update 25/01/2014: some people have had trouble because of an old qemu version. On FreeBSD, please use the qemu-devel port, because the qemu port is too old.

Tuesday, January 3, 2012

lc-tools 0.7.1.0 released!

I'm happy to announce a new lc-tools release: 0.7.1.0!

It has no new features, but was updated to work with libcloud 0.7.x (earlier libcloud versions are no longer supported!). All the kudos go to Pavel Vozdvizhenskiy for noticing the problems and testing.

You can install the new version using either easy_install:

easy_install lctools

or by downloading and installing it by hand:

lctools-0.7.1.0.tar.gz

SHA256 (lctools-0.7.1.0.tar.gz) = 7f65c29cda06d256c02f6544b4957ca984f34f1cfed5b24558c3b4338f87dd9c

Wednesday, December 21, 2011

Bug Prediction

Recently I read a very interesting post about Bug Prediction on the Google Engineering Tools blog and decided to inspect the FreeBSD source tree in the same manner.

The question is how to determine whether a given commit is a bug fix. I decided to start with security fixes and filtered commit messages by the "Security:" tag. The result is the following:

Filename                                                    Score
crypto/openssl/ssl/s3_srvr.c                                1.12807360699
crypto/openssl/ssl/s3_clnt.c                                1.12546730725
crypto/openssl/openssl.spec                                 0.914805209201
crypto/openssl/NEWS                                         0.914805209201
crypto/openssl/CHANGES                                      0.914805209201
crypto/openssl/README                                       0.914805209201
crypto/openssl/crypto/opensslv.h                            0.914805209201
crypto/openssl/Makefile                                     0.914805209201
crypto/openssl/FAQ                                          0.914805209201
crypto/openssl/apps/speed.c                                 0.707204148366
crypto/openssl/ssl/t1_lib.c                                 0.696806056187
crypto/openssl/ssl/s3_lib.c                                 0.663083892748
crypto/openssl/crypto/md32_common.h                         0.578509821664
crypto/openssl/crypto/ocsp/ocsp_prn.c                       0.578509821664
crypto/openssl/engines/e_chil.c                             0.578509821664
crypto/openssl/ssl/d1_pkt.c                                 0.51304099663
crypto/openssl/crypto/pqueue/pqueue.c                       0.51304099663
crypto/openssl/ssl/d1_both.c                                0.51304099663
contrib/ntp/ntpd/ntp_crypto.c                               0.505424173077
crypto/openssl/crypto/asn1/tasn_dec.c                       0.482876843456
sys/compat/linux/linux_socket.c                             0.477720395098
usr.bin/compress/zopen.c                                    0.462407652214
usr.bin/gzip/zuncompress.c                                  0.462407652214
sys/kern/uipc_usrreq.c                                      0.462407652214
crypto/openssl/ssl/ssltest.c                                0.460574251046
crypto/openssl/ssl/s2_srvr.c                                0.460574251046
crypto/openssl/apps/x509.c                                  0.460574251046
crypto/openssl/ssl/s3_pkt.c                                 0.456096100304
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c  0.44243544619
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c       0.44243544619


These results turned out not to be very interesting, though. They roughly match the security advisories, but since there aren't many of those (luckily!), the sample is not very representative.

I also tried running the script with the "[\t ]+[Bb]ug" regular expression instead of "Security:", and got the following result:

Filename                                              Score
sys/netinet/sctp_output.c                             7.66251938944
sys/netinet/sctp_pcb.c                                7.2065885975
sys/netinet/sctputil.c                                6.80461378901
sys/dev/bge/if_bge.c                                  6.50257671046
sys/netinet/sctp_input.c                              5.62524237411
sys/netinet/sctp_indata.c                             5.25336437345
sys/netinet/sctp_usrreq.c                             5.11155404498
sys/netinet/sctp_timer.c                              3.67543247047
sys/netinet/sctp_structs.h                            2.99517512907
sys/conf/files                                        2.89041302648
sys/netinet/sctp_asconf.c                             2.79978400331
sys/netinet/sctp_constants.h                          2.79452724525
sys/dev/bge/if_bgereg.h                               2.62852430236
sys/dev/msk/if_msk.c                                  2.31771522145
sys/netinet/sctputil.h                                2.19989952769
contrib/bind9/CHANGES                                 2.15204576684
contrib/bind9/version                                 2.15204576684
sys/kern/kern_fork.c                                  2.12687393752
sys/netinet/in.c                                      2.09501338006
sys/netinet/sctp_sysctl.c                             2.06560623664
sys/netinet6/sctp6_usrreq.c                           1.92839801197
sys/netinet/sctp_var.h                                1.91990472478
sys/conf/NOTES                                        1.79166008565
sys/dev/mxge/if_mxge.c                                1.74628407608
contrib/bind9/bin/named/query.c                       1.74336559317
contrib/bind9/lib/dns/rbtdb.c                         1.74336559317
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c  1.7287195258
sys/i386/xen/pmap.c                                   1.68724079063
sys/powerpc/aim/mmu_oea64.c                           1.68644933665
contrib/bind9/lib/dns/validator.c                     1.6676964412


This is a somewhat more interesting result. It suggests that the most 'dangerous' code sits in the SCTP implementation and in NIC drivers (bge, msk, mxge), with some bits in ZFS and BIND as well. I wonder how accurate that is.

The script is available here. It is designed to work against a git repository; I used the FreeBSD git mirror on GitHub: https://github.com/freebsd/freebsd-head.
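
For the impatient, the rough idea can be approximated with plain git. This is not the time-weighted scoring from Google's post, just a quick unweighted sketch that counts how often each file appears in matching commits:

git log --grep='Security:' --name-only --pretty=format: | \
    grep -v '^$' | sort | uniq -c | sort -rn | head -n 30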