Sunday, November 13, 2016

Playing around with valgrind in libvirt

Valgrind is a well known memory debugging tool. It doesn't seem to officially support FreeBSD, but, luckily, it's available in FreeBSD ports.

Using it is not hard (at least on a basic level), but requires some patience and attention.

Fortunately, libvirt comes provides an easy interface for adding unit tests, and it could be used for running with valgrind as well. Obviously, tests should cover code paths we want to test, but it's still much more convenient than triggering all the stuff manually.

Individual tests could be executed like this:

VIR_TEST_RANGE=22 VIR_TEST_DEBUG=1 ./tests/bhyvexml2argvtest

bhyvexml2argvtest is a test that checks that the bhyve driver converts XML to a proper set of actual bhyve(8) commands. VIR_TEST_RANGE tests to run only test #22 from the suite, and VIR_TEST_DEBUG=1 makes it produce more verbose output.

Adding valgrind into the game is fairly simple:

VIR_TEST_RANGE=22 VIR_TEST_DEBUG=1 valgrind --trace-children=yes --tool=memcheck --leak-check=full   ./tests/bhyvexml2argvtest

Note: I also configured libvirt with '-O0 -g' in CFLAGS for debugging.

It produces a bunch of information about leaks, for example:

==28853== 128 bytes in 1 blocks are definitely lost in loss record 134 of 173
==28853==    at 0x4C246C0: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==28853==    by 0x7D906D8: vasprintf_l (in /lib/libc.so.7)
==28853==    by 0x512876F: virVasprintfInternal (virstring.c:481)
==28853==    by 0x512886D: virAsprintfInternal (virstring.c:502)
==28853==    by 0x40D9A9: testCompareXMLToArgvHelper (bhyvexml2argvtest.c:109)
==28853==    by 0x40DFD8: virTestRun (testutils.c:180)
==28853==    by 0x40DD50: mymain (bhyvexml2argvtest.c:177)
==28853==    by 0x40FA77: virTestMain (testutils.c:992)
==28853==    by 0x40DE0E: main (bhyvexml2argvtest.c:193)
==28853== 
==28853== 128 bytes in 1 blocks are definitely lost in loss record 135 of 173
==28853==    at 0x4C246C0: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==28853==    by 0x7D906D8: vasprintf_l (in /lib/libc.so.7)
==28853==    by 0x512876F: virVasprintfInternal (virstring.c:481)
==28853==    by 0x512886D: virAsprintfInternal (virstring.c:502)
==28853==    by 0x40D9F5: testCompareXMLToArgvHelper (bhyvexml2argvtest.c:111)
==28853==    by 0x40DFD8: virTestRun (testutils.c:180)
==28853==    by 0x40DD50: mymain (bhyvexml2argvtest.c:177)
==28853==    by 0x40FA77: virTestMain (testutils.c:992)
==28853==    by 0x40DE0E: main (bhyvexml2argvtest.c:193)

Moving from bottom to top of these traces, it's fair to assume that the issue is somewhere in testCompareXMLToArgvHelper. It looks like this:

 96 
 97 static int
 98 testCompareXMLToArgvHelper(const void *data)
 99 {
100     int ret = -1;
101     const struct testInfo *info = data;
102     char *xml = NULL;
103     char *args = NULL, *ldargs = NULL, *dmargs = NULL;
104 
105     if (virAsprintf(&xml, "%s/bhyvexml2argvdata/bhyvexml2argv-%s.xml",
106                     abs_srcdir, info->name) < 0 ||
107         virAsprintf(&args, "%s/bhyvexml2argvdata/bhyvexml2argv-%s.args",
108                     abs_srcdir, info->name) < 0 ||
109         virAsprintf(&ldargs, "%s/bhyvexml2argvdata/bhyvexml2argv-%s.ldargs",
110                     abs_srcdir, info->name) < 0 ||
111         virAsprintf(&dmargs, "%s/bhyvexml2argvdata/bhyvexml2argv-%s.devmap",
112                     abs_srcdir, info->name) < 0)
113         goto cleanup;
114 
115     ret = testCompareXMLToArgvFiles(xml, args, ldargs, dmargs, info->flags);
116 
117  cleanup:
118     VIR_FREE(xml);
119     VIR_FREE(args);
120     return ret;
121 }

We can see that we call virAsnprintf a few times to populate xml, args, ldargs and dmargs, but later, in the cleanup section, we only free xml and args, leaking ldargs and dmargs, just like valgrind reported.

Now trying to fix that by adding two VIR_FREE calls for leaked variables and try again.

After running the test again, this warning is gone (yay!), so moving to the next one:

==29877== 296 (200 direct, 96 indirect) bytes in 1 blocks are definitely lost in loss record 146 of 171
==29877==    at 0x4C25C90: calloc (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==29877==    by 0x50C47D8: virAllocVar (viralloc.c:560)
==29877==    by 0x510D3A1: virObjectNew (virobject.c:193)
==29877==    by 0x510D4FC: virObjectLockableNew (virobject.c:219)
==29877==    by 0x51EBFC1: virGetConnect (datatypes.c:127)
==29877==    by 0x40D680: testCompareXMLToArgvFiles (bhyvexml2argvtest.c:34)
==29877==    by 0x40DA1B: testCompareXMLToArgvHelper (bhyvexml2argvtest.c:115)
==29877==    by 0x40DFF0: virTestRun (testutils.c:180)
==29877==    by 0x40DD68: mymain (bhyvexml2argvtest.c:179)
==29877==    by 0x40FA8F: virTestMain (testutils.c:992)
==29877==    by 0x40DE26: main (bhyvexml2argvtest.c:195)
==29877== 

And indeed, it looks like connect object is leaking here:

 22 static int testCompareXMLToArgvFiles(const char *xml,
 23                                      const char *cmdline,
 24                                      const char *ldcmdline,
 25                                      const char *dmcmdline,
 26                                      unsigned int flags)
 27 {
 28     char *actualargv = NULL, *actualld = NULL, *actualdm = NULL;
 29     virDomainDefPtr vmdef = NULL;
 30     virCommandPtr cmd = NULL, ldcmd = NULL;
 31     virConnectPtr conn;
 32     int ret = -1;
 33 
 34     if (!(conn = virGetConnect()))
 35         goto out;
  --- skip ---
 82  out:
 83     VIR_FREE(actualargv);
 84     VIR_FREE(actualld);
 85     VIR_FREE(actualdm);
 86     virCommandFree(cmd);
 87     virCommandFree(ldcmd);
 88     virDomainDefFree(vmdef);
 89     return ret;
 90 }

So adding virObjectUnref call for the connection object and trying again. And after this there's only one warning left:

==30532== 7 bytes in 1 blocks are definitely lost in loss record 6 of 169
==30532==    at 0x4C246C0: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==30532==    by 0x7D361DF: strdup (in /lib/libc.so.7)
==30532==    by 0x5128C71: virStrdup (virstring.c:714)
==30532==    by 0x410B8F: bhyveBuildNetArgStr (bhyve_command.c:94)
==30532==    by 0x411410: virBhyveProcessBuildBhyveCmd (bhyve_command.c:305)
==30532==    by 0x40D6FF: testCompareXMLToArgvFiles (bhyvexml2argvtest.c:46)
==30532==    by 0x40DA27: testCompareXMLToArgvHelper (bhyvexml2argvtest.c:116)
==30532==    by 0x40DFFC: virTestRun (testutils.c:180)
==30532==    by 0x40DD74: mymain (bhyvexml2argvtest.c:180)
==30532==    by 0x40FA9B: virTestMain (testutils.c:992)
==30532==    by 0x40DE32: main (bhyvexml2argvtest.c:196)

This one is a little harder. Reading the code of bhyveBuildNetArgStr, it's not exactly obvious where it leaks, because net->ifname was freed in every code path.

 46 static int
 47 bhyveBuildNetArgStr(virConnectPtr conn,
 48                     const virDomainDef *def,
 49                     virDomainNetDefPtr net,
 50                     virCommandPtr cmd,
 51                     bool dryRun)
 52 {
 53     char macaddr[VIR_MAC_STRING_BUFLEN];
 54     char *realifname = NULL;
 55     char *brname = NULL;
 56     char *nic_model = NULL;
 57     int ret = -1;
 58     virDomainNetType actualType = virDomainNetGetActualType(net);
 59 
 60     if (STREQ(net->model, "virtio")) {
 61         if (VIR_STRDUP(nic_model, "virtio-net") < 0)
 62             return -1;
 63     } else if (STREQ(net->model, "e1000")) {
 64         if ((bhyveDriverGetCaps(conn) & BHYVE_CAP_NET_E1000) != 0) {
 65             if (VIR_STRDUP(nic_model, "e1000") < 0)
 66                 return -1;
 67         } else {
 68             virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
 69                            _("NIC model 'e1000' is not supported "
 70                              "by given bhyve binary"));
 71             return -1;
 72         }
 73     } else {
 74         virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
 75                        _("NIC model '%s' is not supported"),
 76                        net->model);
 77         return -1;
 78     }
 79 
 80     if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) {
 81         if (VIR_STRDUP(brname, virDomainNetGetActualBridgeName(net)) < 0)
 82             goto out;
 83     } else {
 84         virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
 85                        _("Network type %d is not supported"),
 86                        virDomainNetGetActualType(net));
 87         goto out;
 88     }
 89 
 90     if (!net->ifname ||
 91         STRPREFIX(net->ifname, VIR_NET_GENERATED_PREFIX) ||
 92         strchr(net->ifname, '%')) {
 93         VIR_FREE(net->ifname);
 94         if (VIR_STRDUP(net->ifname, VIR_NET_GENERATED_PREFIX "%d") < 0)
 95             goto out;
 96     }
 97 
 98     if (!dryRun) {
 99         if (virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac,
100                                            def->uuid, NULL, NULL, 0,
101                                            virDomainNetGetActualVirtPortProfile(net),
102                                            virDomainNetGetActualVlan(net),
103                                            VIR_NETDEV_TAP_CREATE_IFUP | VIR_NETDEV_TAP_CREATE_PERSIST) < 0)
104             goto out;
105 
106         realifname = virNetDevTapGetRealDeviceName(net->ifname);
107 
108         if (realifname == NULL)
109             goto out;
110 
111         VIR_DEBUG("%s -> %s", net->ifname, realifname);
112         /* hack on top of other hack: we need to set
113          * interface to 'UP' again after re-opening to find its
114          * name
115          */
116         if (virNetDevSetOnline(net->ifname, true) != 0)
117             goto out;
118     } else {
119         if (VIR_STRDUP(realifname, "tap0") < 0)
120             goto out;
121     }
122 
123 
124     virCommandAddArg(cmd, "-s");
125     virCommandAddArgFormat(cmd, "%d:0,%s,%s,mac=%s",
126                            net->info.addr.pci.slot, nic_model,
127                            realifname, virMacAddrFormat(&net->mac, macaddr));
128 
129     ret = 0;
130  out:
131     VIR_FREE(net->ifname);
132     VIR_FREE(realifname);
133     VIR_FREE(brname);
134     VIR_FREE(nic_model);
135 
136     return ret;
137 }

Also, I had an assumption (wrong!) that I was running with dryRun == true in the test. Anyway, when things go in unexpected ways, it's good to run lldb:

(lldb) breakpoint set --file bhyve_command.c --line 94
Breakpoint 1: where = bhyvexml2argvtest`bhyveBuildNetArgStr + 731 at bhyve_command.c:94, address = 0x0000000000410b3d

Then a couple of next's and we can set a watchpoint and see what happens:

(lldb) print net->ifname
(char *) $1 = 0x0000000807343678 "vnet%d"
(lldb) watchpoint set var net->ifname
Watchpoint created: Watchpoint 1: addr = 0x80727f108 size = 8 state = enabled type = w
    declare @ '/home/novel/code/libvirt/src/bhyve/bhyve_command.c:49'
    watchpoint spec = 'net->ifname'
    new value: 0x0000000807343678
(lldb) c
Process 53109 resuming

Watchpoint 1 hit:
old value: 0x0000000807343678
new value: 0x0000000000000000
Process 53109 stopped
* thread #1: tid = 100513, 0x0000000800ccadb6 libvirt.so.0`virStrdup(dest=0x000000080727f108, src="vnet0", report=true, domcode=57, filename="bhyvexml2argvmock.c", funcname="virNetDevTapCreateInBridgePort", linenr=32) + 43 at virstring.c:712, stop reason = watchpoint 1
    frame #0: 0x0000000800ccadb6 libvirt.so.0`virStrdup(dest=0x000000080727f108, src="vnet0", report=true, domcode=57, filename="bhyvexml2argvmock.c", funcname="virNetDevTapCreateInBridgePort", linenr=32) + 43 at virstring.c:712
   709            size_t linenr)
   710  {
   711      *dest = NULL;
-> 712      if (!src)
   713          return 0;
   714      if (!(*dest = strdup(src))) {
   715          if (report)
(lldb) bt
* thread #1: tid = 100513, 0x0000000800ccadb6 libvirt.so.0`virStrdup(dest=0x000000080727f108, src="vnet0", report=true, domcode=57, filename="bhyvexml2argvmock.c", funcname="virNetDevTapCreateInBridgePort", linenr=32) + 43 at virstring.c:712, stop reason = watchpoint 1
  * frame #0: 0x0000000800ccadb6 libvirt.so.0`virStrdup(dest=0x000000080727f108, src="vnet0", report=true, domcode=57, filename="bhyvexml2argvmock.c", funcname="virNetDevTapCreateInBridgePort", linenr=32) + 43 at virstring.c:712
    frame #1: 0x0000000800862168 bhyvexml2argvmock.so`virNetDevTapCreateInBridgePort(brname="virbr0", ifname=0x000000080727f108, macaddr=0x000000080727f004, vmuuid="\x04\x11㮰P+874\a\b", tunpath=0x0000000000000000, tapfd=0x0000000000000000, tapfdSize=0, virtPortProfile=0x0000000000000000, virtVlan=0x0000000000000000, fakeflags=9) + 83 at bhyvexml2argvmock.c:32
    frame #2: 0x0000000000410bf3 bhyvexml2argvtest`bhyveBuildNetArgStr(conn=0x000000080725e000, def=0x000000080727c000, net=0x000000080727f000, cmd=0x0000000807245280, dryRun=false) + 913 at bhyve_command.c:99
    frame #3: 0x00000000004113eb bhyvexml2argvtest`virBhyveProcessBuildBhyveCmd(conn=0x000000080725e000, def=0x000000080727c000, dryRun=false) + 610 at bhyve_command.c:304
    frame #4: 0x000000000040d70a bhyvexml2argvtest`testCompareXMLToArgvFiles(xml="/home/novel/code/libvirt/tests/bhyvexml2argvdata/bhyvexml2argv-net-e1000.xml", cmdline="/home/novel/code/libvirt/tests/bhyvexml2argvdata/bhyvexml2argv-net-e1000.args", ldcmdline="/home/novel/code/libvirt/tests/bhyvexml2argvdata/bhyvexml2argv-net-e1000.ldargs", dmcmdline="/home/novel/code/libvirt/tests/bhyvexml2argvdata/bhyvexml2argv-net-e1000.devmap", flags=0) + 225 at bhyvexml2argvtest.c:48
    frame #5: 0x000000000040da26 bhyvexml2argvtest`testCompareXMLToArgvHelper(data=0x000000000063e460) + 405 at bhyvexml2argvtest.c:117
    frame #6: 0x000000000040dfe3 bhyvexml2argvtest`virTestRun(title="BHYVE XML-2-ARGV net-e1000", body=(bhyvexml2argvtest`testCompareXMLToArgvHelper at bhyvexml2argvtest.c:101), data=0x000000000063e460) + 245 at testutils.c:180
    frame #7: 0x000000000040dd5b bhyvexml2argvtest`mymain + 789 at bhyvexml2argvtest.c:179
    frame #8: 0x000000000040fa82 bhyvexml2argvtest`virTestMain(argc=1, argv=0x00007fffffffe6e8, func=(bhyvexml2argvtest`mymain at bhyvexml2argvtest.c:127)) + 1675 at testutils.c:992
    frame #9: 0x000000000040de19 bhyvexml2argvtest`main(argc=1, argv=0x00007fffffffe6e8) + 50 at bhyvexml2argvtest.c:195
    frame #10: 0x000000000040d44f bhyvexml2argvtest`_start + 383
(lldb) 

But virStrdup is not really helpful, so going to the previous frame to see the logic behind that:

(lldb) fr s 1
frame #1: 0x0000000800862168 bhyvexml2argvmock.so`virNetDevTapCreateInBridgePort(brname="virbr0", ifname=0x000000080727f108, macaddr=0x000000080727f004, vmuuid="\x04\x11㮰P+874\a\b", tunpath=0x0000000000000000, tapfd=0x0000000000000000, tapfdSize=0, virtPortProfile=0x0000000000000000, virtVlan=0x0000000000000000, fakeflags=9) + 83 at bhyvexml2argvmock.c:32
   29                                      virNetDevVlanPtr virtVlan ATTRIBUTE_UNUSED,
   30                                      unsigned int fakeflags ATTRIBUTE_UNUSED)
   31   {
-> 32       if (VIR_STRDUP(*ifname, "vnet0") < 0)
   33           return -1;
   34       return 0;
   35   }
(lldb) 

Finally, things are getting clear: bhyveBuildNetArgStr was calling virNetDevTapCreateInBridgePort that is mocked for tests. And the mocked version does not free ifname before calling strdup on it, so it's never freed again. The rest of the fix is trivial, problem solved.

All in all, this appears like an interesting and useful exercise. This is probably too detailed, but hopefully it'll be useful for somebody who's just starting the journey of searching for memory leaks (like myself).

1 comment: