Wasting InnoDB memory

I usually get strange looks when I complain about memory handling inside InnoDB. It seems that terabytes of RAM are so common and cheap that nobody should really care about memory efficiency. Unfortunately for me, I do.

Examples:

  • The infamous Bug#15815 – buffer pool mutex contention. The patch for the bug added lots of small mutexes, and by ‘lots’ I mean really, really lots – two mutexes (and an rwlock structure) for each buffer pool page. With InnoDB’s default 16KB pages, that makes two million mutexes for a 16GB buffer pool, um, four million mutexes for a 32GB buffer pool, and I guess more for larger buffer pools. Result – a 16GB buffer pool pays a 625MB locking tax to solve an 8-core locking problem. Solution? Between the giant lock and armies of page mutexes there lives a land of mutex pools, where locks are shared happily by multiple entities. I even made a patch; unfortunately it hits an ibuf assertion after a server restart, though at first everything works great :)
  • The InnoDB data dictionary always grows, never shrinks. It is not considered a bug, as it isn’t a memory leak – all memory is accounted for by the (hidden) dict_sys->size, and valgrind doesn’t print errors. A 1-column table takes 2k of memory in the InnoDB data dictionary; a table with a few more columns and indexes already takes 10k. 100000 tables, and 1GB of memory is wasted. Who needs 100000 tables? People running application farms do. Actually, there is even code for cleaning up the data dictionary; it just wasn’t finished, and is commented out at the moment. Even worse, the fix for #20877 was a joke – it reduced the in-memory structure size while still not caring about the structure count. And of course, do note that every InnoDB partition of a table takes space there too…
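
To make the "mutex pool" idea from the first example concrete, here is a minimal sketch of lock striping in Python. It is not the patch mentioned above and not InnoDB code: the class name MutexPool, the modulo mapping from page number to lock, and the pool size of 1024 are all assumptions picked for illustration. The only InnoDB fact it relies on is the default 16KB page size.

```python
import threading

PAGE_SIZE = 16 * 1024  # InnoDB's default page size, in bytes


class MutexPool:
    """A fixed set of locks shared by many pages (lock striping).

    Hypothetical illustration only; names and mapping are assumptions.
    """

    def __init__(self, n_locks):
        self._locks = [threading.Lock() for _ in range(n_locks)]

    def lock_for(self, page_no):
        # Assumed mapping: page number modulo pool size.
        return self._locks[page_no % len(self._locks)]


def pages_in_pool(buffer_pool_bytes):
    """Number of pages that fit in a buffer pool of the given size."""
    return buffer_pool_bytes // PAGE_SIZE


pool = MutexPool(1024)                    # 1024 locks in total
n_pages = pages_in_pool(16 * 1024 ** 3)   # pages in a 16GB buffer pool

with pool.lock_for(42):
    pass  # work on page 42 while holding its shared lock
```

The trade-off is the obvious one: the lock count no longer grows with the buffer pool, but unrelated pages that map to the same slot now contend with each other, and a thread must never try to hold two pages from the same slot at once with a non-reentrant lock.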

So generally, if you’re running a bigger InnoDB deployment, you may be hitting various hidden memory taxes – hundreds of megabytes, or even gigabytes – that don’t provide much value anyway. Well, memory is cheap, our next database boxes will be 32GB-class instead of those ‘amnesia’ 16GB types, and I can probably stop ranting :)
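
For anyone who wants to redo the arithmetic for their own boxes, here is a rough back-of-the-envelope sketch. It assumes the default 16KB page size, reads "MB" and "GB" as binary units and "10k" as 10KB; the function names are made up for this example.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2
KiB = 1024
PAGE_SIZE = 16 * KiB  # InnoDB's default page size


def lock_bytes_per_page(buffer_pool_bytes, lock_tax_bytes):
    """Average locking overhead per buffer pool page."""
    n_pages = buffer_pool_bytes // PAGE_SIZE
    return lock_tax_bytes / n_pages


def dictionary_bytes(n_tables, bytes_per_table):
    """Data dictionary footprint if every table stays cached forever."""
    return n_tables * bytes_per_table


# 625MB of locking structures spread over a 16GB buffer pool
print(lock_bytes_per_page(16 * GiB, 625 * MiB))       # 625.0 bytes per page

# 100000 tables at roughly 10k each, expressed in GB
print(dictionary_bytes(100_000, 10 * KiB) / GiB)
```

Under those assumptions the locking tax works out to about 625 bytes per 16KB page, close to 4% of the buffer pool, and the dictionary lands just under 1GB.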

10 thoughts on “Wasting InnoDB memory”

  1. InnoDB is very elegant in some respects but totally broken in others.

    Hopefully with more people looking at the code this will be fixed.

  2. The structs (mutex_struct, rw_lock_struct and os_event_struct) can be made smaller, especially on 64-bit servers. Look for it in the next patch and keep the good ideas coming.

  3. For a 32-bit build on Linux 2.6, sizeof(mutex_struct) is 104 and sizeof(rw_lock_struct) is 176. With our unreleased patch, sizeof(mutex_struct) == 40 and sizeof(rw_lock_struct) == 112. I think that both can be reduced by another 10 bytes.

  4. My numbers leave out the size of the os_event_struct, which is 132 bytes on my 32-bit build. Each mutex_struct and rw_lock_struct instance will have 1 os_event_struct in the patch.

  5. I think there is also another point – do we really need so MANY mutexes?

    One mutex is bad – millions is not good either. Why doesn’t InnoDB use some hashing for the mutexes, to keep a balance between contention and overhead?

  6. Mark, I was looking at 64-bit builds; those are the ones that tend to have big buffer pools. But yeah, reducing structure sizes is another way to help here.

    Peter, that’s what my patch does =)

  7. I don’t know InnoDB internals, but in general, trying to pool RW locks for buffer pool pages can be problematic. If access methods are allowed to lock multiple pages simultaneously, then you can introduce spurious deadlocks that way.

    For example, you might have access methods which are supposed to lock index pages in tree order (from root to leaf) so as to avoid deadlocks. Let’s say I’ve got a three-level tree, and the hash for one of the leaves is the same as the hash for the root. Thread 1 locks the root and then an intermediate page. Thread 2 locks the same intermediate page and then the leaf. Oops, deadlock.

    If InnoDB doesn’t use the RW locks as medium-duration physical page locks like this (or if it can handle deadlocks at some level), then it’s not an issue.

    Short-duration mutexes are easier.

    JVS

  8. Domas,

    I think the fix is to shrink the size and to not have an os_event_struct per rw-mutex/mutex. This should cut the memory consumption by more than 1/3 without major surgery.
