Why is g_hash_table_insert used?

Yesterday I was discussing a bug in some code using a GHashTable with Will and we both started to wonder if there is any reason to use g_hash_table_insert instead of g_hash_table_replace.

First of all an example, look at this code and try to find the bug.

#include <glib.h>

static void
add_entry (GHashTable *ht, const gchar *config)
{
  gchar **split_config = g_strsplit (config, "=", 2);

  if (g_strv_length (split_config) == 2) {
      gchar *key = g_utf8_strdown (split_config[0], -1);
      gchar *value = g_strdup (split_config[1]);

      g_hash_table_insert (ht, key, value);

      g_print ("Set %s to %sn", key, value);
  }

  g_strfreev (split_config);
}

int
main (int argc, char **argv)
{
  GHashTable *ht = g_hash_table_new_full (g_str_hash,
      g_str_equal, g_free, g_free);
  gint i;

  for (i = 1; i < argc; i++)
    add_entry (ht, argv[i]);

  g_hash_table_unref (ht);

  return 0;
}

If it’s not clear where the bug is, try invoking the program with “apples=42 Pears=12 APPLES=10” on the command line.

If a key already exists in the hash table, the key passed to g_hash_table_insert is destroyed and you cannot use it afterwards. This behaviour is documented, but it’s easy to find code affected by this bug. g_hash_table_replace behaves like g_hash_table_insert, but without this problem.
Is there any good reason for using g_hash_table_insert instead of g_hash_table_replace? Can you come up with a non-contrived example where you want the behaviour of the former?

16 thoughts on “Why is g_hash_table_insert used?”

M Welinder says:

January 30, 2013 at 7:02 pm

> Is there any legitimate reason for using g_hash_table_insert instead of g_hash_table_replace?

Of course.

The choice between the two is what key-value pair to delete, so clearly you need to be careful no matter what version you use. With _insert it is safe to hold on to values from _lookup as long as you never delete; anything from the hash; with _replace that’s an instant core dump.

LikeLike
Emanuele Aina says:

January 30, 2013 at 8:01 pm

The above explanation from Morten makes sense. Marco, since you brought up the topic, would you care to submit a patch that explains that in the docs? 🙂

LikeLike
barisione says:

January 30, 2013 at 9:09 pm

Morten:
I’m not sure to understand what you mean, but it’s not safe to keep the values from _lookup. _insert will use the new value and the old key.

LikeLike
Emanuele Aina says:

January 30, 2013 at 9:51 pm

Yup, s/value/keys. The value is always replaced but with _insert() you can have stable references to inserted *keys*. I don’t think it is a common scenario, but at least it alleviates a bit the WTF. To be honest I’d say that this is not going to be worth the confusion it creates: Marco, propose a patch to deprecate _insert() and make it available under a different and more clear name! Maybe something like g_hash_table_replace_full (key, value, FLAGS) with FLAGS telling what should be done when throwing away keys?

LikeLike
barisione says:

January 30, 2013 at 10:18 pm

I still cannot think of any non-contrived example of code where you would like to keep a reference to the initial keys, but not to other keys you inserted later.

LikeLike
Benjamin Otte says:

January 30, 2013 at 11:30 pm

The best I can come up with:

char *key = find_key_for_value (table, value);
g_hash_table_insert (table, strdup (key), value);
g_print (“added value for key %sn”, key);

LikeLike
desrt says:

January 30, 2013 at 11:37 pm

Freeing a key that was likely just-allocated will have better results in terms of reducing memory fragmentation than freeing one that was allocated long ago…

LikeLike
James Livingston says:

January 31, 2013 at 1:46 am

I agree the names are confusing, but I’d expect the _insert behaviour to be the default not the _replace one. Why would it change the key when it’s already there?

For what it’s worth, Java’s Map.put() and C#’s Dictionary.Add() keep the old keys too. Obviously they don’t have the issue where the old key is freed causing problems if you have another copy.

LikeLike
danw says:

January 31, 2013 at 3:26 am

There is no non-contrived reason you’d want the _insert semantics. Someone just wrote it that way first, and _replace was added later as a saner alternate version (and _insert didn’t get deprecated because people didn’t deprecate things as much back then).

LikeLike
Mathias Hasselmann says:

January 31, 2013 at 10:16 am

Ryan just gave the major reason, and this is a big one.

Another reason is the use-case Morton gave:

static const char *KEY_CONSTANT = g_strdup(“WHATEVER”);

later:

lut = g_hash_table_new(g_str_hash, g_str_equal, g_free, value_destroy);
g_hash_table_insert(lut, KEY_CONSTANT, value);

and even later:

g_hash_table_lookup(lut, KEY_CONSTANT);

Without doubt this use case should be implemented implemented direct pointers:

static const char KEY_CONSTANT[] = “WHATEVER”;
lut = g_hash_table_new(g_direct_hash, g_direct_equal, NULL, value_destroy);

Also I believe you’ll break complexity properties of g_quark_from_static_string() at least when changing g_hash_table_insert().

Dawn: g_quark_from_static_string() depends on the current behavior of g_hash_table_insert().

LikeLike
Mathias Hasselmann says:

January 31, 2013 at 10:21 am

…memory fragmentation… this just leads to another big WTF: g_hash_table_add() – “This is a convenience function for using a GHashTable as a set. It is equivalent to calling g_hash_table_replace() with key as both the key and the value.” Please someone tell me the documentation is wrong.

LikeLike
Mathias Hasselmann says:

January 31, 2013 at 10:28 am

Ehm… Actually… The example is just invalid code:

g_hash_table_insert (ht, key, value);

By calling g_hash_table_insert() the ownership of the string key is pointing to, is transferred to the hash table. After that call dereferencing key is just undefined. By all means. Same way like after calling free.

Marco: Please don’t confuse use with invalid code! 🙂

LikeLike
barisione says:

January 31, 2013 at 10:51 am

@Mathias:
_insert is different from g_free as as most of the times you can use the key after an _insert, so you could not see the bug for a while. And this is not a rare theoretic bug as I remember fixing a few of these bugs while we worked together on the Fremantle address book project.

The documentation of g_hash_table_add is right. Hash tables are often used as sets and most GLib code I saw in the past used to have GINT_TO_POINTER(TRUE) as value for the items in the set.
GHashTable was recently optimised so that the keys are not stored if all the values are identical to the keys, so you save memory.

By the way, the bug I mentioned in the post is #692815.

LikeLike
Mathias Hasselmann says:

January 31, 2013 at 2:35 pm

Marco, the bug report actually describes a different issue than the one you describe in your example:

gchar *a = g_strdup (“foo”);
gchar *b = g_strdup (“foo”);
g_hash_table_insert (set, a, a);
g_hash_table_insert (set, b, b);

There the issue is, that the value for a gets replaced by b, which is an invalid pointer after g_hash_table_insert() returning.

So the problem isn’t bad behavior of g_hash_table_insert(), but the convention that GHashTable based sets shall have key == value. So what actually strikes back is the wrong cleverness in simulating sets with GHashTable by adding g_hash_table_* functions with crazy semantics, instead of just adding an explicit g_hash_set_* API.

LikeLike
Mathias Hasselmann says:

January 31, 2013 at 2:36 pm

So actually, the proper solution would be to deprecate those whack hacks related to g_hash_table_add (), and to introduce proper API instead.

LikeLike
barisione says:

January 31, 2013 at 3:42 pm

Mathias:
I just linked to the bug because I cited it in the post. I know it’s two different issues, but we started talking about _insert/_replace after Will found the bug.
I would prefer having a GSet structure too and it could be just using a GHashTable internally.

LikeLike