Arrays of strings and relocation

While working on GRegex (Perl-style regular expressions for GLib) I discovered that arrays of string pointers can cause lots of relocations.

Relocation is the process of adjusting pointers whose values are unknown at link time, such as a pointer to a function in a dynamically loaded library.

In PCRE (the library used by GRegex) there is a table that associates the name of a script, such as Latin or Arabic, with its properties:

const ucp_type_table _pcre_utt[] = {
  { "Any",      PT_ANY, 0 },
  { "Arabic",   PT_SC,  ucp_Arabic },
  { "Armenian", PT_SC,  ucp_Armenian },
  ...
  { "Zs",       PT_PC,  ucp_Zs }
};

The strings (there are more than 100!) are placed in read-only memory, while the _pcre_utt array is placed in writable memory because it holds pointers to them, and the final address of each string is only known once the library is loaded. The dynamic loader therefore has to perform a relocation for each string. With GRegex, the number of relocations in libglib was:

libglib-2.0.so: 290 relocations, 263 relative (90%), 131 PLT entries, 1 for local syms (0%), 0 users

Without GRegex:

libglib-2.0.so: 90 relocations, 63 relative (70%), 131 PLT entries, 1 for local syms (0%), 0 users

A possible solution is to replace the strings in the table with offsets into one big string constant:

const char _pcre_ucp_names[] =
  "Any\0"
  "Arabic\0"
  "Armenian\0"
  ...
  "Zs";

const ucp_type_table _pcre_utt[] = {
  { 0,   PT_ANY, 0 },
  { 4,   PT_SC,  ucp_Arabic },
  { 11,  PT_SC,  ucp_Armenian },
  ...
  { 653, PT_PC,  ucp_Zs }
};
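
To read a name back, the code adds the stored offset to the start of the big string. The following is only a minimal sketch of that idea, not the actual PCRE code: the example_* names, the field layout and the enum values are all hypothetical, and the table is cut down to the three entries shown above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the PCRE constants; the values are placeholders. */
enum { PT_ANY, PT_SC };
enum { ucp_Arabic = 1, ucp_Armenian = 2 };

/* Assumed layout: the name is stored as an offset, not as a pointer. */
typedef struct {
  unsigned short name_offset;  /* offset of the name inside example_names */
  unsigned short type;
  unsigned short value;
} example_type_table;

/* One array of NUL-separated names. It contains no pointers,
   so nothing in it needs to be relocated. */
static const char example_names[] =
  "Any\0"
  "Arabic\0"
  "Armenian";

static const example_type_table example_utt[] = {
  { 0,  PT_ANY, 0 },
  { 4,  PT_SC,  ucp_Arabic },
  { 11, PT_SC,  ucp_Armenian }
};

#define EXAMPLE_UTT_SIZE (sizeof example_utt / sizeof example_utt[0])

/* Return the name of entry i by adding its offset to the base of the string. */
const char *example_name(size_t i)
{
  return example_names + example_utt[i].name_offset;
}

/* Linear search by name; returns NULL if the name is not in the table. */
const example_type_table *example_lookup(const char *name)
{
  size_t i;
  for (i = 0; i < EXAMPLE_UTT_SIZE; i++) {
    if (strcmp(name, example_name(i)) == 0)
      return &example_utt[i];
  }
  return NULL;
}
```

The price is that the offsets have to be kept in sync with the string by hand (or generated by a script), since the compiler will not notice a wrong number.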

The result after this operation is:

libglib-2.0.so.0.1300.0: 187 relocations, 160 relative (85%), 131 PLT entries, 1 for local syms (0%), 0 users

Using the same technique on the other arrays in PCRE and GRegex, we reduced the number of relocations further. The results are now similar to the ones obtained without GRegex:

libglib-2.0.so.0.1300.0: 102 relocations, 75 relative (73%), 131 PLT entries, 1 for local syms (0%), 0 users

The number of relocations was obtained using the relinfo.pl script by Ulrich Drepper. If you want to know more about dynamic libraries, you can read "How To Write Shared Libraries" by the same author.

UPDATE: _pcre_ucp_names is now declared as a char [] instead of a char *, as suggested by Ulrich Drepper.
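
The difference matters because a char * at file scope is itself a pointer variable that has to be filled in with the address of the string, while a char [] is just the bytes of the string. A small sketch of the two declarations, using hypothetical names and a shortened string:

```c
#include <stddef.h>

/* A pointer variable plus separate string data: in a shared library
   the pointer itself needs a relocation. */
const char *names_ptr = "Any\0Arabic";

/* Only the bytes of the string: there is no pointer to relocate. */
const char names_arr[] = "Any\0Arabic";

/* Size of the pointer variable, whatever the string length is. */
size_t names_ptr_size(void) { return sizeof names_ptr; }

/* Size of the whole string, embedded and trailing NUL bytes included. */
size_t names_arr_size(void) { return sizeof names_arr; }
```

With the array form, names_arr + offset is computed relative to the array itself, so the big string adds no relocation of its own.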