Initial performance under ANTLR: short slab of ASCII text 14.380000 2.240000 16.620000 ( 16.685454) short slab of UTF-8 text 18.080000 2.420000 20.500000 ( 21.965856) After move to Ragel scanner: short slab of ASCII text 5.010000 0.010000 5.020000 ( 5.033520) short slab of UTF-8 text 9.130000 0.010000 9.140000 ( 9.158980) About 28% of time is being spent inside rb_ary_includes (compared with only 14% in the next_token function) After adding custom C replacement for Ruby Array class: short slab of ASCII text 3.400000 0.000000 3.400000 ( 3.417410) short slab of UTF-8 text 6.290000 0.010000 6.300000 ( 6.314861) Biggest drain is st_init_strtable (42%), something to do with strings followed by 14% for rb_str_append (next_token function now up to 22%) After speeding up Ragel scanner with -G2 switch: short slab of ASCII text 2.870000 0.000000 2.870000 ( 2.929794) short slab of UTF-8 text 5.390000 0.010000 5.400000 ( 5.399399) After replacing many rb_str_append calls with rb_str_cat (for constant strings): short slab of ASCII text 2.380000 0.010000 2.390000 ( 2.461861) short slab of UTF-8 text 4.860000 0.000000 4.860000 ( 5.016289) More rb_str_append calls replaced with rb_str_cat (token text): short slab of ASCII text 1.570000 0.010000 1.580000 ( 1.705596) short slab of UTF-8 text 3.280000 0.020000 3.300000 ( 3.353919) After implementing profiling_parse method (to minimize noise in profile from "times" method): short slab of ASCII text 1.490000 0.000000 1.490000 ( 1.498188) short slab of UTF-8 text 3.150000 0.010000 3.160000 ( 3.266873) Biggest drain still st_init_strtable (34.1%), followed by rb_str_append (12.3%), and rb_str_buf_cat (11.2%) not clear whether a custom string implementation would help here (next_token function currently accounts for 13.3%)