Since you often need to iterate over the characters of a UTF-8 string, you might be tempted to call mb_substr($text,$i,1) in a loop.
The problem is that there is no "magic" way to find the $i-th character of a UTF-8 string other than reading it byte by byte from the beginning. A loop that calls mb_substr($text,$i,1) for each of the N possible values of $i therefore takes much longer than expected: the larger $i gets, the longer the search for the $i-th letter. Characters are between 1 and 4 bytes long (the original UTF-8 design allowed up to 6), so locating character $i costs time proportional to $i, and the whole loop costs about 0+1+...+(N-1) = N(N-1)/2 steps. The execution time is therefore Theta(N^2), which can be really slow even for moderately long texts.
One way to work around it is to first split your text into an array of letters using some smart preprocessing, and only then iterate over the array.
Here is the idea:
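<?php
class Strings
{
    // Length of $a in characters (not bytes).
    public static function len($a){
        return mb_strlen($a,'UTF-8');
    }
    // The single character at position $i.
    public static function charAt($a,$i){
        return self::substr($a,$i,1);
    }
    // $y characters starting at $x; up to the end of the string when $y is omitted.
    public static function substr($a,$x,$y=null){
        if($y===null){
            $y=self::len($a);
        }
        return mb_substr($a,$x,$y,'UTF-8');
    }
    // Splits $a into an array of characters by recursive halving.
    public static function letters($a){
        $len = self::len($a);
        if($len==0){
            return array();
        }else if($len == 1){
            return array($a);
        }else{
            // array_merge() stands in for Arrays::concat, a helper that is not shown here.
            return array_merge(
                self::letters(self::substr($a,0,$len>>1)),
                self::letters(self::substr($a,$len>>1))
            );
        }
    }
}
?>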
As you can see, Strings::letters($text) splits the text recursively into two halves. Each level of the recursion requires time linear in the length of the string, and there is a logarithmic number of levels, so the total runtime is O(N log N). That is still more than the theoretically optimal O(N), but sadly this is the best idea I've got.
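For what it's worth, a single-pass split may also be possible with the PCRE functions. The sketch below is not part of the Strings class; utf8_letters() is a hypothetical helper name, and the assumption is that preg_split() with the /u modifier walks the string once, which should make it roughly linear in practice. On PHP 7.4 or later, mb_str_split($text, 1, 'UTF-8') is another option worth trying.

```php
<?php
// Hypothetical helper (not from the note above): split a UTF-8 string
// into an array of characters in one pass over the input.
function utf8_letters(string $text): array
{
    // With the /u modifier PCRE treats the subject as UTF-8, so the empty
    // pattern matches between code points rather than between bytes.
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at both ends.
    $letters = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($letters === false) {
        // preg_split() returns false when $text is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $letters;
}

// Usage: iterate over the array instead of calling mb_substr() in a loop.
foreach (utf8_letters("zażółć") as $i => $letter) {
    echo $i, ': ', $letter, "\n";
}
```

Note that this splits on code points, just like mb_substr() does, so a letter followed by a combining accent still ends up as two array elements.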