You can use the 'original' abbreviations if you feel more comfortable:
<?php
Normalizer::NFD;
Normalizer::NFKD;
Normalizer::NFC;
Normalizer::NFKC;
?>
(PHP 5 >= 5.3.0, PHP 7, PECL intl >= 1.0.0)
Normalizer::normalize -- normalizer_normalize — Normalizes the input provided and returns the normalized string
Object oriented style
$input
[, int $form
= Normalizer::FORM_C
] )Procedural style
$input
[, int $form
= Normalizer::FORM_C
] )Normalizes the input provided and returns the normalized string
input
The input string to normalize
form
One of the normalization forms.
The normalized string or FALSE
if an error occurred.
Example #1 normalizer_normalize() example
<?php
$char_A_ring = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "\xCC\x8A"; // 'COMBINING RING ABOVE' (U+030A)
$char_1 = normalizer_normalize( $char_A_ring, Normalizer::FORM_C );
$char_2 = normalizer_normalize( 'A' . $char_combining_ring_above, Normalizer::FORM_C );
echo urlencode($char_1);
echo ' ';
echo urlencode($char_2);
?>
Example #2 OO example
<?php
$char_A_ring = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "\xCC\x8A"; // 'COMBINING RING ABOVE' (U+030A)
$char_1 = Normalizer::normalize( $char_A_ring, Normalizer::FORM_C );
$char_2 = Normalizer::normalize( 'A' . $char_combining_ring_above, Normalizer::FORM_C );
echo urlencode($char_1);
echo ' ';
echo urlencode($char_2);
?>
The above example will output:
%C3%85 %C3%85
You can use the 'original' abbreviations if you feel more comfortable:
<?php
Normalizer::NFD;
Normalizer::NFKD;
Normalizer::NFC;
Normalizer::NFKC;
?>
Especially when matching texts against each-other or against keywords, it is helpful to normalize the texts before.
The following function removes all diacritics (marks like accents) from a given UTF8-encoded texts and returns ASCii-text.
Be sure to have the PHP-Normalizer-extension (intl and icu) installed.
Tipp: You may also want to map the text to lower case before execute matching procedures ...
<?php
function normalizeUtf8String( $s)
{
// Normalizer-class missing!
if (! class_exists("Normalizer", $autoload = false))
return $original_string;
// maps German (umlauts) and other European characters onto two characters before just removing diacritics
$s = preg_replace( '@\x{00c4}@u' , "AE", $s ); // umlaut Ä => AE
$s = preg_replace( '@\x{00d6}@u' , "OE", $s ); // umlaut Ö => OE
$s = preg_replace( '@\x{00dc}@u' , "UE", $s ); // umlaut Ü => UE
$s = preg_replace( '@\x{00e4}@u' , "ae", $s ); // umlaut ä => ae
$s = preg_replace( '@\x{00f6}@u' , "oe", $s ); // umlaut ö => oe
$s = preg_replace( '@\x{00fc}@u' , "ue", $s ); // umlaut ü => ue
$s = preg_replace( '@\x{00f1}@u' , "ny", $s ); // ñ => ny
$s = preg_replace( '@\x{00ff}@u' , "yu", $s ); // ÿ => yu
// maps special characters (characters with diacritics) on their base-character followed by the diacritical mark
// exmaple: Ú => U´, á => a`
$s = Normalizer::normalize( $s, Normalizer::FORM_D );
$s = preg_replace( '@\pM@u' , "", $s ); // removes diacritics
$s = preg_replace( '@\x{00df}@u' , "ss", $s ); // maps German ß onto ss
$s = preg_replace( '@\x{00c6}@u' , "AE", $s ); // Æ => AE
$s = preg_replace( '@\x{00e6}@u' , "ae", $s ); // æ => ae
$s = preg_replace( '@\x{0132}@u' , "IJ", $s ); // ? => IJ
$s = preg_replace( '@\x{0133}@u' , "ij", $s ); // ? => ij
$s = preg_replace( '@\x{0152}@u' , "OE", $s ); // Œ => OE
$s = preg_replace( '@\x{0153}@u' , "oe", $s ); // œ => oe
$s = preg_replace( '@\x{00d0}@u' , "D", $s ); // Ð => D
$s = preg_replace( '@\x{0110}@u' , "D", $s ); // Ð => D
$s = preg_replace( '@\x{00f0}@u' , "d", $s ); // ð => d
$s = preg_replace( '@\x{0111}@u' , "d", $s ); // d => d
$s = preg_replace( '@\x{0126}@u' , "H", $s ); // H => H
$s = preg_replace( '@\x{0127}@u' , "h", $s ); // h => h
$s = preg_replace( '@\x{0131}@u' , "i", $s ); // i => i
$s = preg_replace( '@\x{0138}@u' , "k", $s ); // ? => k
$s = preg_replace( '@\x{013f}@u' , "L", $s ); // ? => L
$s = preg_replace( '@\x{0141}@u' , "L", $s ); // L => L
$s = preg_replace( '@\x{0140}@u' , "l", $s ); // ? => l
$s = preg_replace( '@\x{0142}@u' , "l", $s ); // l => l
$s = preg_replace( '@\x{014a}@u' , "N", $s ); // ? => N
$s = preg_replace( '@\x{0149}@u' , "n", $s ); // ? => n
$s = preg_replace( '@\x{014b}@u' , "n", $s ); // ? => n
$s = preg_replace( '@\x{00d8}@u' , "O", $s ); // Ø => O
$s = preg_replace( '@\x{00f8}@u' , "o", $s ); // ø => o
$s = preg_replace( '@\x{017f}@u' , "s", $s ); // ? => s
$s = preg_replace( '@\x{00de}@u' , "T", $s ); // Þ => T
$s = preg_replace( '@\x{0166}@u' , "T", $s ); // T => T
$s = preg_replace( '@\x{00fe}@u' , "t", $s ); // þ => t
$s = preg_replace( '@\x{0167}@u' , "t", $s ); // t => t
// remove all non-ASCii characters
$s = preg_replace( '@[^\0-\x80]@u' , "", $s );
// possible errors in UTF8-regular-expressions
if (empty($s))
return $original_string;
else
return $s;
}
?>
The above function is mainly based on the following article:
http://ahinea.com/en/tech/accented-translate.html
If you get error messages while starting apache of xampp package with activated extension=intl.dll, copy the files
* icudt##.dll
* icuin##.dll
* icuio##.dll
* icule##.dll
* iculx##.dll
* icutu##.dll
* icuuc##.dll
## = version number
from "/program files/xampp/php"
into your "/program files/xampp/apache/bin" or whereever your xampp resides :-)