Tonight’s goal: Make a simple PHP class.
- Input: a URL pointing to an HTML document.
- Output: a UTF-8 version, regardless of what encoding it’s really in.
Sounds easy, right?
Nope. Because some pages specify encoding via HTTP header, some specify via metatag, some specify both but they disagree, and some don’t specify at all. Sometimes, the encoding is specified with an unusual variant of its name (e.g. X-GBK, MS939). And often, the specified encoding is wrong.
But I think I got it, finally.
This is so useful, albeit to a relatively narrow range of programmers, that I feel bad not releasing it to the world, except that I assume that someone else has already done this and I just didn’t bother looking for it. (My experiences with PHP-community code are not good, so I almost always roll my own.) Any interest?
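For anyone wondering what the detection order described above might look like, here is a minimal sketch. It is not the actual class. The function names (`normalize_charset`, `is_known_encoding`, `to_utf8`), the alias table, and the order of checks are all assumptions made for illustration, and it needs the mbstring extension. The one deliberate choice: if the bytes are already valid UTF-8, the declared charset is ignored, since a wrong declaration is more likely than non-UTF-8 text that happens to validate.

```php
<?php
// Illustrative sketch only; all names here are hypothetical.

/** Map unusual charset labels to names mbstring knows (table is not exhaustive). */
function normalize_charset(string $label): string
{
    $label = strtoupper(trim($label, " \t\"'"));
    $aliases = [
        'X-GBK' => 'CP936',      // assumption: treat X-GBK as GBK / code page 936
        'UTF8'  => 'UTF-8',
    ];
    return $aliases[$label] ?? $label;
}

/** True if mbstring lists the name as an encoding or as an alias of one. */
function is_known_encoding(string $name): bool
{
    foreach (mb_list_encodings() as $enc) {
        if (strcasecmp($enc, $name) === 0) {
            return true;
        }
        foreach ((array) mb_encoding_aliases($enc) as $alias) {
            if (strcasecmp((string) $alias, $name) === 0) {
                return true;
            }
        }
    }
    return false;
}

/** Convert an HTML body to UTF-8 using header, meta tag, then a fallback. */
function to_utf8(string $body, ?string $contentTypeHeader = null): string
{
    // 0. Already valid UTF-8: trust the bytes over any declaration.
    if (mb_check_encoding($body, 'UTF-8')) {
        return $body;
    }

    $candidates = [];

    // 1. charset parameter of the HTTP Content-Type header.
    if ($contentTypeHeader !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentTypeHeader, $m)) {
        $candidates[] = normalize_charset($m[1]);
    }

    // 2. <meta charset=...> or <meta http-equiv ... charset=...> near the top.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', substr($body, 0, 4096), $m)) {
        $candidates[] = normalize_charset($m[1]);
    }

    // 3. Last resort: ISO-8859-1 accepts every byte, so this always succeeds.
    $candidates[] = 'ISO-8859-1';

    foreach ($candidates as $charset) {
        // Skip labels mbstring does not know, and declarations the bytes contradict.
        if (is_known_encoding($charset) && mb_check_encoding($body, $charset)) {
            return mb_convert_encoding($body, 'UTF-8', $charset);
        }
    }
    return mb_convert_encoding($body, 'UTF-8', 'ISO-8859-1');
}
```

Fetching the URL and pulling out the Content-Type header are left out so the sketch runs without a network connection. The caller passes in the raw body and the header value, e.g. `to_utf8($html, 'text/html; charset=x-gbk')`.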
I’d like to check it out. I haven’t had to mess with converting character encoding in a long time. I believe I got aggravated with PHP the last time and just did it with Ruby.
-
mavryx reblogged this from marco and added:
Good Lord! Very Interested~~!!
-
answers reblogged this from 200 and added:
Shouldn’t be, but it is. mb_detect_encoding doesn’t always detect properly. It works statistically, and it’s imperfect....
-
200 reblogged this from marco and added:
mb_detect_encoding...mb_convert_encoding plus...bit string...
-
bjornstar reblogged this from marco and added:
Yes, please release it. You don’t have...support it, but helping people convert into UTF-8...
-
spiteshow reblogged this from marco and added:
check it out. I haven’t had to mess...converting character
-
kylewritescode reblogged this from marco and added:
I’m interested!
-
marco posted this