PHP’s serialize() and unserialize() are like Schrödinger’s functions: they either preserve your data perfectly or they burn your sanity in UTF-8 fire.
This all started when I was integrating a solution for a WordPress plugin that stores users’ field data as serialized PHP blobs in the database, and another plugin that just didn’t care and sent the unparsed data to the frontend. I also needed to validate the raw data before sending it back to the server for saving, to make sure some properties were set correctly and avoid breaking the two plugins’ stupid flow. Yeah, looking back, I should have done things differently .. but hindsight is 20/20.
I optimistically thought: “How hard can it be to parse this PHP gibberish in JavaScript?” Turns out, pretty hard.
Phase 1: The Beautiful .. Simple strings are the false hope
At first glance the format looks simple enough:
$data = ["name" => "Anas", "age" => 30];
$serialized = serialize($data);
echo $serialized;
// Outputs: a:2:{s:4:"name";s:4:"Anas";s:3:"age";i:30;}
Pretty straightforward, right? An array of 2 elements, each with a string key and a value that can be either a string or an integer.
It’s deterministic, compact and kind of human-readable-ish.
I fell into its trap..
I thought, let’s go crazy and throw in an emoji, because why not, that’s what users do .. sometimes .. weirdly.
echo serialize("emoji test: 😅🚀");
// Outputs: s:20:"emoji test: 😅🚀";
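Wait, what? 20? JavaScript’s "emoji test: 😅🚀".length says 16 .. so where did the extra 4 come from?
Well, dear good old lovely reader .. PHP counts string lengths in bytes, not characters, and those emojis are 4 bytes each in UTF-8. JavaScript’s .length counts UTF-16 code units instead, and each of those emojis takes 2 of them, so the two counts drift apart by 2 for every emoji.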
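To see the mismatch without touching PHP, here is a small sketch that measures the same string three ways. It assumes an environment with the standard TextEncoder (modern browsers and Node.js); the variable names are only for illustration.

```javascript
// Three different "lengths" for the same string.
const text = "emoji test: 😅🚀";

// UTF-16 code units: what JavaScript's .length reports.
const codeUnits = text.length;

// Unicode code points: string iteration walks whole code points.
const codePoints = [...text].length;

// UTF-8 bytes: the number PHP's serialize() writes after "s:".
const utf8Bytes = new TextEncoder().encode(text).length;

console.log({ codeUnits, codePoints, utf8Bytes });
// { codeUnits: 16, codePoints: 14, utf8Bytes: 20 }
```

Any parser written in JavaScript has to translate between the first and the last of these numbers, which is where the rest of this story goes wrong.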
WELCOME TO CHARACTER ENCODING HELL!
Phase 2: The Broken .. Arrays, Inside Arrays, Inside Arrays ..
After strings came arrays.
And nested arrays.
And arrays pretending to be associative objects pretending to be arrays again.
$data = [
    "colors" => ["red", "green", "blue"],
    "shapes" => [
        "circle" => ["radius" => 10],
        "square" => ["side" => 5]
    ]
];
$serialized = serialize($data);
echo $serialized;
// Outputs: a:2:{s:6:"colors";a:3:{i:0;s:3:"red";i:1;s:5:"green";i:2;s:4:"blue";}s:6:"shapes";a:2:{s:6:"circle";a:1:{s:6:"radius";i:10;}s:6:"square";a:1:{s:4:"side";i:5;}}}
Looks fine, right? Yeah right .. because life is usually this easy!
You see the }}} at the end? My parser thought the first } was the end of the data and died screaming “Unknown type ‘}’”.
Turns out, PHP’s serialization format doesn’t give you any indication of when the whole structure ends; you have to keep track of how many opening { you have seen and match them with closing }. Like counting parentheses in a bad regex.
Recursion .. fun times!
class PhpReader {
  constructor(input) {
    this.input = input;
    this.pos = 0;
  }
  read(n = 1) { return this.input.slice(this.pos, this.pos += n); }
  readUntil(ch) { /* … find next delimiter … */ }
}
That’s how good I am at inventing bugs and wasting time.
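For the nesting problem, one way to avoid counting braces by hand is to lean on the a:N count: read exactly N key/value pairs, then expect the closing }. The sketch below is not the parser from this post, just a minimal illustration under loud assumptions: parseValue is a hypothetical name, only i, s and a types are handled, every PHP array becomes a plain object, and the input is ASCII-only so byte lengths equal JavaScript .length.

```javascript
// Minimal sketch: recursive-descent parsing of ints, strings and arrays.
// Assumption: ASCII-only input, so PHP byte lengths equal JavaScript .length.
// Returns [value, positionAfterValue] so callers know where to continue.
function parseValue(input, pos = 0) {
  const type = input[pos];
  if (type === "i") {
    // i:<digits>;
    const end = input.indexOf(";", pos);
    return [Number(input.slice(pos + 2, end)), end + 1];
  }
  if (type === "s") {
    // s:<len>:"<content>";
    const colon = input.indexOf(":", pos + 2);
    const len = Number(input.slice(pos + 2, colon));
    const start = colon + 2; // skip :"
    return [input.slice(start, start + len), start + len + 2]; // skip ";
  }
  if (type === "a") {
    // a:<count>:{<key><value>...}
    const colon = input.indexOf(":", pos + 2);
    const count = Number(input.slice(pos + 2, colon));
    let cursor = colon + 2; // skip :{
    const out = {};
    for (let n = 0; n < count; n++) {
      let key, value;
      [key, cursor] = parseValue(input, cursor);
      [value, cursor] = parseValue(input, cursor); // may recurse into a nested array
      out[key] = value;
    }
    if (input[cursor] !== "}") throw new Error(`Expected '}' at ${cursor}`);
    return [out, cursor + 1]; // this level consumes its own closing brace
  }
  throw new Error(`Unknown type '${type}' at ${pos}`);
}

const [data] = parseValue(
  'a:2:{s:6:"colors";a:3:{i:0;s:3:"red";i:1;s:5:"green";i:2;s:4:"blue";}s:6:"shapes";a:2:{s:6:"circle";a:1:{s:6:"radius";i:10;}s:6:"square";a:1:{s:4:"side";i:5;}}}'
);
console.log(data.shapes.circle.radius); // 10
console.log(data.colors[1]); // green
```

Because each call returns the position where it stopped, every } is consumed by the level that opened it, and the top-level caller never sees a stray brace.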
Phase 3: The WTF .. Re-implementing PHP in JavaScript .. What the hell am I doing?
I probably should have given up at this point, but when doing something stupid you reach that weird point of pride where you just want to finish it and prove to yourself and to the damn computer that you can do it.
So I did.
A small part of the final parser looks like this:
function parseString(r) {
  const len = Number(r.readUntil(':'));
  r.expect('"');
  let out = '', bytes = 0;
  while (bytes < len) {
    const [ch, b] = readCodePointChunk(r); // UTF-8 accurate
    if ((bytes += b) > len) throw r.error('UTF-8 mismatch');
    out += ch;
  }
  r.expect('"');
  r.expect(';');
  return out;
}
Take that PHP! .. for some reason..
Well, see that readCodePointChunk function? That was the hardest part, because I had to read the string byte by byte and decode UTF-8 code points correctly to avoid breaking in the middle of a multi-byte character.
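The real readCodePointChunk is not shown here, so what follows is only a hypothetical sketch of one way such a helper could work. Instead of decoding raw bytes, it assumes the serialized text has already been decoded into a normal JavaScript string, reads one whole code point at a time, and works out how many UTF-8 bytes that code point would have cost in PHP. The reader is assumed to be any object with input and pos fields, like the PhpReader above; utf8ByteLength is a made-up helper name.

```javascript
// Hypothetical sketch of a readCodePointChunk-style helper.
// Assumption: r.input is a JS string and r.pos is an index in UTF-16 code units.

// Number of bytes UTF-8 needs to encode a given code point.
function utf8ByteLength(codePoint) {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

// Reads one full code point, advances the reader, and returns
// [character, utf8ByteCount] so the caller can track PHP's byte length.
function readCodePointChunk(r) {
  const codePoint = r.input.codePointAt(r.pos);
  if (codePoint === undefined) throw new Error("Unexpected end of input");
  const ch = String.fromCodePoint(codePoint);
  r.pos += ch.length; // 1 code unit for BMP characters, 2 for surrogate pairs
  return [ch, utf8ByteLength(codePoint)];
}

const r = { input: "😅a", pos: 0 };
console.log(readCodePointChunk(r)); // [ '😅', 4 ]
console.log(readCodePointChunk(r)); // [ 'a', 1 ]
```

Reading by code point means the loop in parseString can never stop in the middle of an emoji: the byte counter either lands exactly on len or overshoots it, and the overshoot is what triggers the 'UTF-8 mismatch' error.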
Once I wired this up with a recursive descent parseArray and a simple phpSerialize() function, and copied one of the weirdest PHP serialized strings I could find in one of our databases, something wild happened:
const data = PHPSer.parseSerialized(
  'a:1:{s:4:"user";a:2:{s:4:"name";s:4:"John";s:3:"age";i:30;}}'
);
PHPSer.deepSet(data, 'user.name', 'Anas');
console.log(PHPSer.phpSerialize(data));
// Outputs: a:1:{s:4:"user";a:2:{s:4:"name";s:4:"Anas";s:3:"age";i:30;}}
It worked!
In browser!
Without PHP!
I might have shed a tear of joy at the time, can’t really remember.
Lessons neither I nor anyone else asked for:
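Going the other way is the easier half. This is not the actual PHPSer.phpSerialize, only a minimal sketch of what a function like it might look like, assuming it only has to handle integers, strings, arrays and plain objects, and that TextEncoder is available for counting bytes.

```javascript
// Hypothetical sketch of a phpSerialize-style function.
// Assumption: only integers, strings, arrays and plain objects are supported.
function phpSerialize(value) {
  if (Number.isInteger(value)) return `i:${value};`;
  if (typeof value === "string") {
    // PHP wants the UTF-8 byte count here, not JavaScript's .length.
    const byteLength = new TextEncoder().encode(value).length;
    return `s:${byteLength}:"${value}";`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value);
    const body = entries
      .map(([key, item]) => {
        // PHP stores canonical decimal keys such as "0" or "42" as integers.
        const phpKey = /^(0|-?[1-9]\d*)$/.test(key) ? Number(key) : key;
        return phpSerialize(phpKey) + phpSerialize(item);
      })
      .join("");
    return `a:${entries.length}:{${body}}`;
  }
  throw new TypeError(`Unsupported value: ${String(value)}`);
}

console.log(phpSerialize({ user: { name: "Anas", age: 30 } }));
// a:1:{s:4:"user";a:2:{s:4:"name";s:4:"Anas";s:3:"age";i:30;}}
console.log(phpSerialize("emoji test: 😅🚀"));
// s:20:"emoji test: 😅🚀";
```

Floats, booleans, null and PHP objects are deliberately left out, so the sketch throws on them instead of guessing.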
- Never trust a format that counts bytes instead of characters.
- UTF-8 is simple and complex at the same time.
- If your debugger says “Unknown type ‘}’”, it probably means you messed up your brackets.
- Building tools you don’t technically need teaches you the stuff you actually do need. (still not sure what I learned here)
The End .. (for now)
If you ever have to inspect PHP serialized data in your frontend for some reason .. you’re probably doing something wrong to have ended up here!
Anyways .. just hit F12 and get the php-parser.js script file.
PS: I never said that I prefer this over JSON, but I respect PHP serialization now, like you respect a crocodile .. in a cage .. from a distance .. with a stick .. and maybe some kevlar armor.
Here’s the demo