This article was originally published in the June 2024 edition of the PHP Architect magazine.
Introduction
If you're anything like me, the first time you saw a serialized string in PHP, you were probably super confused. I was working on a Laravel project and wanted to understand what was happening when I pushed a job onto a queue. I noticed that some data was being serialized but didn't understand why or how it worked. But after I spent some time experimenting with serialization, I started to realise that it's not as scary as it looks.
In this article, we're going to take a look at what serialization is and how it works. We'll then explore how you can use PHP's built-in serialization functions so you can serialize and unserialize data in your applications. Finally, we'll discuss how you can write tests to make sure your serialization code works as expected.
Hopefully, by the end of the article, you should have a good understanding of what serialization is and feel confident enough to use it in your own applications.
What is Serialization?
Serialization is the process of converting a variable, object, or data structure into a string format. This string format represents the original data in a way that can be stored or transmitted elsewhere. On the other hand, deserialization (often referred to as "unserialization" in PHP) is the process of converting the serialized data back into its original form.
Serialization is an important concept and can be used for things like converting data so it can be stored in a cache, database, or file.
We can serialize data into many different formats such as JSON, XML, or even a binary format (such as Protocol Buffers for use in gRPC APIs). However, in this article, we're going to focus on PHP's built-in serialization functions.
As an example, if you've ever worked with Laravel before, you may have spotted that the framework serializes data when pushing a job onto a queue ready to be run. For instance, take this pending job in Laravel that's been pushed onto the queue (split onto new lines and with some properties removed for readability):
{
    "uuid": "3d05be68-8cd0-4c3a-8d05-71e86871713a",
    "data": {
        "commandName": "App\\Jobs\\SendOneTimePassword",
        "command": "O:28:\"App\\Jobs\\SendOneTimePassword\"
            :1:{s:15:\"oneTimePassword\";s:6:\"123456\";}"
    }
}
In the example JSON that represents a pending job, the data.command property is a serialized string that represents an App\Jobs\SendOneTimePassword job. When a queue worker picks this job off the queue, the serialized string is then deserialized to create an instance of the App\Jobs\SendOneTimePassword class to work with. Don't worry if this doesn't make sense right now; we'll dive into more examples later on and explain what's happening.
How Does Serialization Work in PHP?
In PHP, we can easily serialize and unserialize data using the serialize and unserialize functions respectively.
The serialize function accepts the data you want to serialize and returns it in a string format, whereas the unserialize function accepts the serialized data and returns the original data structure.
Let's take a look at how we can serialize and unserialize different types of data in PHP:
Serializing Strings
To serialize a string, you can simply pass the string to the serialize function:
$serialized = serialize('Hello');
This will return a serialized string:
s:5:"Hello";
This looks a bit strange at first, but once you notice the pattern, you'll see that it's not as scary as it appears. Our serialized data follows the format data_type:string_length:"string";.
So in the case of the serialized string above, s stands for string and represents the data type for when we unserialize the data, and 5 is the length of the string (in bytes).
We could then pass that serialized string to the unserialize function to get the original string back:
$string = unserialize('s:5:"Hello";');
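One detail worth seeing in code is that the length is counted in bytes rather than characters. The following is a minimal sketch (the variable names are just for illustration) that compares an ASCII string with a single accented character encoded as UTF-8:

```php
<?php

declare(strict_types=1);

// The length in a serialized string is a byte count, not a character count.
$ascii = serialize('Hello');       // 5 single-byte characters
$accented = serialize("\u{e9}");   // "é": 1 character, but 2 bytes in UTF-8

echo $ascii, PHP_EOL;      // s:5:"Hello";
echo $accented, PHP_EOL;   // s:2:"é";

// Round-tripping returns the original string unchanged.
$roundTripped = unserialize($accented);
```

This is why editing a serialized string by hand is risky: if the declared length no longer matches the number of bytes, unserializing will fail.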
Serializing Integers and Floats
We can also serialize integers and floats in PHP. Here's how you can serialize an integer:
serialize(123);
This will return a serialized string:
i:123;
You may have noticed that the structure is slightly different from the serialized string we saw earlier. Integers are serialized using the format data_type:data;. Notice that we don't have a size here like with strings. In this case, the data type of the serialized data is i for integer.
Similarly, we can serialize floats:
serialize(123.45);
This will return a serialized string:
d:123.45;
This structure is similar to the integer serialization, but the data type is d for double.
Serializing Booleans
We can also serialize booleans in PHP. For example, we can serialize true:
serialize(true);
This will return a serialized string with b as the data type and 1 (representing true) as the value:
b:1;
Similarly, we can serialize false:
serialize(false);
This will return a serialized string with b as the data type and 0 (representing false) as the value:
b:0;
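There's one wrinkle with serialized booleans: unserialize also returns false when it fails, so a genuine false and a failure look identical from the return value alone. A rough sketch of one way to tell them apart is to compare the payload against serialize(false) first (the isSerializedFalse helper is a hypothetical name used only for this example):

```php
<?php

declare(strict_types=1);

// Hypothetical helper: checks whether a payload is exactly a serialized false.
function isSerializedFalse(string $payload): bool
{
    return $payload === serialize(false);
}

$serializedFalse = serialize(false); // b:0;

// unserialize() returns false on failure, which is indistinguishable from a
// successfully unserialized false, so we also check the payload itself.
$value = unserialize($serializedFalse);
$wasGenuineFalse = $value === false && isSerializedFalse($serializedFalse);

var_dump($wasGenuineFalse); // bool(true)
```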
Serializing Arrays
We can serialize arrays in PHP like so:
serialize([1, 2, 3]);
This will return a serialized string:
a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}
Now, you may have noticed that this is a little more complex than the other serialized data we've already looked at. Let's break it down.
The string has a structure of data_type:size:{key_data_type:key_data;value_data_type:value_data;...}. In this case, the data_type is a for array and the size is 3 because the array has 3 elements.
If we then look at the data inside the { }, we can see that the keys are represented by i for integers and the values are also represented by i for integers. It might help to visualise the structure by splitting them onto new lines:
i:0;i:1;
i:1;i:2;
i:2;i:3;
As another example, let's look at what a serialized array of strings might look like. We can serialize the following array:
serialize(['a', 'b', 'c']);
This will return a serialized string:
a:3:{i:0;s:1:"a";i:1;s:1:"b";i:2;s:1:"c";}
As we can see in the serialized string above, the keys are still represented by i, whereas the values are represented by s for string. To help visualise the structure, we can split the data onto new lines:
i:0;s:1:"a";
i:1;s:1:"b";
i:2;s:1:"c";
Similarly, we can also serialize an associative array:
serialize(['a' => 'A', 'b' => 'B', 'c' => 'C']);
This will return a serialized string:
a:3:{s:1:"a";s:1:"A";s:1:"b";s:1:"B";s:1:"c";s:1:"C";}
As we can see, the structure is very similar to the serialized arrays we've already looked at. However, in this case, the keys are represented by s for strings. To help visualise the structure, we can split the data onto new lines:
s:1:"a";s:1:"A";
s:1:"b";s:1:"B";
s:1:"c";s:1:"C";
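The same rules apply recursively, so an array nested inside another array is simply serialized as a value. Here's a small sketch using a made-up array that mixes string keys, a nested list, and a boolean:

```php
<?php

declare(strict_types=1);

$data = ['ids' => [1, 2], 'active' => true];

$serialized = serialize($data);

echo $serialized, PHP_EOL;
// a:2:{s:3:"ids";a:2:{i:0;i:1;i:1;i:2;}s:6:"active";b:1;}

// Unserializing gives back an array with the same keys, values, and types.
$restored = unserialize($serialized);

var_dump($restored === $data); // bool(true)
```

Notice that the nested array closes with } and has no trailing semicolon, unlike the scalar values around it.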
Serializing Enums
We can also serialize enums in PHP. As a basic example, imagine we have the following enum that represents the status of a blog post:
namespace App\Enums;

enum PostStatus: string
{
    case Published = 'published';
    case Draft = 'draft';
    case Pending = 'in_review';
}
Let's imagine we then take one of this enum's cases and serialize it like so:
serialize(PostStatus::Published);
This will return a serialized string:
E:30:"App\Enums\PostStatus:Published";
The structure of the serialized enum is data_type:size:"enum_type:case_name";. In this case, the data type is represented by E and the size is 30 because that's the combined length of the class name (App\Enums\PostStatus), the colon separator, and the case name (Published). Note that it's the case name (Published) that gets serialized, not the backing value (published).
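To see the round trip in action, here's a minimal, self-contained sketch that uses a trimmed-down copy of the enum. Because enum cases are singletons, the unserialized value is the very same case as the original:

```php
<?php

declare(strict_types=1);

namespace App\Enums;

enum PostStatus: string
{
    case Published = 'published';
    case Draft = 'draft';
}

$serialized = serialize(PostStatus::Published);

echo $serialized, PHP_EOL; // E:30:"App\Enums\PostStatus:Published";

// Enum cases are singletons, so the restored case is identical (===)
// to the original rather than merely equal to it.
$restored = unserialize($serialized);

var_dump($restored === PostStatus::Published); // bool(true)
```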
Serializing Objects
So far we've covered how serialization works for basic data types like strings, integers, floats, booleans, arrays, and enums. But what about objects?
By default, apart from a few built-in PHP classes, all objects are serializable.
To explain how object serialization works, let's take a basic example App\User class that contains three public properties:
namespace App;

class User
{
    public function __construct(
        public string $name,
        public string $email,
        public string $apiToken,
    ) { }
}
We'll create a new instance of this class and serialize it:
$user = new User(
    name: 'Ash Allen',
    email: 'mail@ashallendesign.co.uk',
    apiToken: 'secret',
);

serialize($user);
This will return a serialized string:
O:8:"App\User":3:{s:4:"name";s:9:"Ash Allen";s:5:"email";s:25:"mail@ashallendesign.co.uk";s:8:"apiToken";s:6:"secret";}
Let's break down the structure of the serialized object. We've got the following structure:
data_type:class_name_size:class_name:property_count:{
    property_name_type:property_name_size:property_name;
    property_value_type:property_value_size:property_value;
    ...
}
So from this structure, we can see that the data type is O for object, the class name size is 8, the class name is App\User, and the property count is 3 because the object has 3 properties. We can then see each of the serialized properties inside the { }.
We can then pass this serialized string to the unserialize function to get the original object back:
$serialized = 'O:8:"App\User":3:{s:4:"name";s:9:"Ash Allen'
    .'";s:5:"email";s:25:"mail@ashallendesign.co.uk";s:8:'
    .'"apiToken";s:6:"secret";}';

$user = unserialize($serialized);
This would return an instance of the App\User class with each of the properties set like the original.
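It's worth knowing that unserialize rebuilds the object from the stored properties without calling the constructor. The sketch below demonstrates this with a simplified User class; the static $constructed counter is a hypothetical addition purely for illustration (static properties aren't serialized):

```php
<?php

declare(strict_types=1);

namespace App;

class User
{
    // Hypothetical counter used only to show how often the constructor runs.
    public static int $constructed = 0;

    public function __construct(public string $name)
    {
        self::$constructed++;
    }
}

$user = new User('Ash Allen');

$restored = unserialize(serialize($user));

var_dump($restored == $user);   // bool(true): same class and property values
var_dump($restored === $user);  // bool(false): it's a new instance
var_dump(User::$constructed);   // int(1): the constructor wasn't called again
```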
Property Visibility
When serializing objects, the visibility of the properties is important as it affects the string that is returned.
Let's update our App\User class to have a public, protected, and private property:
namespace App;

class User
{
    public function __construct(
        public string $name,
        protected string $email,
        private string $apiToken,
    ) { }
}
Then we'll create a new instance of this class and serialize it:
$user = new User(
    name: 'Ash Allen',
    email: 'mail@ashallendesign.co.uk',
    apiToken: 'secret',
);

serialize($user);
This would return a serialized string:
O:8:"App\User":3:{s:4:"name";s:9:"Ash Allen";s:8:"\0*\0email";s:25:"mail@ashallendesign.co.uk";s:18:"\0App\User\0apiToken";s:6:"secret";}
The string format is very similar to the serialized object we saw earlier. However, there's a slight difference with the names for the email and apiToken properties.
When PHP serializes an object, it prefixes the property name to indicate the visibility of the property. Protected properties are indicated by a * prefix and private properties are indicated by a class name prefix, with each prefix wrapped in null bytes. So we can see that instead of email and apiToken, we have \0*\0email and \0App\User\0apiToken (with the \0 representing a null byte). This is also why the name lengths are 8 and 18 rather than 5 and 8: the prefix and the null bytes are counted too.
Let's split the serialized string onto new lines to help visualise the structure:
s:4:"name";s:9:"Ash Allen";
s:8:"\0*\0email";s:25:"mail@ashallendesign.co.uk";
s:18:"\0App\User\0apiToken";s:6:"secret";
This means that by looking at a serialized object, we can determine the visibility of the properties.
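Because null bytes are invisible when printed, it can help to check for the prefixes in code. The following sketch looks for the two prefixed names and then prints the string with each null byte swapped for a literal \0 (the variable names are just illustrative):

```php
<?php

declare(strict_types=1);

namespace App;

class User
{
    public function __construct(
        public string $name,
        protected string $email,
        private string $apiToken,
    ) {
    }
}

$serialized = serialize(
    new User('Ash Allen', 'mail@ashallendesign.co.uk', 'secret')
);

// In a double-quoted string, "\0" is a real null byte.
$hasProtectedPrefix = str_contains($serialized, "s:8:\"\0*\0email\"");
$hasPrivatePrefix = str_contains($serialized, "s:18:\"\0App\\User\0apiToken\"");

var_dump($hasProtectedPrefix, $hasPrivatePrefix); // bool(true) bool(true)

// Swap each null byte for a visible, literal \0 so the string prints cleanly.
echo str_replace("\0", '\0', $serialized), PHP_EOL;
```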
Serializing Objects That Contain Other Objects
There may be times when you need to serialize an object that contains another object. We'll quickly take a look at what a serialized object that contains another object might look like.
Imagine we have a simple App\ValueObjects\Address class:
namespace App\ValueObjects;

class Address
{
    public function __construct(
        public int $number,
        public string $postalCode,
    ) { }
}
We'll then imagine that our App\User class has an App\ValueObjects\Address object as a property. We might want to create a new object like so and then serialize it:
$user = new User(
    name: 'Ash Allen',
    email: 'mail@ashallendesign.co.uk',
    apiToken: 'secret',
    address: new Address(18, 'SW1A 2AA'),
);

serialize($user);
This would lead to a serialized string like the following:
O:8:"App\User":4:{s:4:"name";s:9:"Ash Allen";s:5:"email";s:25:"mail@ashallendesign.co.uk";s:8:"apiToken";s:6:"secret";s:7:"address";O:24:"App\ValueObjects\Address":2:{s:6:"number";i:18;s:10:"postalCode";s:8:"SW1A 2AA";}}
Let's take the contents of this object and break it down onto separate lines:
s:4:"name";s:9:"Ash Allen";
s:5:"email";s:25:"mail@ashallendesign.co.uk";
s:8:"apiToken";s:6:"secret";
s:7:"address";O:24:"App\ValueObjects\Address":2:{
    s:6:"number";i:18;
    s:10:"postalCode";s:8:"SW1A 2AA";
}
As we can see here, the App\ValueObjects\Address object is simply serialized as a property of the App\User object.
Error Handling When Unserializing
It's important to handle any errors that might occur from attempting to unserialize invalid data. Depending on the invalid data you're trying to unserialize, PHP 8.3 will either emit an E_WARNING or throw an \Exception or \Error.
For example, let's take this invalid serialized string that has a length of 10 for the string hello rather than the expected 5:
unserialize('s:10:"hello";');
If we were to run this code in PHP 8.3, an E_WARNING would be emitted with an error message like the following:
Warning: unserialize(): Error at offset 2 of 13 bytes in /www/serialization.php on line 3
To handle the warnings in our code, we can use the set_error_handler function to set a custom error handler. This will allow us to catch the warnings and throw them as exceptions instead.
To do this, we'll first create a new App\Services\Serializer class that looks like so:
declare(strict_types=1);

namespace App\Services;

final readonly class Serializer
{
    public function unserialize(string $serialized): mixed
    {
        try {
            set_error_handler(static function (
                $severity, $message, $file, $line
            ) {
                throw new \ErrorException(
                    $message, 0, $severity, $file, $line
                );
            });

            $result = unserialize($serialized);
        } finally {
            restore_error_handler();
        }

        return $result;
    }
}
In this class, we've added an unserialize method that accepts a serialized string. We're then overriding the error handler so that we can catch any warnings and throw them as exceptions. We then attempt to unserialize the data. If a warning is emitted, it will be thrown as an exception. We then restore the error handler to its original state inside the finally block that will be run whether the deserialization is successful or not. Assuming it was successful, we then return the unserialized data.
We can then use this class to unserialize data like so:
use App\Services\Serializer;

$result = (new Serializer())->unserialize(
    serialized: 's:10:"hello";'
);
Running the above code would result in an \ErrorException being thrown with the following message:
unserialize(): Error at offset 2 of 13 bytes
Or we could run the following code:
use App\Services\Serializer;

$result = (new Serializer())->unserialize(
    serialized: 's:5:"hello";'
);
This would result in the string hello being returned.
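Once warnings are converted into exceptions, the calling code can catch them and decide what to do next. Here's a rough, framework-free sketch of that idea. The unserializeOrFail function is a hypothetical stand-in that mirrors the Serializer class above so the example can run on its own:

```php
<?php

declare(strict_types=1);

// Hypothetical helper mirroring the Serializer class: converts any warning
// or notice raised by unserialize() into an ErrorException.
function unserializeOrFail(string $serialized): mixed
{
    set_error_handler(static function (
        int $severity, string $message, string $file, int $line
    ): never {
        throw new ErrorException($message, 0, $severity, $file, $line);
    });

    try {
        return unserialize($serialized);
    } finally {
        // Always put the previous error handler back.
        restore_error_handler();
    }
}

$error = null;

try {
    $result = unserializeOrFail('s:10:"hello";');
} catch (ErrorException $e) {
    // Fall back to a default rather than letting the failure bubble up.
    $result = null;
    $error = $e->getMessage();
}

$valid = unserializeOrFail('s:5:"hello";'); // "hello"
```

Whether you fall back to a default, log the error, or rethrow it will depend on where the serialized data came from and how much you trust it.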
There is currently an RFC (https://wiki.php.net/rfc/improve_unserialize_error_handling) that is partly implemented and aims to improve the error handling when unserializing data. The RFC contains a proposal to change the behaviour of unserialize so that it throws an \UnserializationFailedException rather than emitting an E_WARNING as of PHP 9.0. So if this is implemented, we shouldn't need to override the error handler to catch warnings and throw them as exceptions as we've done above.
Defining Serialization Logic in PHP
As we've already seen above, PHP provides the ability to serialize and unserialize objects by default.
However, there may be times when you want to define custom serialization logic for your objects. This could be for several reasons, such as encrypting sensitive data before serializing it, or maybe you need to perform some additional logic when unserializing an object.
Thankfully, PHP provides two magic methods that you can use to define how an object can be serialized and unserialized: __serialize and __unserialize.
To get an idea of how this might work, let's take a look at an example. Sticking with our App\User class from earlier, let's say we want to encrypt the apiToken property before serializing the object and decrypt it when unserializing the object. This might be because we're storing the serialized data in a cache or on a queue so we want to make sure the data is secure in case it's compromised.
For the purposes of this article, we're going to imagine we have two functions that we can call to encrypt and decrypt data: encrypt and decrypt. We don't need to worry about the implementation of these functions for now; we're just going to assume they exist. If any of you are Laravel developers, you may recognise these functions as they both ship with Laravel.
Let's update our App\User class to include the __serialize and __unserialize methods and then discuss what's being done:
declare(strict_types=1);

namespace App;

class User
{
    public function __construct(
        public string $name,
        public string $email,
        public string $apiToken,
    ) { }

    public function __serialize(): array
    {
        return [
            'name' => $this->name,
            'email' => $this->email,
            'apiToken' => encrypt($this->apiToken),
        ];
    }

    public function __unserialize(array $data): void
    {
        $this->name = $data['name'];
        $this->email = $data['email'];
        $this->apiToken = decrypt($data['apiToken']);
    }
}
In the __serialize method, we're returning an array of the properties we want to serialize. We're encrypting the apiToken property before returning it. This means that when we call serialize on the object, the apiToken property will be encrypted.
Let's create a new instance of the App\User class and serialize it:
$user = new User(
    name: 'Ash Allen',
    email: 'mail@ashallendesign.co.uk',
    apiToken: 'secret',
);

$serialized = serialize($user);
The serialized string may look something like so (with the encrypted string shortened for brevity):
O:8:"App\User":3:{s:4:"name";s:9:"Ash Allen";s:5:"email";s:25:"mail@ashallendesign.co.uk";s:8:"apiToken";s:200:"eyJpdiI6Ikx0N3BDQwYzcwMzE1NGQy...sdfsfsfdssInRhZyI6IiJ9";}
As we can see, the apiToken property is now encrypted and without the encryption key it's not possible to decrypt the data.
Now if we wanted to create an instance of the App\User class from the serialized string, we could call unserialize on the string and the __unserialize method would be called. This __unserialize method accepts an array of the serialized data, so we can assign each of the properties and decrypt the apiToken property.
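If you'd like to try this outside of Laravel, here's a self-contained sketch of the same idea. The Token class and the fakeEncrypt and fakeDecrypt functions are hypothetical stand-ins invented for this example. They use base64, which is NOT encryption and only makes the transformation visible:

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for encrypt() and decrypt(). base64 is NOT
// encryption; it's used here only so the transformation is easy to see.
function fakeEncrypt(string $value): string
{
    return base64_encode($value);
}

function fakeDecrypt(string $value): string
{
    return (string) base64_decode($value, true);
}

final class Token
{
    public function __construct(public string $apiToken)
    {
    }

    // Called by serialize(): returns the data that should be stored.
    public function __serialize(): array
    {
        return ['apiToken' => fakeEncrypt($this->apiToken)];
    }

    // Called by unserialize(): receives the array that was stored.
    public function __unserialize(array $data): void
    {
        $this->apiToken = fakeDecrypt($data['apiToken']);
    }
}

$serialized = serialize(new Token('secret'));

echo $serialized, PHP_EOL;
// O:5:"Token":1:{s:8:"apiToken";s:8:"c2VjcmV0";}

$restored = unserialize($serialized);

echo $restored->apiToken, PHP_EOL; // secret
```

The plain value never appears in the serialized string, yet the restored object exposes it again, which is the behaviour we want from the real encrypt and decrypt functions.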
Testing Your Serialization Code
Just like with any other part of your application, you'll likely want to write tests for your serialization logic if you're customizing how objects are serialized and unserialized. This is a great way to ensure that your serialization code works as expected and that you can catch any bugs.
For instance, let's take the example we've just looked at. What would happen if we accidentally removed the decrypt() function call from the __unserialize() method for the apiToken property? This would lead to us having an object with an encrypted token rather than the original unencrypted value that we expect.
Your tests can be as in-depth and strict as you'd like. Let's take a look at a simple test we could write to make sure an App\User object can be serialized and then unserialized:
declare(strict_types=1);

namespace Tests\Feature\User;

use App\User;
use Illuminate\Foundation\Testing\TestCase;
use PHPUnit\Framework\Attributes\Test;

final class SerializeTest extends TestCase
{
    // ...

    #[Test]
    public function serialize_and_unserialize_works(): void
    {
        $user = new User(
            name: 'Ash Allen',
            email: 'mail@ashallendesign.co.uk',
            apiToken: '1234567890',
        );

        $serialized = serialize($user);

        $unserializedUser = unserialize($serialized);

        // Assert that the user we've just built is the
        // same as the one we originally serialized.
        $this->assertEquals($user, $unserializedUser);
    }
}
In the test above, we're creating an instance of App\User, serializing it, and then unserializing it. We're then asserting that the user we've just built is the same as the one we originally serialized. This is a simple test that can give us confidence that our serialization code is working as expected.
However, if we were to remove the encrypt and decrypt function calls, the test would still pass even though we wouldn't be encrypting and decrypting the apiToken property as we expect.
If you'd prefer to get a little bit more strict with your tests, you could write two more tests to ensure that the apiToken property is being encrypted and decrypted as expected.
We're going to write the tests as if they're part of a Laravel application, where the encrypt and decrypt functions resolve an instance of the Illuminate\Contracts\Encryption\Encrypter interface from the service container using the encrypter key. But if you're not familiar with Laravel, this doesn't matter. You just need to know that we're mocking the underlying classes that the encrypt and decrypt functions are calling so we can hardcode the expected encrypted value in our test and assert against it.
The first test we're going to write is to ensure that the apiToken property is being encrypted when the user object is serialized:
declare(strict_types=1);

namespace Tests\Feature\User;

use App\User;
use Illuminate\Foundation\Testing\TestCase;
use Mockery\MockInterface;
use PHPUnit\Framework\Attributes\Test;

final class SerializeTest extends TestCase
{
    // ...

    #[Test]
    public function user_object_can_be_serialized(): void
    {
        // Mock the encrypter so we can strictly test the
        // serialization of the user object.
        $this->mock('encrypter', function (MockInterface $mock): void {
            $mock->shouldReceive('encrypt')
                ->once()
                ->withArgs(['1234567890', true])
                ->andReturn('encrypted');
        });

        $user = new User(
            name: 'Ash Allen',
            email: 'mail@ashallendesign.co.uk',
            apiToken: '1234567890',
        );

        $serialized = serialize($user);

        $expectedString = 'O:8:"App\User":3:{s:4:"name";s:'.
            '9:"Ash Allen";s:5:"email";s:25:"'.
            'mail@ashallendesign.co.uk";s:8:"apiToken";s:'.
            '9:"encrypted";}';

        // Assert that the serialized string is exactly
        // what we expect.
        $this->assertSame(
            expected: $expectedString,
            actual: $serialized,
        );
    }
}
In the test, we're starting by mocking the encrypter so we can hardcode the expected encrypted value in our test. In this case, we're expecting the apiToken property to have a value of 1234567890 and when encrypting it, we'll return the string encrypted. We're then creating a new instance of the App\User class and serializing it. We're then asserting that the serialized string is exactly what we expect.
We can then write another test to ensure that the apiToken property is being decrypted when the user object is unserialized:
declare(strict_types=1);

namespace Tests\Feature\User;

use App\User;
use Illuminate\Foundation\Testing\TestCase;
use Mockery\MockInterface;
use PHPUnit\Framework\Attributes\Test;

final class SerializeTest extends TestCase
{
    // ...

    #[Test]
    public function user_string_can_be_unserialized(): void
    {
        // Mock the encrypter so we can strictly test the
        // unserialization of the user object.
        $this->mock('encrypter', function (MockInterface $mock): void {
            $mock->shouldReceive('decrypt')
                ->once()
                ->withArgs(['encrypted', true])
                ->andReturn('1234567890');
        });

        $serialized = 'O:8:"App\User":3:{s:4:"name";s:9:"'.
            'Ash Allen";s:5:"email";s:25:"'.
            'mail@ashallendesign.co.uk";s:8:"apiToken";s:'.
            '9:"encrypted";}';

        $user = unserialize($serialized);

        $this->assertInstanceOf(User::class, $user);
        $this->assertSame('Ash Allen', $user->name);
        $this->assertSame(
            'mail@ashallendesign.co.uk',
            $user->email
        );
        $this->assertSame('1234567890', $user->apiToken);
    }
}
In the test above, we've mocked the encrypter so we can hardcode the expected decrypted value in our test. We're then unserializing the serialized string and asserting that the user object is an instance of App\User and that the properties are as we expect.
Conclusion
In this article, we've taken a look at what serialization is and how it works. We've explored how you can use PHP's built-in serialization functions to serialize and unserialize data in your PHP applications. We've also discussed how you can write tests to make sure your serialization code works as expected.
Hopefully, you should now have a good understanding of what serialization is and feel confident enough to use it in your own applications. If you have any questions or comments, feel free to leave them in the comments below.