PHP Generators - Practical Example


Introduction

Generators let you write iterators in PHP without building the whole array in memory first. For large data sets, this can cut memory consumption significantly.

What Does This Mean?

It’s common to need to handle large data sets, such as reading a 3GB CSV file of customer data that has to be stored in the database, or transforming data from one format to another when the full data set is too large to hold comfortably in memory.

Another scenario is retrieving data from an API and storing it in an array so you can iterate over it later and persist it. This is the basis of the main example we’re going to discuss today.

Simple example

function xrange(int $from, int $to): Generator {
  for ($i = $from; $i <= $to; $i++) {
    yield $i;
  }
}

foreach (xrange(1, 10_000_000) as $i) {
  var_dump($i);
}

Here, xrange won’t return 10 million items all at once. Instead, with each iteration, it will generate one number. Consequently, the memory usage will remain minimal.
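To make the memory claim concrete, here is a small sketch that compares the memory held by a generator with the memory held by the equivalent array built with range(). The upper bound of 100,000 and the variable names are arbitrary choices for illustration, and the exact byte counts will differ between PHP versions.

```php
<?php

function xrange(int $from, int $to): Generator
{
    for ($i = $from; $i <= $to; $i++) {
        yield $i;
    }
}

// Memory taken by the generator object itself (no numbers produced yet).
$before = memory_get_usage();
$lazy = xrange(1, 100_000);
$generatorBytes = memory_get_usage() - $before;

// Memory taken by an array that holds all the numbers at once.
$before = memory_get_usage();
$eager = range(1, 100_000);
$arrayBytes = memory_get_usage() - $before;

printf("generator: %d bytes, array: %d bytes\n", $generatorBytes, $arrayBytes);

// Both produce the same values when iterated.
$lazySum = 0;
foreach ($lazy as $n) {
    $lazySum += $n;
}
$eagerSum = array_sum($eager);
```

The generator's footprint should stay small and roughly constant no matter how large the range is, while the array grows with the number of elements.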

Manual Iteration

You can also drive a Generator by hand, using its current() and next() methods.

function domains() {
  yield 'google.com';
  yield 'facebook.com';
  yield 'instagram.com';
}

$d = domains();

$d->current(); // returns 'google.com'
$d->next();    // advances to the next yield; next() itself returns nothing

$d->current(); // returns 'facebook.com'
$d->next();    // advances to the next yield

$d->current(); // returns 'instagram.com'
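current() and next() alone don’t tell you when the generator has finished. A minimal sketch of a complete manual loop, using the valid() method of the same Generator class, which is roughly what foreach does for you behind the scenes:

```php
<?php

function domains(): Generator
{
    yield 'google.com';
    yield 'facebook.com';
    yield 'instagram.com';
}

$d = domains();
$seen = [];

// valid() is true while the generator still has a value to offer.
while ($d->valid()) {
    $seen[] = $d->current();
    $d->next();
}

// Once the generator is exhausted, valid() is false and current() gives null.
$finished = !$d->valid();
$afterEnd = $d->current();
```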

The problem

I previously built a system featuring a component that allowed users to integrate their Shopify store, enabling us to pull all their data (including products, orders, and customers) to generate financial and marketing reports based on this information.

As soon as the user added their Shopify store, we triggered three background jobs that pulled product, order, and customer data in parallel and persisted it in our MySQL database.

Many of our customers had thousands of these data records. However, the code employed to fetch them wasn’t perfect.

Let’s examine the original implementation and explore how we could enhance it.

Fetching Products

Here, our ClientFactory is utilized to query the shop’s data.

use GuzzleHttp\Client;

class ClientFactory
{
    // Builds a Guzzle client preconfigured for the given shop's Admin API.
    public function make(Shop $shop): Client {
        return new Client([
            'base_uri' => "https://{$shop->domain}/admin/api/2021-07/",
            'headers' => [
                'X-Shopify-Access-Token' => $shop->token,
                'Content-Type' => 'application/json',
            ],
        ]);
    }
}

The first and most straightforward implementation that comes to mind is to use Guzzle to fetch the products and store them directly in the job.

public function handle(ClientFactory $factory): void
{
    // Load the shop this job was dispatched for.
    $shop = Shop::findOrFail($this->shopId);

    do {
        $response = $factory->make($shop)->get('products.json', ['query' => [
                'limit' => 250,
                'since_id' => $lastId ?? 0,
            ]]);

        $data = json_decode($response->getBody()->getContents(), true);

        $shopifyProducts = $data['products'];

        // An empty page means there is nothing left to fetch.
        if (count($shopifyProducts) == 0) break;
        $lastId = $shopifyProducts[count($shopifyProducts) - 1]['id'];

        foreach ($shopifyProducts as $shopifyProduct) {
            Product::create(ShopifyProductMapper::map($shopifyProduct)->toArray());
        }

    } while ($lastId);
}

API Limitations:
  • Shopify only allows a maximum of 250 products per page.
  • Pagination is achieved by sending the last product ID in the since_id parameter. This allows the retrieval of the next 250 products after the given since_id.

Let’s Break It Down
  • We start a do-while loop because we need to fetch all the products, one page per iteration.
  • If the count of retrieved products is zero, we simply break out of the loop.
  • Otherwise, we store the last product ID in $lastId to use as the pointer to the next page.
  • We persist the products through our Product model.
This code does too much in one place: it calls the API, handles pagination, maps the response, and persists the result. Imagine repeating all of that for the Orders and Customers endpoints, along with any other data we pull from these stores.

We can improve this by extracting the API endpoint call to another class like a Repository and returning the products array.

ProductsApi

class ProductsApi
{
    public function __construct(private ClientFactory $factory)
    {
    }

    public function getAllProducts(Shop $shop): array
    {
        // Every mapped product from every page accumulates here.
        $products = [];

        do {
            $response = $this->factory->make($shop)->get('products.json', ['query' => [
                'limit' => 250,
                'since_id' => $lastId ?? 0,
            ]]);
            $data = json_decode($response->getBody()->getContents(), true);
            $shopifyProducts = $data['products'];

            if (count($shopifyProducts) == 0) break;

            $lastId = $shopifyProducts[count($shopifyProducts) - 1]['id'];

            foreach ($shopifyProducts as $shopifyProduct) {
                $products[] = ShopifyProductMapper::map($shopifyProduct);
            }
        } while ($lastId);

        return $products;
    }
}

The issue here is that we hold every product in memory until pagination has finished, and only then return them to the caller. With thousands of records, this can exhaust PHP’s memory_limit and end the job with an out-of-memory fatal error.

So, how can we handle pagination more efficiently to avoid this?

Callable

One option, which I personally don’t find very appealing, is to accept a callback function.

public function getAllProducts(Shop $shop, callable $callback): void
{
    do {
        $response = $this->factory->make($shop)->get('products.json', ['query' => [
            'limit' => 250,
            'since_id' => $lastId ?? 0,
        ]]);

        $data = json_decode($response->getBody()->getContents(), true);
        $shopifyProducts = $data['products'];

        if (count($shopifyProducts) == 0) break;

        $lastId = $shopifyProducts[count($shopifyProducts) - 1]['id'];

        $products = [];
        foreach ($shopifyProducts as $shopifyProduct) {
            $products[] = ShopifyProductMapper::map($shopifyProduct);
        }

        $callback($products);
    } while ($lastId);
}

As you can see, the method takes a second, callable parameter. It invokes that callback once per request (line 21), passing in the products mapped from that page (line 18). The usage in the Job class might look like this:

public function handle(ProductsApi $api): void
{
    $shop = Shop::findOrFail($this->shopId);

    $api->getAllProducts($shop, function (array $shopifyProducts) {
        foreach ($shopifyProducts as $shopifyProduct) {
            Product::create($shopifyProduct->toArray());
        }
    });
}

I don’t find this approach appealing, because the repository is now responsible for executing the caller’s logic through the callback. That is an unusual thing for a repository to require.

LastId param

Another approach is to allow the caller of the repository to pass the last ID. However, this seems to conflict with the method name.

public function getAllProducts(Shop $shop, int $lastId): array
{
    $response = $this->factory->make($shop)->get('products.json', ['query' => [
        'limit' => 250,
        'since_id' => $lastId,
    ]]);
    $data = json_decode($response->getBody()->getContents(), true);
    $shopifyProducts = $data['products'];

    // An empty page tells the caller there is nothing left to fetch.
    if (count($shopifyProducts) == 0) return [];

    // The caller derives the next $lastId from the last product it receives.
    $products = [];
    foreach ($shopifyProducts as $shopifyProduct) {
        $products[] = ShopifyProductMapper::map($shopifyProduct);
    }

    return $products;
}

Now, with this approach, the user of this class must know and understand how to handle the last ID. Furthermore, they have to determine when to stop the pagination process.
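To show what that burden looks like, here is a rough sketch of the loop every caller would have to write. getProductsPage() is a hypothetical in-memory stand-in for the single-page repository method (it filters a plain list of IDs instead of calling Shopify), and the page size of 3 is only there to keep the example short.

```php
<?php

// Hypothetical stand-in for the repository method: returns up to $limit IDs
// greater than $lastId, mimicking since_id pagination.
function getProductsPage(array $allIds, int $lastId, int $limit = 250): array
{
    $after = array_values(array_filter($allIds, fn (int $id): bool => $id > $lastId));

    return array_slice($after, 0, $limit);
}

$allIds = range(1, 7);
$lastId = 0;
$imported = [];
$requests = 0;

while (true) {
    $page = getProductsPage($allIds, $lastId, 3);
    $requests++;

    if ($page === []) {
        break; // the caller has to know that an empty page means "done"
    }

    foreach ($page as $id) {
        $imported[] = $id;
    }

    $lastId = $page[count($page) - 1]; // the caller has to track the cursor
}
```

All of the pagination knowledge (the starting cursor, the stop condition, and how to advance) ends up in the caller, and would be duplicated in every job that uses the class.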

Generators

We can simplify this entire process by using Generators.

Our goal is for the caller to iterate over every product, across all pages, without having to know anything about pagination.

In the earlier repository version, we performed all the pagination inside getAllProducts and collected the results in one array, which used a significant amount of memory. A generator exposes the same sequence of products but only needs to hold the current page in memory.

Let’s see how it can be implemented.

/**
 * @return Generator Yields one mapped product at a time.
 */
public function getAllProducts(Shop $shop): Generator
{
    do {
        $response = $this->factory->make($shop)->get('products.json', ['query' => [
            'limit' => 250,
            'since_id' => $lastId ?? 0,
        ]]);
        $data = json_decode($response->getBody()->getContents(), true);
        $shopifyProducts = $data['products'];

        if (count($shopifyProducts) == 0) break;

        $lastId = $shopifyProducts[count($shopifyProducts) - 1]['id'];

        yield from array_map(function (array $shopifyProduct) {
            return ShopifyProductMapper::map($shopifyProduct);
        }, $shopifyProducts);

    } while ($lastId);
}

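As you can see on line 4, the method’s return type is Generator, and on line 18 we yield from the array of mapped products for the current page. yield from hands the elements of that array to the caller one at a time.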
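One detail of yield from is worth knowing: when it delegates to an array, it keeps that array’s keys, so every page yields keys that start again from 0. That makes no difference in a foreach loop, but it matters if the generator is ever collected with iterator_to_array(). A small sketch, with a made-up twoPages() generator standing in for two API pages:

```php
<?php

// Hypothetical generator that yields two "pages" of two items each.
function twoPages(): Generator
{
    yield from ['a', 'b']; // keys 0 and 1
    yield from ['c', 'd']; // keys 0 and 1 again
}

// With keys preserved (the default), the second page overwrites the first.
$withKeys = iterator_to_array(twoPages());

// Passing false discards the keys and keeps every item.
$withoutKeys = iterator_to_array(twoPages(), false);

// A plain foreach over the values sees all four items either way.
$viaForeach = [];
foreach (twoPages() as $item) {
    $viaForeach[] = $item;
}
```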

Let’s first look at how this can be utilized in practice, followed by a brief explanation of its execution process.

We can modify the code in our job as follows:

public function handle(ProductsApi $api): void
{
    $shop = Shop::findOrFail($this->shopId);

    $shopifyProducts = $api->getAllProducts($shop);

    foreach ($shopifyProducts as $shopifyProduct) {
        Product::create($shopifyProduct->toArray());
    }
}

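When line 5 is executed, nothing inside getAllProducts runs yet. Calling a generator function only creates a Generator object; the body starts executing when the first value is requested.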
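This lazy start is easy to observe in isolation. The sketch below uses a made-up lazyNumbers() generator that records its progress in an ArrayObject, so we can check what has run at each step:

```php
<?php

// Hypothetical generator that logs each stage of its execution.
function lazyNumbers(ArrayObject $log): Generator
{
    $log->append('body started');
    yield 1;
    $log->append('resumed after first yield');
    yield 2;
}

$log = new ArrayObject();

$gen = lazyNumbers($log);
$entriesAfterCall = count($log);    // the body has not started yet

$first = $gen->current();           // runs the body up to the first yield
$entriesAfterCurrent = count($log);

$gen->next();                       // resumes the body up to the second yield
$second = $gen->current();
$entries = $log->getArrayCopy();
```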

If you were to perform a die & dump operation on $shopifyProducts, it would display the following:

Generator {#2531
  this: App\Modules\Shopify\ProductsApi {#2500 …}
  trace: {
    ./app/Modules/Shopify/ProductsApi.php:18 {
      App\Modules\Shopify\ProductsApi->getAllProducts(Shop $shop): Generator
      public function getAllProducts(Shop $shop): Generator
      › {
      do {
    }
    App\Modules\Shopify\ProductsApi->getAllProducts() {}
  }
  closed: false
}

The dump isn’t easy to read, but it tells us what we need: $shopifyProducts is a Generator object that points at our getAllProducts method, and it is still open (closed: false).

When Does It Get Executed?

A generator’s body runs only once you start consuming it, which you can do in two ways:

  • Within a foreach loop
  • By calling ->current() and ->next() yourself

So, the Generator object we have in $shopifyProducts comes with some methods we can use.

$shopifyProducts->current();

Calling ->current() will trigger the first API call to Shopify, returning the first product in the initial array of 250 products.

If you invoke the current() method multiple times, it will always return the same first product.

To reach the next set of 250 products, you have to invoke ->next() 250 times. The first 249 calls walk through the rest of the first page. The 250th call runs out of products, sends the second request, and current() then returns the 251st product.

You then continue to call next() until you have passed the 500th product, and so forth.
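The same behavior can be checked on a small scale. The sketch below is not the real ProductsApi: FakeProductsApi is a hypothetical in-memory stand-in that counts "requests" and uses a page size of 3 instead of 250, with plain integer IDs in place of products.

```php
<?php

// Hypothetical stand-in for the API class; no HTTP calls are made.
final class FakeProductsApi
{
    public int $requests = 0;

    public function __construct(private array $ids, private int $pageSize)
    {
    }

    // Mimics since_id pagination over an in-memory list of IDs.
    private function fetchPage(int $sinceId): array
    {
        $this->requests++;
        $after = array_values(array_filter($this->ids, fn (int $id): bool => $id > $sinceId));

        return array_slice($after, 0, $this->pageSize);
    }

    public function getAllProducts(): Generator
    {
        $lastId = 0;

        do {
            $page = $this->fetchPage($lastId);

            if (count($page) === 0) {
                break;
            }

            $lastId = $page[count($page) - 1];

            yield from $page;
        } while ($lastId);
    }
}

$api = new FakeProductsApi(range(1, 7), 3);

$products = $api->getAllProducts();
$requestsBeforeUse = $api->requests;      // nothing has run yet

$firstProduct = $products->current();     // triggers the first request
$requestsAfterCurrent = $api->requests;

$products->next();
$products->next();                        // now at the last item of page one
$requestsWithinPage = $api->requests;

$products->next();                        // third call (= page size): second request
$fourthProduct = $products->current();
$requestsAfterPageBoundary = $api->requests;
```

With a page size of 3, the third call to next() crosses the page boundary, which mirrors the 250th call in the Shopify case.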

End result

Thanks to generators, we now have a simple loop that can list all products, customers, and orders.

$shop = Shop::findOrFail($this->shopId);
$shopifyProducts = $api->getAllProducts($shop);

foreach ($shopifyProducts as $shopifyProduct) {
    Product::create($shopifyProduct->toArray());
}

There are several aspects we didn’t delve into, such as handling API exceptions and optimizing our database insertions. However, these sections were intentionally omitted to keep the article concise. If you’re interested in exploring these topics further, let me know in the comments.

Conclusion

Together, we have explored what a Generator is, its basic usage, and a practical example of its implementation.

Generators are a good fit for importing or exporting large files, such as CSVs or logs, and for any other scenario where you process a long sequence one item at a time and want to keep memory usage low.
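As a rough sketch of the CSV case, the function below reads a file row by row with fgetcsv() and yields each row, so only one row is held at a time. The readCsvRows() name, the column layout, and the temporary demo file are assumptions made for illustration.

```php
<?php

/**
 * Yields one parsed CSV row at a time instead of loading the whole file.
 */
function readCsvRows(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }

    try {
        // Separator, enclosure, and escape characters are passed explicitly.
        while (($row = fgetcsv($handle, null, ',', '"', '\\')) !== false) {
            yield $row;
        }
    } finally {
        // Runs when the file is exhausted or the generator is destroyed early.
        fclose($handle);
    }
}

// Demo data written to a temporary file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'customers');
file_put_contents($path, "id,name\n1,Ada\n2,Linus\n");

$names = [];
foreach (readCsvRows($path) as $index => $row) {
    if ($index === 0) {
        continue; // skip the header row
    }
    $names[] = $row[1];
}

unlink($path);
```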

If you have previously used Generators, I’d love to hear how you benefited from them. Share your experiences with us!

Happy Coding!
