Generators in PHP enable the construction of iterators without the necessity of creating the whole array. This significantly aids in reducing memory consumption.
It’s common to find yourself needing to handle large data sets, such as reading a 3GB CSV file of customer data that needs to be stored in the database, or transforming data from one format to another when the whole data set is too large to hold comfortably in memory.
Another scenario might be retrieving data from an API and storing it in an array to iterate over later for storage. This forms the basis of the key example we’re going to discuss today. First, though, here is a basic generator:
function xrange(int $from, int $to) {
    for ($i = $from; $i <= $to; $i++) {
        yield $i;
    }
}

foreach (xrange(1, 10_000_000) as $i) {
    var_dump($i);
}
Here, xrange won’t return 10 million items all at once. Instead, with each iteration, it will generate one number. Consequently, the memory usage will remain minimal.
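To make the difference concrete, here is a minimal sketch that compares the memory taken by an eagerly built array with that of a generator producing the same numbers. lazyRange is a hypothetical name for the same idea as xrange, and the exact byte counts will vary by PHP version and platform.

```php
<?php

// Hypothetical helper mirroring the xrange() idea.
function lazyRange(int $from, int $to): Generator
{
    for ($i = $from; $i <= $to; $i++) {
        yield $i;
    }
}

// Eager: range() builds the whole array before any loop can start.
$before = memory_get_usage();
$eager = range(1, 100_000);
$eagerBytes = memory_get_usage() - $before;

// Lazy: creating the generator allocates only the generator object itself.
$before = memory_get_usage();
$lazy = lazyRange(1, 100_000);
$lazyBytes = memory_get_usage() - $before;

// The values are produced one by one while we loop.
$sum = 0;
foreach ($lazy as $n) {
    $sum += $n;
}

printf(
    "array of %d items: %d bytes, generator: %d bytes, sum: %d\n",
    count($eager),
    $eagerBytes,
    $lazyBytes,
    $sum
);
```

The trade-off is that a generator can only be walked forward once, so it suits streaming work rather than data you need to revisit.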
You can also iterate through a Generator in a different way.
function domains() {
    yield 'google.com';
    yield 'facebook.com';
    yield 'instagram.com';
}

$d = domains();

$d->current(); // returns 'google.com'
$d->next();    // moves the pointer to the next item; returns nothing

$d->current(); // returns 'facebook.com'
$d->next();    // moves the pointer to the next item

$d->current(); // returns 'instagram.com'
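When driving a generator by hand like this, you usually also need to know when it has run out of values. A minimal sketch, using the generator's valid() and key() methods alongside current() and next():

```php
<?php

function domains(): Generator
{
    yield 'google.com';
    yield 'facebook.com';
    yield 'instagram.com';
}

$d = domains();
$seen = [];

// valid() becomes false once the generator has no more values to yield.
while ($d->valid()) {
    // Without explicit keys, a generator numbers its values 0, 1, 2, ...
    $seen[] = $d->key() . ' => ' . $d->current();
    $d->next();
}

print_r($seen);
```

This loop is roughly what foreach does for you behind the scenes.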
I previously built a system featuring a component that allowed users to integrate their Shopify store, enabling us to pull all their data (including products, orders, and customers) to generate financial and marketing reports based on this information.
The process involved triggering three distinct background jobs to simultaneously pull product, order, and customer data, and then persist it in our MySQL database as soon as the user added their Shopify store.
Many of our customers had thousands of these data records. However, the code employed to fetch them wasn’t perfect.
Let’s examine the original implementation and explore how we could enhance it.
Here, our ClientFactory is utilized to query the shop’s data.
use GuzzleHttp\Client;

class ClientFactory
{
    // Builds a Guzzle client preconfigured for the given shop's Admin API.
    public function make(Shop $shop): Client
    {
        return new Client([
            'base_uri' => "https://{$shop->domain}/admin/api/2021-07/",
            'headers' => [
                'X-Shopify-Access-Token' => $shop->token,
                'Content-Type' => 'application/json',
            ],
        ]);
    }
}
The first and most straightforward implementation that comes to mind is to use Guzzle to fetch the products and store them directly.
public function handle(ClientFactory $factory): void
{
    $shop = Shop::findOrFail($this->shopId);

    do {
        $response = $factory->make($shop)->get('products.json', ['query' => [
            'limit' => 250,
            'since_id' => $lastId ?? 0,
        ]]);

        $data = json_decode($response->getBody()->getContents(), true);

        $shopifyProducts = $data['products'];

        // An empty page means there is nothing left to fetch.
        if (count($shopifyProducts) == 0) break;

        // The last ID on this page is the starting point for the next request.
        $lastId = $shopifyProducts[count($shopifyProducts) - 1]['id'];

        foreach ($shopifyProducts as $shopifyProduct) {
            Product::create(ShopifyProductMapper::map($shopifyProduct)->toArray());
        }

    } while ($lastId);
}
Two details are worth noting. The request uses the since_id parameter, which allows the retrieval of the next 250 products after the given since_id. The ID of the last product on each page is kept in $lastId to be used as a next-page pointer.
This code is not well-organized, and many operations are occurring in one place. Imagine the overhead if we have to repeat the same process for the Orders and Customers endpoints, along with any other data we are pulling from these stores.
We can improve this by extracting the API endpoint call to another class like a Repository and returning the products array.
class ProductsApi
{
    public function __construct(private ClientFactory $factory)
    {
    }

    public function getAllProducts(Shop $shop): array
    {
        $products = [];

        do {
            $response = $this->factory->make($shop)->get('products.json', ['query' => [
                'limit' => 250,
                'since_id' => $lastId ?? 0,
            ]]);
            $data = json_decode($response->getBody()->getContents(), true);
            $shopifyProducts = $data['products'];

            if (count($shopifyProducts) == 0) break;

            $lastId = $shopifyProducts[count($shopifyProducts) - 1]['id'];

            foreach ($shopifyProducts as $shopifyProduct) {
                $products[] = ShopifyProductMapper::map($shopifyProduct);
            }
        } while ($lastId);

        return $products;
    }
}
The issue here is that we’re storing all the products in memory until we’ve completed the task, at which point we return them to the caller. With thousands of records, this approach could exhaust PHP’s memory limit and end in an out-of-memory (OOM) fatal error.
So, how can we handle pagination more efficiently to avoid this?
One method, which I personally don’t find very appealing, but which is possible nonetheless, is to accept a callback function.
 1  public function getAllProducts(Shop $shop, callable $callback)
 2  {
 3      do {
 4          $response = $this->factory->make($shop)->get('products.json', ['query' => [
 5              'limit' => 250,
 6              'since_id' => $lastId ?? 0,
 7          ]]);
 8
 9          $data = json_decode($response->getBody()->getContents(), true);
10          $shopifyProducts = $data['products'];
11
12          if (count($shopifyProducts) == 0) break;
13
14          $lastId = $shopifyProducts[count($shopifyProducts) - 1]['id'];
15
16          $products = [];
17          foreach ($shopifyProducts as $shopifyProduct) {
18              $products[] = ShopifyProductMapper::map($shopifyProduct);
19          }
20
21          $callback($products);
22      } while ($lastId);
23  }
As you can see, the method takes a second, callable parameter, which it invokes on line 21 to hand over the products mapped on line 18 for each request. So, the usage in the context of the Job class might be as follows:
public function handle(ProductsApi $api): void
{
    $shop = Shop::findOrFail($this->shopId);

    $api->getAllProducts($shop, function (array $shopifyProducts) {
        foreach ($shopifyProducts as $shopifyProduct) {
            Product::create($shopifyProduct->toArray());
        }
    });
}
I don’t find this approach appealing, as the Repository is unexpectedly given the responsibility to execute our callback function. This seems unusual for the repo to require.
Another approach is to allow the caller of the repository to pass the last ID. However, this seems to conflict with the method name.
public function getAllProducts(Shop $shop, int $lastId): array
{
    $response = $this->factory->make($shop)->get('products.json', ['query' => [
        'limit' => 250,
        'since_id' => $lastId,
    ]]);
    $data = json_decode($response->getBody()->getContents(), true);
    $shopifyProducts = $data['products'];

    // No loop here, so an empty page is reported as an empty result.
    if (count($shopifyProducts) == 0) return [];

    // The caller must read the last ID from the returned page to request the next one.
    $products = [];
    foreach ($shopifyProducts as $shopifyProduct) {
        $products[] = ShopifyProductMapper::map($shopifyProduct);
    }

    return $products;
}
Now, with this approach, the user of this class must know and understand how to handle the last ID. Furthermore, they have to determine when to stop the pagination process.
We can simplify this entire process by using Generators.
Our goal is still to iterate over every product across all pages.
In the ProductsApi version, we performed all pagination within the getAllProducts method and collected every product before returning, which ended up using a significant amount of memory. Generators, on the other hand, let us iterate over the same products while holding only one page in memory at a time.
Let’s see how it can be implemented.
 1  /**
 2   * @return Generator<Product[]>
 3   */
 4  public function getAllProducts(Shop $shop): Generator
 5  {
 6      do {
 7          $response = $this->factory->make($shop)->get('products.json', ['query' => [
 8              'limit' => 250,
 9              'since_id' => $lastId ?? 0,
10          ]]);
11          $data = json_decode($response->getBody()->getContents(), true);
12          $shopifyProducts = $data['products'];
13
14          if (count($shopifyProducts) == 0) break;
15
16          $lastId = $shopifyProducts[count($shopifyProducts) - 1]['id'];
17
18          yield from array_map(function (array $shopifyProduct) {
19              return ShopifyProductMapper::map($shopifyProduct);
20          }, $shopifyProducts);
21
22      } while ($lastId);
23  }
As you can see on line 4, the method’s return type is Generator, and on line 18 we yield from the array of mapped objects. yield from hands each element of that array to the caller one at a time.
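One detail of yield from is worth knowing: it keeps the keys of whatever it delegates to, so every page yields keys starting from 0 again. That is harmless in a foreach loop, but it matters if you ever collect the generator with iterator_to_array(). The sketch below uses a hypothetical two-page generator, pagedItems, to show the effect.

```php
<?php

// Hypothetical two-page source standing in for the paginated API.
function pagedItems(): Generator
{
    yield from ['a', 'b']; // keys 0 and 1
    yield from ['c', 'd']; // keys 0 and 1 again
}

// foreach visits every value; duplicate keys do not matter here.
$viaForeach = [];
foreach (pagedItems() as $item) {
    $viaForeach[] = $item;
}

// iterator_to_array() preserves keys by default, so the second page
// overwrites the first one.
$keysPreserved = iterator_to_array(pagedItems());

// Passing false as the second argument reindexes and keeps every value.
$reindexed = iterator_to_array(pagedItems(), false);

var_dump($viaForeach, $keysPreserved, $reindexed);
```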
Let’s first look at how this can be utilized in practice, followed by a brief explanation of its execution process.
We can modify the code in our job as follows:
 1  public function handle(ProductsApi $api): void
 2  {
 3      $shop = Shop::findOrFail($this->shopId);
 4
 5      $shopifyProducts = $api->getAllProducts($shop);
 6
 7      foreach ($shopifyProducts as $shopifyProduct) {
 8          Product::create($shopifyProduct->toArray());
 9      }
10  }
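So, when line 5 is executed, nothing within getAllProducts will run. Calling a generator function only creates a Generator object; its body starts executing when the first value is requested.
If you were to perform a die & dump operation on $shopifyProducts, it would display the following: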
Generator {#2531
  this: App\Modules\Shopify\ProductsApi {#2500 …}
  trace: {
    ./app/Modules/Shopify/ProductsApi.php:18 {
      App\Modules\Shopify\ProductsApi->getAllProducts(Shop $shop): Generator
      › public function getAllProducts(Shop $shop): Generator
      › {
      ›   do {
    }
    App\Modules\Shopify\ProductsApi->getAllProducts() {}
  }
  closed: false
}
It’s not immediately apparent or easy to understand, but it’s something that PHP can certainly interpret. You can tell it’s a reference to our method and its associated logic.
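If the deferred execution feels abstract, the following sketch makes it visible. fetchProducts is a hypothetical stand-in for getAllProducts that records in a shared $log array when each part of its body actually runs.

```php
<?php

$log = [];

// Hypothetical stand-in for getAllProducts(); it logs when its body runs.
function fetchProducts(array &$log): Generator
{
    $log[] = 'body started';
    yield 'product-1';
    $log[] = 'resumed after first yield';
    yield 'product-2';
}

// Calling the function only creates the Generator object.
$products = fetchProducts($log);
$log[] = 'generator created';

// Asking for the first value runs the body up to the first yield.
$first = $products->current();
$log[] = 'got ' . $first;

print_r($log);
```

Note that 'generator created' is logged before 'body started', and the code after the first yield has not run at all yet.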
Generators can be consumed in various ways. Besides foreach, you can drive them manually with ->current() and ->next().
So, the Generator object we have in $shopifyProducts is equipped with some methods we can utilize.
$shopifyProducts->current();
Calling ->current() will trigger the first API call to Shopify, returning the first product in the initial array of 250 products.
If you invoke the current() method multiple times, it will always return the same first product.
To reach the next set of 250 products, you have to invoke ->next() 250 times. The first 249 calls step through the rest of the first page; the 250th call sends the second request, and current() will then point to the 251st product.
You then continue to call next() until you have finished dealing with the 500th product, and so forth.
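The page boundary is easier to follow with small numbers. The sketch below simulates the same behaviour with pages of 3 items instead of 250; pagedProducts and the $requests counter are hypothetical stand-ins for the real API calls.

```php
<?php

$requests = 0;

// Hypothetical paginated source: two pages of 3 items each.
// $requests counts the simulated API calls.
function pagedProducts(int &$requests): Generator
{
    $pages = [['p1', 'p2', 'p3'], ['p4', 'p5', 'p6']];

    foreach ($pages as $page) {
        $requests++;      // stands in for the HTTP request
        yield from $page; // hand out the page one item at a time
    }
}

$products = pagedProducts($requests);

$first = $products->current();       // triggers the first "request"
$requestsAfterFirst = $requests;

$products->next();                   // now at p2
$products->next();                   // now at p3
$requestsBeforeBoundary = $requests; // still only one request made

$products->next();                   // page exhausted: second "request"
$fourth = $products->current();
$requestsAfterBoundary = $requests;

var_dump($first, $fourth, $requestsAfterBoundary);
```

With a page size of 3, the third next() call crosses into the second page, just as the 250th call does with pages of 250.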
Thanks to generators, we now have a simple loop that can list all products, customers, and orders.
$shop = Shop::findOrFail($this->shopId);
$shopifyProducts = $api->getAllProducts($shop);

foreach ($shopifyProducts as $shopifyProduct) {
    Product::create($shopifyProduct->toArray());
}
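Since other endpoints can be paginated with the same since_id approach, the loop could be factored into one reusable generator. The sketch below is only an illustration under that assumption: paginate and $fetchPage are hypothetical names, and the in-memory $rows array stands in for real API responses.

```php
<?php

/**
 * Hypothetical helper: $fetchPage receives the last seen ID and returns
 * the next page of records (each with an 'id' key), or [] when done.
 */
function paginate(callable $fetchPage): Generator
{
    $lastId = 0;

    while (true) {
        $page = $fetchPage($lastId);

        if (count($page) === 0) {
            return; // no more pages
        }

        $lastId = $page[count($page) - 1]['id'];

        foreach ($page as $record) {
            yield $record;
        }
    }
}

// Fake data source standing in for an orders or customers endpoint:
// 7 records served in pages of 3.
$rows = array_map(fn (int $id) => ['id' => $id], range(1, 7));
$fetchPage = fn (int $sinceId) => array_slice(
    array_values(array_filter($rows, fn (array $row) => $row['id'] > $sinceId)),
    0,
    3
);

$ids = [];
foreach (paginate($fetchPage) as $record) {
    $ids[] = $record['id'];
}

print_r($ids);
```

Each endpoint would then only need to supply its own $fetchPage callable, while the calling code keeps the same simple foreach.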
There are several aspects we didn’t delve into, such as handling API exceptions and optimizing our database insertions. However, these sections were intentionally omitted to keep the article concise. If you’re interested in exploring these topics further, let me know in the comments.
Together, we have explored what a Generator is, its basic usage, and a practical example of its implementation.
You can use Generators to keep memory usage low when importing or exporting large files such as CSVs or logs, and in similar scenarios where the full data set doesn’t need to be in memory at once.
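As a minimal sketch of the CSV case, the hypothetical readCsv helper below yields one row at a time, so only the current line is held in memory regardless of file size. The file name, columns, and sample data are made up for the demo.

```php
<?php

// Hypothetical helper: yields each CSV row as an associative array
// keyed by the header line.
function readCsv(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }

    try {
        // Delimiter, enclosure and escape are passed explicitly.
        $header = fgetcsv($handle, 0, ',', '"', '\\');
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            yield array_combine($header, $row);
        }
    } finally {
        // Runs when the file is exhausted or the generator is destroyed.
        fclose($handle);
    }
}

// Demo with a small temporary file.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,email\nAda,ada@example.com\nLinus,linus@example.com\n");

$emails = [];
foreach (readCsv($path) as $customer) {
    $emails[] = $customer['email'];
}

unlink($path);
print_r($emails);
```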
If you have previously used Generators, I’d love to hear how you benefited from them. Share your experiences with us!
Happy Coding!