Alex Jesipow
2024-12-21
It is good practice to regularly inspect the statements running in the hot path of your Postgres instance. One way to do this is to examine the pg_stat_statements
view, which shows various statistics about the SQL statements executed by the Postgres server.
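For example, assuming the pg_stat_statements extension is installed (it must also be listed in shared_preload_libraries), a query along these lines lists the statements that dirtied the most shared buffers - a sketch, adjust the columns to your needs:

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query,
       calls,
       shared_blks_dirtied,
       shared_blks_written
FROM pg_stat_statements
WHERE shared_blks_dirtied > 0
ORDER BY shared_blks_dirtied DESC
LIMIT 10;
```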
Among many other interesting columns, you will find the shared_blks_dirtied
and shared_blks_written
columns. They keep track of how many blocks, Postgres' smallest unit of I/O, in the shared buffer cache are modified by a statement and (ultimately) written to disk.
One observation you may make when inspecting the view is that SELECT
queries show non-zero values in the shared_blks_written
and shared_blks_dirtied
columns. This seems counterintuitive at first as it indicates that a simple SELECT
query, which you would expect to only read data, can in fact cause data to be written (keeping track of the statement statistics themselves is not what causes these writes).
This article discusses two mechanisms, setting hint bits for transaction state tracking and page pruning, that can cause writes by read-only statements. Before we can go into detail about these mechanisms themselves, however, we first have to understand how Postgres organises table data and manages concurrency.
Postgres allows multiple clients to read and write data concurrently using multi-version concurrency control (MVCC). As the name suggests, it uses multiple versions of the same logical table row to ensure that clients always have a consistent (albeit potentially different) view of data - even if other clients make changes to the same data at the same time.
One client will only ever see at most one such row version (although which one exactly may change throughout the lifetime of a transaction). Different clients interacting with the same table row concurrently may see different versions of that row.
There are many possible side-effects that result from supporting concurrency in a database. For example, within one and the same transaction, reading the same row multiple times may show different data (i.e. different row versions) - even if the row was not modified by the reading transaction. This phenomenon, known as a non-repeatable read, can happen if another client's transaction changed and committed that row in the meantime.
Depending on the requirements, clients may wish to use a stricter level of isolation from other transactions to reduce (or even entirely eliminate) such concurrency effects. Of course, stricter transaction isolation levels have performance implications that should be considered carefully.
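For illustration, the isolation level can be chosen per transaction; READ COMMITTED is Postgres' default, and REPEATABLE READ rules out the non-repeatable reads described above:

```sql
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- All queries in this transaction now see a snapshot taken at its first
-- statement; re-reading a row returns the same version even if a
-- concurrent transaction commits a change to it in the meantime.
COMMIT;
```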
There are many more interesting concurrency effects, but more details are beyond the scope of this article. We will continue to focus on how row versions work in Postgres.
To make this article easier to follow, we'll use a small example.[1] Let's say we have a library of books at home and we occasionally lend the books to friends. Even though we sadly only have one book currently in our collection, we keep track of it in a books
table.[2]
CREATE TABLE books (
id integer PRIMARY KEY,
title char(800) NOT NULL,
author char(600) NOT NULL,
lent_to char(400)
);
INSERT INTO books VALUES (1, 'Designing Data-Intensive Applications', 'Martin Kleppmann', 'Alice');
SELECT * FROM books;
id | title | author | lent_to
----+----------------------------------------+-------------------+---------
1 | Designing Data-Intensive Applications | Martin Kleppmann | Alice
How is this data physically represented in Postgres?
On disk, the data for a table is separated into three different kinds of files[3], also known as forks: the main fork containing the actual table data, the free space map and the visibility map.
Files in the main fork are logically split into blocks. They are, as mentioned in the introduction, the smallest unit of data Postgres will read from or write to disk. A block is also often referred to as a page, even though technically they are not exactly the same.
I will not go into more detail on the file structure; for the purposes of this post it is sufficient to consider a table as a collection of so-called heap pages[4]. A heap page is a container that can hold one or more row versions, often also referred to as heap tuples or simply items.
Pages always have a fixed size, usually 8192 bytes, regardless of how many tuples they currently contain. To avoid costly (de-)serialisation, there is no difference in how pages are stored on disk or in memory.
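You can easily check the block size of your own instance (it is fixed when Postgres is compiled):

```sql
SHOW block_size;
-- 8192 on a default build
```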
HEAP PAGE LAYOUT
Byte |---lp_len---|
0 24 28 32 pd_lower pd_upper lp_off 8192
+--------+----+----+----+-----------//-----------+------------+------------+---------+
| | | | |-> <-| | | |
| PAGE | It.| It.| It.| FREE | | | |
| | ID | ID | ID | | Item 3 | Item 2 | Item 1 |
| HEADER | 1 | 2 | 3 | SPACE | | | |
| | | | |-> <-| | | |
+--------+----+----+----+-----------//-----------+------------+------------+---------+
| | | ^ ^ ^
| | +--------------------------+ | |
| +--------------------------------------------+ |
+--------------------------------------------------------------+
The diagram above illustrates the memory layout of a heap page. The first 24 bytes are occupied by the page header, followed by item identifiers - each 4 bytes long - which store an offset to the start of the actual item, its length and state.
Between the item identifiers and the items themselves is unallocated space that can be used for new data. New items are allocated from the bottom (right in the diagram above) of the page upwards, while new identifiers are added from the top.
Indexes are also organised in pages, but they have an additional area called "special space" allocated at the very end of the page for index-specific data. There is no special space allocated for heap pages, however.
Items in index or heap pages sometimes need to point to items in other heap pages. Such a reference, known as a CTID, always consists of a page number and an item identifier number, e.g. (0, 3). The referenced item identifier in turn keeps track of the offset to the actual item within the page, as shown above. This indirection allows moving items around inside a page without breaking outside pointers - which in turn requires that an identifier must not be moved until it is freed. We will encounter this in more detail later.
To see how our books
table is represented in pages and tuples we need some special tools. Thankfully, Postgres offers many functions to inspect its inner workings.
We first need to get access to several such functions from the pageinspect
module. The page_header
function converts the raw page data returned by get_raw_page
into a human readable format. We only look at header fields most relevant for our exploration:
CREATE EXTENSION pageinspect;
SELECT lower, upper, special, pagesize, prune_xid FROM page_header(get_raw_page('books', 0));
lower | upper | special | pagesize | prune_xid
-------+-------+---------+----------+-----------
28 | 6352 | 8192 | 8192 | 0
The first four values lower
, upper
, special
, pagesize
are byte offsets within the page. We see that the page has a size of 8kB and since this is a heap page there is no special space allocated (it points to the very end of the page). Also, there is currently only a single item identifier present since lower
is at byte 28 (as discussed above, a single identifier is 4 bytes and the header itself is 24 bytes).
We can infer two more pieces of information from the upper value: the item size is 1840 bytes (pagesize - upper) and the page therefore has 6324 bytes (upper - lower) of unallocated space.
The value in prune_xid
refers to a transaction ID (or none if 0 as in our case), more details on that later.
To inspect the tuples in the page, we can use the heap_page_items
function. I again chose only a subset of the fields for clarity:
SELECT
lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_ctid, t_infomask, t_infomask2, t_hoff, t_data
FROM heap_page_items(get_raw_page('books', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | t_infomask | t_infomask2 | t_hoff | t_data
----+--------+----------+--------+--------+--------+--------+------------+-------------+--------+--------
1 | 6352 | 1 | 1840 | 1148 | 0 | (0,1) | 2306 | 4 | 24 | .....
Confirming our observation from the page header, so far there is only a single row version in the page.
The lp
prefix of the first four columns stands for "line pointer" and corresponds to the data of the item identifier. The first lp
column represents the item number in the current page, lp_off
the offset in bytes where the actual item starts within the page, lp_flags
the status of the item and lp_len
its length in bytes (see the page layout illustration above).
Since there is only a single item in the page, the lp_off
value corresponds to the upper
value of the page header. We can now also confirm what we already derived from the page header: the item is 1840 bytes long[5].
The t_hoff
value is an offset from the start of the item to the start of user data, which is stored in the t_data
column. I removed the actual content in the output above as it is not relevant for the purposes of our investigation[6].
The remaining header columns t_xmin
, t_xmax
, t_ctid
, t_infomask
are the most interesting as they play an important role when rows are modified and to determine which, if any, row version to return to the client. We will discuss them in more detail next.
Let's say Alice finished reading our book and since it's so good we immediately lend it out again, this time to Bob. We update our table:
UPDATE books SET lent_to = 'Bob' where id = 1;
How is this update reflected under the hood?
SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_infomask, t_infomask2
FROM heap_page_items(get_raw_page('books', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_infomask | t_infomask2
----+--------+----------+--------+--------+--------+------------+-------------
1 | 6352 | 1 | 1840 | 1148 | 1149 | 258 | 16388
2 | 4512 | 1 | 1840 | 1149 | 0 | 10242 | 32772
With MVCC, the previous tuple cannot be updated in place by changing t_data
. Instead a second tuple, a new row version, is added for the update. Data that has not changed between versions is also present in the new tuple.[7] As discussed earlier, the second tuple is added above the first one, shown by the lower lp_off
value. Both tuples are in the normal
state (lp_flags
) and have the same size (lp_len
).
The interesting aspects are the t_xmin
and t_xmax
values. They respectively indicate which transaction IDs inserted and deleted a particular row version.
Transaction IDs form the basis for Postgres' MVCC mechanism and, to simplify, you can think of them as points in time at which data was modified. Using them as values for t_xmin
and t_xmax
therefore defines time spans in which particular row versions are visible[8]. This enables consistent views of the same data despite concurrent updates.
A t_xmax
value of 0 is a special value indicating that the tuple has not been deleted and is the latest version of the row. As shown in the example, in simplified terms, a row update can be considered as deleting the previous version (setting t_xmax
) and inserting a new one (with the same ID for t_xmin
).
However, the t_xmin
and t_xmax
values by themselves are not enough to determine which row version is visible in a given situation. It is not clear from the IDs alone whether or not the transaction was actually committed. Making row versions from uncommitted transactions visible, a "dirty read", is not allowed in Postgres, regardless of the transaction isolation level.
This is where the t_infomask
and t_infomask2
columns come in. They store extra information that help determine the state of the inserting / deleting transaction. We can use the heap_tuple_infomask_flags
function to decode their meaning:
SELECT lp, t_xmin, t_xmax, t_infomask, t_infomask2, raw_flags
FROM heap_page_items(get_raw_page('books', 0)),
LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2);
lp | t_xmin | t_xmax | t_infomask | t_infomask2 | raw_flags
----+--------+--------+------------+-------------+-----------------------
1 | 1148 | 1149 | 258 | 16388 | {HEAP_XMIN_COMMITTED}
2 | 1149 | 0 | 10242 | 32772 | {HEAP_XMAX_INVALID}
I removed some of the raw_flags
values which are not relevant right now. We can see the HEAP_XMIN_COMMITTED
flag for the first row version, indicating that the transaction that inserted that tuple was committed and HEAP_XMAX_INVALID
for the second row version, indicating that the 0 value should be ignored (because it is not a transaction ID).
This raises the question: why aren't we seeing a HEAP_XMAX_COMMITTED
flag for the first tuple and similarly a HEAP_XMIN_COMMITTED
flag for the second tuple? After all, the update we performed earlier is already committed[9].
The reason for this state is that when a new row version is added and/or a previous one modified, the hint bits for the transaction state cannot be set at the same time, since at that point it is not yet known whether the transaction making the change will commit.
There is also no record-keeping of which pages were modified by a transaction (it would be expensive to do so) and so once a transaction is committed this event is not reflected in the affected pages. Instead, Postgres keeps track of commit status of all transactions in a separate commit log (in the pg_xact
directory).
As we saw in the last example, this leaves row versions that were modified by a previous transaction in an ambiguous state: is a particular tuple visible or not? Put differently: was the transaction that made a change to the tuple committed or not? The server has to consult the commit log to answer this question.
Once answered, any statement accessing tuples in such a state sets the hint bits in the infomask on the relevant tuple headers to record the answer - you see where this is going. This way the ambiguity about a tuple's visibility is resolved for subsequent statements which avoids the extra check of the commit log.
Recording this answer can cause a write by a simple SELECT
query.
We can verify this easily in our example, simply by reading from our table now:
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF) SELECT * FROM books;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on books (actual time=0.015..0.016 rows=1 loops=1)
Buffers: shared hit=1 dirtied=1
Planning:
Buffers: shared hit=11
Planning Time: 0.369 ms
Execution Time: 0.033 ms
I used EXPLAIN
with a couple of options to show the execution plan of that statement. The ANALYZE
option makes the statement get executed and BUFFERS
shows how it interacted with the internal cache where the page for our table is kept (COSTS OFF
to make the output easier to read).
By SELECT
ing the entire table, we of course had to read from the single page stored in shared buffers (hit=1
), and, crucially, while doing so we "dirtied" the page which means that our statement modified the page.[10] A dirtied page means that it will have to be written to disk sooner or later (either when the background writer processes it or when it gets evicted from the shared buffers cache).
Checking the tuples in the page again to verify the hint bits were updated:
lp | t_xmin | t_xmax | t_infomask | t_infomask2 | raw_flags
----+--------+--------+------------+-------------+-------------------------------------------
1 | 1148 | 1149 | 1282 | 16388 | {HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED}
2 | 1149 | 0 | 10498 | 32772 | {HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID}
Now the flags show that the transaction updating our row was committed.
Another SELECT
does not dirty the page again:
QUERY PLAN
-------------------------------------------------------------
Seq Scan on books (actual time=0.028..0.030 rows=1 loops=1)
Buffers: shared hit=1
Planning Time: 0.591 ms
Execution Time: 0.051 ms
We covered a lot of ground now and explored the first of the two mechanisms that can cause writes by read queries. Let's move on to the second: page pruning.
As seen above, updates (or deletes) of a row will leave old row versions in the page - even if they are no longer visible to any transaction.
Such outdated dead tuples must be garbage collected as the database would otherwise need an ever-increasing amount of storage. Compared to the heavy VACUUM
, page pruning is a much more lightweight process as it only deletes outdated row versions within a single heap page.
Indexes may still reference the outdated tuples. This is not an issue though as the item pointers themselves are not removed, only the actual items in the page. We'll see this in more detail shortly.
The pruning process can, in principle, be triggered by any statement accessing a heap page. But even though it is a fairly lightweight process, it would still be expensive to attempt pruning every single time a page is accessed.
Instead, it is only triggered when available space in a page is limited and pruning is likely to be useful. When a table row is updated or deleted, the oldest unpruned t_xmax among the page's tuples is stored in the page header as a hint field (prune_xid).
A subsequent statement accessing the page uses this hint to check whether pruning is likely to be useful at all. Then, if the unallocated space in the page has dropped below 10% of the page size (or below the space that the fillfactor setting reserves, whichever is larger), pruning kicks in and tries to make room for a new tuple.
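The fillfactor can be lowered per table to deliberately leave room in each page for new row versions, which makes in-page pruning (and the HOT updates discussed later) more likely to succeed. A sketch - not applied in the examples that follow, which assume the default of 100:

```sql
-- Reserve roughly 30% of every heap page for future updates.
ALTER TABLE books SET (fillfactor = 70);
```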
Let's continue with our small library example and see that process directly in the database. To start from a clean slate, we truncate the table (the page and with it all previous tuples backing the table are wiped). After the insert, we update the row three times so that the page is almost completely full:
TRUNCATE books;
INSERT INTO books VALUES (1, 'Designing Data-Intensive Applications', 'Martin Kleppmann', 'Alice');
UPDATE books SET lent_to = 'Bob' where id = 1;
UPDATE books SET lent_to = 'Charlie' where id = 1;
UPDATE books SET lent_to = 'Doris' where id = 1;
SELECT flags, lower, upper, special, pagesize, prune_xid
FROM page_header(get_raw_page('books', 0));
flags | lower | upper | special | pagesize | prune_xid
-------+-------+-------+---------+----------+-----------
0 | 40 | 832 | 8192 | 8192 | 1152
The page has no special flags set, it only has 792 bytes of free space remaining (upper - lower
) and prune_xid
is set to transaction ID 1152
- providing the hint that there are unpruned tuples in the page.
Let's see the page items:
SELECT
lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_ctid, t_infomask, t_infomask2
FROM heap_page_items(get_raw_page('books', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | t_infomask | t_infomask2
----+--------+----------+--------+--------+--------+--------+------------+-------------
1 | 6352 | 1 | 1840 | 1151 | 1152 | (0,2) | 1282 | 16388
2 | 4512 | 1 | 1840 | 1152 | 1153 | (0,3) | 9474 | 49156
3 | 2672 | 1 | 1840 | 1153 | 1154 | (0,4) | 8450 | 49156
4 | 832 | 1 | 1840 | 1154 | 0 | (0,4) | 10242 | 32772
There are four tuples, corresponding to the initial insert and the three updates of course. As we already saw earlier, every updating transaction set its ID as the t_xmax
on the previous row version rendering it invisible to any newer transactions. The lp_flags
are all 1
indicating normal, in use, tuples. We'll discuss t_ctid
in more detail in a bit.
Since there is less than 10% of free space remaining in our page (fillfactor is at 100 percent by default, so it has no effect) and prune_xid is non-zero, the next access to the page will trigger the pruning mechanism:
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF) SELECT * FROM books;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on books (actual time=0.037..0.038 rows=1 loops=1)
Buffers: shared hit=1 dirtied=1
Planning:
Buffers: shared hit=11
Planning Time: 0.292 ms
Execution Time: 0.047 ms
The query plan again shows that the page was dirtied by a read-only query.
Let's verify that this time the pruning mechanism modified our page.
flags | lower | upper | special | pagesize | prune_xid
-------+-------+-------+---------+----------+-----------
1 | 40 | 6352 | 8192 | 8192 | 0
First, in the page header we can see that the prune_xid
was reset to 0. The flags value 1 indicates that the page has free line pointers. The number of pointers has not changed (lower is still at byte 40), but we now have 6312 bytes of unallocated space again in the page.
Looking at the page items once more, we can make a few observations.
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | t_infomask | t_infomask2
----+--------+----------+--------+--------+--------+--------+------------+-------------
1 | 4 | 2 | 0 | | | | |
2 | 0 | 0 | 0 | | | | |
3 | 0 | 0 | 0 | | | | |
4 | 6352 | 1 | 1840 | 1154 | 0 | (0,4) | 10498 | 32772
Firstly, we still see the four line pointers. Only the last update (t_ctid (0,4)
) remains as a normal (lp_flags 1
) tuple. It was moved to the bottom of the page (lp_off
) so that available space is not fragmented. The two lines in the middle are unused (lp_flags 0
) and can immediately be reused for the next tuple.
Line 1 looks different, however. The flag is set to redirect
and lp_off
is used as a link to the redirected line number. But why redirect at all?
I briefly touched on this earlier: tuple 1, being the original row version, is still referenced from the books_pkey index, which was automatically created to back the primary key constraint on the id column. As mentioned above, heap pruning only touches a single heap page and therefore does not modify the index. That reference must not be broken, so line pointer 1 has to remain where it is. Since it can no longer point to the pruned original row version, it instead redirects to the latest version of that row.
We have now covered what we set out to do: investigate two mechanisms that can cause writes for simple read queries.
However, ending the article after this last example would feel unsatisfying, as you may ask: why was only the original row version included in the index and not the later versions? Or, put differently: why aren't line pointers 2 and 3 also redirecting to item 4?
Let's briefly discuss one last topic to answer this question.
Postgres can use an optimisation called heap-only tuples (HOT) to record certain row updates only on the heap, i.e. in the main table data, and avoid having to update indexes as well.[11] HOT updates are only possible if the update does not change any columns that are indexed (otherwise the index would be incorrect) and if there is enough free space in the same page containing the previous row version (as in our example).
Dead tuples can be reclaimed more easily this way as they would otherwise only be removed by a more expensive VACUUM
.
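Postgres also counts how often this optimisation applies: the cumulative statistics system tracks HOT updates per table, so you can check what share of a table's updates avoided index maintenance:

```sql
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'books';
```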
Continuing our example with two more updates, the previously deallocated space gets filled again:
UPDATE books SET lent_to = 'Emily' where id = 1;
UPDATE books SET lent_to = 'Franz' where id = 1;
This time, we again want to look at raw_flags
of the tuple headers as they provide additional context (I removed the flags related to commit status for clarity in the output below):
SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_ctid, t_infomask, t_infomask2, raw_flags
FROM heap_page_items(get_raw_page('books', 0)),
LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2);
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid | t_infomask | t_infomask2 | raw_flags
----+--------+----------+--------+--------+--------+--------+------------+-------------+------------------------------------------------
1 | 4 | 2 | 0 | | | | | |
2 | 4512 | 1 | 1840 | 1155 | 1156 | (0,3) | 8450 | 49156 | {HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}
3 | 2672 | 1 | 1840 | 1156 | 0 | (0,3) | 10242 | 32772 | {HEAP_UPDATED,HEAP_ONLY_TUPLE}
4 | 6352 | 1 | 1840 | 1154 | 1155 | (0,2) | 9474 | 49156 | {HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}
All three normal line pointers have the HEAP_ONLY_TUPLE flag set, signalling that the tuple is not referenced from an index. As already discussed, line pointer 1 redirects to pointer 4. Line pointer 4 has the HEAP_HOT_UPDATED
flag set, indicating that there is another tuple with a newer version - the t_ctid
column links to the next version (line pointer 2). The same logic applies to row version 2 linking to item 3. Item 3 does not have the HEAP_HOT_UPDATED
flag set and has t_ctid
pointing to itself, ending the HOT update chain and signalling that it is the latest version.
While this optimisation can save updating indexes, the downside is that this chain must be followed to find the latest visible row version.
If there is not enough space for the update in the page containing the current row version, then the new version has to be inserted in the next page and into indexes.
In this article I tried to provide some details about two mechanisms that can cause writes for simple read queries, and some of the implications of Postgres' multi-version concurrency control.
The next time you encounter dirtied buffers in the output of your query plan, or see positive values for written or dirtied blocks in the pg_stat_statements
view for read-only queries, I hope this article convinced you that these are indications of normal and useful behaviour of your Postgres server.
If you want to learn more about the inner workings of Postgres, reading more of the documentation is always a good starting point. I can also highly recommend PostgreSQL 14 Internals by Egor Rogov. Most of the details described there are still relevant and applicable to PostgreSQL 17.
The examples and the logic of this article are based on PostgreSQL 17. ↩
You do not want to use the column types in my example for a real table. I chose them in a way to make the examples in the post simpler, in particular to have predictable (and large) tuple sizes. Have a look here to see why you shouldn't use char(n). ↩
There are some differences in the file structure for other relations, see the docs for more details. ↩
The heap is the main data area for tables. Pages are also used to store other types of data, such as indexes where there can be different kinds of pages. If not otherwise stated, I am referring to heap pages that store table data. ↩
The tuple size of 1840 bytes is made up as follows:
- the tuple header is 24 bytes long (the t_hoff value shown earlier)
- the id column of type integer is 4 bytes long
- the char columns add up to 1800 bytes - note though that the defined length is the character limit and not necessarily the same as the size in bytes. In the examples I showed, these two are the same because the columns only contain ASCII characters, which are encoded with 1 byte each. If you add a character that requires more than one byte to store, the size of the tuple will increase. As noted before though, don't use char(n) as a column type.
- each char column has an overhead of 4 extra bytes, bringing the character data to a total of 1812 bytes

Note that generally the column order matters for the amount of space a tuple requires, as types that are not a multiple of MAXALIGN may have to be padded for alignment.
For example: a table with an int
(4 bytes) column followed by a bigint
(8 bytes) column will require 40 bytes (24 header, 4 int
, 4 padding, 8 bigint
) per tuple, whereas a table with the two columns swapped will only require 36 bytes (24 header, 8 bigint
, 4 int
). As a general rule of thumb, order your columns so that the ones with the largest types come first. ↩
The user data is stored there in binary format and may even be outsourced to other tables for some columns (see TOAST). To make sense of the binary user data you also need the table's column metadata stored in pg_attribute
. ↩
There is a small caveat to this: for unchanged columns that are stored out-of-line in a separate TOAST table, the data is preserved as-is. But that does not change the fact that an UPDATE
will produce a new row version. ↩
The exact tuple visibility rules are quite complex. ↩
PostgreSQL actually treats every SQL statement as being executed within a transaction. If you do not issue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT wrapped around it. ↩
If you follow this example in your own Postgres instance: you may not see any dirtied pages in the query plan if you perform the steps quickly one after another. This is because the pages are already dirty from the previous update. To see the same output of the query plan I showed, you need to create a CHECKPOINT
after the update which flushes the current changes to disk. You can verify this by examining the relevant pages in the shared buffer with SELECT * FROM pg_buffercache ca LEFT JOIN pg_class cls ON ca.relfilenode = cls.relfilenode WHERE cls.relname = 'books';
↩
Summarising indexes (such as BRIN) still need to be updated, see the docs. ↩