PostgreSQL Overview
PostgreSQL is one of the world's most advanced open source database systems, and it has many features that are widely used by developers and system administrators alike. Starting with PostgreSQL 10, many new features have been added to PostgreSQL, which contribute greatly to the success of this exceptional open source product. In this book, many of these cool features will be covered and discussed in great detail.
In this chapter, you will be introduced to PostgreSQL and its cool new features available in PostgreSQL 10.0 and beyond. All relevant new functionality will be covered in detail. Given the sheer number of changes made to the code and the size of the PostgreSQL project, this list of features is, of course, far from complete, so I have tried to focus on the most important aspects relevant to most people.
The features outlined in this chapter will be split into the following categories:
Database administration
SQL and developer related
Backup, recovery, and replication
Performance related topics
What is new in PostgreSQL 10.0?
PostgreSQL 10.0 was released in late 2017 and is the first version that follows the new numbering scheme introduced by the PostgreSQL community. From now on, the way major releases are numbered will change and, therefore, the next major version after PostgreSQL 10.0 will not be 10.1 but PostgreSQL 11. Versions 10.1 and 10.2 are merely service releases and will only contain bug fixes.
Understanding new database administration functions
PostgreSQL 10.0 has many new features that can help the administrator reduce work and make systems more robust.
One of the features that makes life easier for administrators is the additional information in pg_stat_activity.
Using additional information in pg_stat_activity
Before PostgreSQL 10.0, pg_stat_activity only contained information about normal backend processes serving end users (connections). However, this has changed. Since PostgreSQL 10.0, a lot more information is exposed, and it is possible to figure out what the other system processes are doing.
The following listing shows the content of pg_stat_activity on an idle database instance:
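A query along the following lines produces such a listing. It is only a sketch: the selection of columns is my own choice, and the rows you get back depend on your installation.

```sql
-- backend_type (new in PostgreSQL 10) tells normal client backends apart
-- from system processes such as the checkpointer or the autovacuum launcher.
SELECT pid, backend_type, state, wait_event_type, wait_event
FROM pg_stat_activity
ORDER BY pid;
```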
What you see here is that every server process is listed. This will allow you to gain some insights into what is happening on the server.
Introducing SCRAM-SHA-256
Most people use passwords to connect to the database and manage security. Traditionally, people utilized md5. However, md5 is not safe anymore and, therefore, new authentication methods are needed. Starting with version 10.0, PostgreSQL supports SCRAM-SHA-256, which is far safer than the previous authentication method.

The old way of doing it is still supported. However, it is strongly recommended to move from md5 to SCRAM-SHA-256.
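A minimal sketch of such a migration, assuming a hypothetical role called app_user and a placeholder password. Existing md5 hashes are not converted automatically, so the password has to be set again once the parameter has been changed.

```sql
-- Hash passwords set in this session with SCRAM-SHA-256. To make this
-- permanent, set password_encryption in postgresql.conf as well.
SET password_encryption = 'scram-sha-256';

-- Set the password again so that a SCRAM verifier is stored.
-- app_user and the password are placeholders.
ALTER ROLE app_user PASSWORD 'choose-a-strong-password';

-- Afterwards, the authentication method in pg_hba.conf can be switched,
-- for example (address range is an assumption):
--   host  all  app_user  10.0.0.0/8  scram-sha-256
```

Keep in mind that client libraries have to understand SCRAM as well, so it pays off to check the drivers in use before flipping the switch in pg_hba.conf.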
Improving support for replication
The introduction of PostgreSQL 10.0 also saw the introduction of logical replication, which had not been in the core before.
Understanding logical replication
Since version 8.0, PostgreSQL has supported binary replication (also often referred to as WAL shipping). The ability to distribute the transaction log (WAL) has been improved steadily over the years.
With the introduction of PostgreSQL 10.0, a new feature has been added to PostgreSQL: logical replication. How does it work? Logical replication allows you to publish a set of tables on one server and ask other servers to subscribe to the changes.

To publish data, the new CREATE PUBLICATION command has been introduced:
Once the data has been published, remote servers can subscribe to these changes and receive information about what has happened to those published data sets:
CREATE SUBSCRIPTION is used on the slave side to attach to these changes. The beauty of the concept is that a server can publish one set of tables while subscribing to some other tables at the same time; there is no such thing as always master or always slave anymore. Logical replication allows you to flexibly distribute data.
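A minimal sketch of both sides. The table, publication, and subscription names as well as the connection string are placeholders. The sketch also assumes that wal_level is set to logical on the publishing server and that the table already exists on the subscriber, as table definitions are not replicated.

```sql
-- On the publishing server
CREATE TABLE t_repl (id int PRIMARY KEY, payload text);
CREATE PUBLICATION pub_demo FOR TABLE t_repl;

-- On the subscribing server (the same table definition must exist here)
CREATE TABLE t_repl (id int PRIMARY KEY, payload text);
CREATE SUBSCRIPTION sub_demo
    CONNECTION 'host=publisher.example.com dbname=test user=repuser'
    PUBLICATION pub_demo;
```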
Introducing quorum COMMIT
PostgreSQL has offered support for synchronous replication for quite some time now. Traditionally, only one server could act as a synchronous standby. This has changed. In PostgreSQL 10.0, the community has introduced quorum COMMITs. The idea is actually quite simple. Suppose you want five out of seven servers to confirm a transaction before the master returns a COMMIT. This is exactly what a quorum COMMIT does. It gives developers and administrators a chance to define what COMMIT does in a more fine-grained way.
To configure quorum COMMITs, the syntax of synchronous_standby_names has been extended. Here are two simple examples:
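The following sketch shows both flavors of the extended syntax. The standby names s1, s2, and s3 are placeholders for the application_name values of your standbys, and each setting could just as well be written directly into postgresql.conf.

```sql
-- Quorum: wait for any two of the three listed standbys
ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (s1, s2, s3)';

-- Priority: wait for the first two standbys in the order given
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 2 (s1, s2, s3)';

-- The second statement overrides the first one; reload to activate it
SELECT pg_reload_conf();
```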
Partitioning data
There have been talks about introducing partitioning to PostgreSQL for years. However, big, important features take time to implement, and this is especially true if you are aiming for a good, extensible, and future-proof implementation. In PostgreSQL 10.0, table partitioning has finally made it to the PostgreSQL core. Of course, the implementation is far from complete, and a lot of work has to be done in the future to add even more features. However, support for partitioning is important and will definitely be one of the most desirable things in PostgreSQL 10.0.
As of now, partitioning is able to:
Automatically create proper child constraints
Route changes made to the parent table to the child table
However, as stated earlier, there are still a couple of missing features that have not been addressed yet. Here are some of the more important things that are not supported so far:
Creating child tables automatically in case data comes in that is not covered by the partitioning criteria yet
Hash partitioning
Moving updated rows that no longer match the partition
Handling partitions in parallel
The roadmap for PostgreSQL 11.0 already suggests that many of these things might be supported in the next release.
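To make the two supported capabilities more tangible, here is a minimal sketch using range partitioning. The table and column names are made up for illustration.

```sql
-- Parent table: holds no data itself, only defines the partitioning scheme
CREATE TABLE t_measurement (
    ts    date NOT NULL,
    value numeric
) PARTITION BY RANGE (ts);

-- Child tables; the upper bound is exclusive
CREATE TABLE t_measurement_2017 PARTITION OF t_measurement
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
CREATE TABLE t_measurement_2018 PARTITION OF t_measurement
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');

-- Rows inserted into the parent are routed to the matching child
INSERT INTO t_measurement VALUES ('2017-06-15', 42), ('2018-02-01', 17);

-- tableoid reveals where each row ended up
SELECT tableoid::regclass AS child, * FROM t_measurement;

-- A row outside all ranges is rejected, as no child is created on the fly:
-- INSERT INTO t_measurement VALUES ('2020-01-01', 1);
```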
Making use of CREATE STATISTICS
CREATE STATISTICS is definitely one of my personal favorite features of PostgreSQL 10.0 because it allows consultants to help customers in many real-world situations. So, what is it all about? When you run SQL, the optimizer has to make clever decisions to speed up your queries. However, to do so, it has to rely heavily on estimates to figure out how much data a certain clause or a certain operation returns. Before version 10.0, PostgreSQL only had information about individual columns. Let's look at an example:
In version 9.6, PostgreSQL checks which fraction of the table matches Ford and which fraction matches Mini Clubman. Then, it tries to guess how many rows match both criteria. Remember, PostgreSQL 9.6 only has information about each column; it does not know that these columns are actually related. Therefore, it will simply multiply the odds of finding Ford by the odds of finding Mini Clubman and use this number. However, Ford does not produce the Mini Clubman; only BMW does. Therefore, the estimate is wrong. The same cross-column correlation problem can happen in other cases too. The number of rows returned by a join might not be clear, and the number of groups returned by a GROUP BY clause might be an issue.
Consider the following example:
The number of children born to people of a certain age will definitely depend on their age. The likelihood that a 30-year-old woman will have children is pretty high and, therefore, there will be a count. However, if you happen to be 98, you might not be so lucky and it is pretty unrealistic to have a baby, especially if you are a man (men tend to not give birth to children).
CREATE STATISTICS will give the optimizer a chance to gain deeper insights into what is going on by storing multivariate statistics. The idea is to help the optimizer handle functional dependencies.
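The car example can be sketched as follows. The table t_car, its content, and the statistics name s_car are assumptions made for illustration; the exact estimates shown by EXPLAIN will vary.

```sql
CREATE TABLE t_car (brand text, model text);
INSERT INTO t_car
    SELECT 'BMW', 'Mini Clubman' FROM generate_series(1, 10000);
INSERT INTO t_car
    SELECT 'Ford', 'Focus' FROM generate_series(1, 10000);
ANALYZE t_car;

-- With per-column statistics only, the planner treats both conditions as
-- independent and multiplies their selectivities
EXPLAIN SELECT * FROM t_car WHERE brand = 'BMW' AND model = 'Mini Clubman';

-- Tell PostgreSQL to track the functional dependency between the columns
CREATE STATISTICS s_car (dependencies) ON brand, model FROM t_car;
ANALYZE t_car;

-- The row estimate should now be much closer to the real number
EXPLAIN SELECT * FROM t_car WHERE brand = 'BMW' AND model = 'Mini Clubman';
```

For the GROUP BY case, the ndistinct kind of statistics is the one to look at.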
Improving parallelism
PostgreSQL 9.6 was the first version supporting parallel queries in their most basic form. Of course, not all parts of the server are fully parallel yet. Therefore, it is an ongoing effort to speed up even more operations than before. PostgreSQL 10.0 is a major step towards even more parallelism, as a lot more operations can now benefit from multi-core systems.
Indexes are a key area of improvement and will benefit greatly from additional features introduced in PostgreSQL 10.0. There is now full support for parallel B-tree scans as well as for parallel bitmap scans. For now, only B-tree indexes can benefit from parallelism, but this will most likely change in future releases to ensure that all types of indexes can enjoy even better performance.
In addition to indexing, the PostgreSQL community has also worked hard to introduce support for parallel merge joins and to allow more procedures to run in parallel. Some of the latest blog posts from the PostgreSQL community already suggest that many new features related to parallelism are in the pipeline for PostgreSQL 11.0.
Introducing ICU encodings
When a PostgreSQL database is created, the administrator can choose the encoding that should be used to store the data. Basically, the configuration decides which characters exist and in which order they are sorted. Here is an example: de_AT.UTF-8. In this case, we will use Unicode characters, which will be sorted in an Austrian sort order (Austrians speak some sort of German). So, de_AT will define the order in which the data will be sorted.
To achieve this kind of sorting, PostgreSQL relies heavily on the operating system. The trouble is that if the sort order of characters changes in the operating system for some reason (maybe because of a bug or for some other reason), PostgreSQL will have trouble with its indexes. A normal B-tree index is basically a sorted list, and if the sort order changes, naturally, there is a problem.
The introduction of the ICU library is supposed to fix this problem. ICU offers stronger promises than the operating system and is, therefore, more suitable for long-term storage of data. With the introduction of PostgreSQL 10.0, ICU encodings can be enabled.
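A small sketch of how ICU can be put to use, assuming that the server has been built with ICU support. The names german_at and t_person are made up for illustration.

```sql
-- Define a collation that is backed by ICU instead of the operating system
CREATE COLLATION german_at (provider = icu, locale = 'de-AT');

-- Use it for a column ...
CREATE TABLE t_person (name text COLLATE german_at);

-- ... or for a single sort operation
SELECT name FROM t_person ORDER BY name COLLATE german_at;
```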
Summary
In PostgreSQL 10.0, a lot of functionality has been added that allows people to run even more professional applications even faster and more efficiently. All areas of the database server have been improved and many new professional features have been added. In the future, even more improvements will be made. Of course, the changes listed in this chapter are far from complete because many small changes were made as well.
Understanding Transactions and Locking
Locking is an important topic in any kind of database. Understanding how it works is not only necessary to write proper or better applications; it is also essential from a performance point of view. Without handling locks properly, your applications might not only be slow, they might also be wrong and behave in very unexpected ways. In my opinion, locking is the key to performance, and having a good overview will certainly help. Therefore, understanding locking and transactions is important for administrators and developers alike. In this chapter, you will learn about the following topics:
Working with PostgreSQL transactions
Understanding basic locking
Making use of FOR SHARE and FOR UPDATE
Understanding transaction isolation levels
Considering SSI transactions
Observing deadlocks and similar issues
Optimizing storage and managing cleanups
At the end of the chapter, you will be able to understand and utilize PostgreSQL transactions in the most efficient way possible.
Working with PostgreSQL transactions
PostgreSQL provides you with a highly advanced transaction machinery that offers countless features to developers and administrators alike. In this section, it is time to look at the basic concept of transactions.

The first important thing to know is that in PostgreSQL, everything is a transaction. If you send a simple query to the server, it is already a transaction. Here is an example:
In this case, the SELECT statement will be a separate transaction. If the same command is executed again, different timestamps will be returned.
Keep in mind that the now() function will return the transaction time. A SELECT statement that calls now() twice will, therefore, always return two identical timestamps.
If more than one statement has to be a part of the same transaction, the BEGIN statement must be used:
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
where transaction_mode is one of:
ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
READ WRITE | READ ONLY
[ NOT ] DEFERRABLE
The BEGIN statement will ensure that more than one command will be packed into a transaction. Here is how it works:
The important point here is that both timestamps will be identical. As mentioned earlier, we are talking about transaction time here.
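The behavior of now() can be sketched as follows. The clock_timestamp() call at the end is my own addition to show the contrast with the real current time.

```sql
-- A single statement is a transaction of its own:
-- both columns show the same timestamp
SELECT now(), now();

-- Several statements packed into one transaction:
-- both SELECT statements return the same timestamp
BEGIN;
SELECT now();
SELECT now();
COMMIT;

-- clock_timestamp() returns the real current time instead and keeps
-- moving even inside a transaction
SELECT now(), clock_timestamp();
```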
To end a transaction, COMMIT can be used:
Description: commit the current transaction
Syntax: COMMIT [ WORK | TRANSACTION ]
There are a couple of syntax elements here. You can just use COMMIT, COMMIT WORK, or COMMIT TRANSACTION. All three options have the same meaning. If this is not enough, there is more:
END [ WORK | TRANSACTION ]
The END clause is the same as the COMMIT clause.
ROLLBACK is the counterpart of COMMIT. Instead of successfully ending a transaction, it will simply stop the transaction without ever making things visible to other transactions:
ROLLBACK [ WORK | TRANSACTION ]
Some applications use ABORT instead of ROLLBACK. The meaning is the same.
Handling errors inside a transaction
It is not always the case that transactions are correct from beginning to end. However, in PostgreSQL, only error-free transactions can be committed. Here is what happens:
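A minimal sketch of such a session. The statements are my own choice; the comments show how PostgreSQL reacts to each of them.

```sql
BEGIN;
SELECT 1;
SELECT 1 / 0;   -- ERROR:  division by zero
SELECT 1;       -- ERROR:  current transaction is aborted,
                --         commands ignored until end of transaction block
COMMIT;         -- answered with ROLLBACK
```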
Note that division by zero did not work out.
In any proper database, an instruction similar to this will instantly error out and make the statement fail.
It is important to point out that PostgreSQL will error out, unlike MySQL, which is far less strict. After an error has occurred, no more instructions will be accepted, even if those instructions are semantically and syntactically correct. It is still possible to issue a COMMIT. However, PostgreSQL will roll back the transaction because it is the only correct thing to do at this point.
Making use of SAVEPOINT
In professional applications, it can be pretty hard to write reasonably long transactions without ever encountering a single error. To solve the problem, users can utilize something called SAVEPOINT. As the name indicates, it is a safe place inside a transaction that the application can return to in the event things go terribly wrong. Here is an example:
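A minimal sketch with a savepoint called a. The individual SELECT statements are placeholders for real work.

```sql
BEGIN;
SELECT 1;
SAVEPOINT a;
SELECT 2 / 0;             -- ERROR:  division by zero
SELECT 2;                 -- still rejected: the transaction is aborted
ROLLBACK TO SAVEPOINT a;  -- jump back to the state before the error
SELECT 3;                 -- works again
COMMIT;
```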
After the first SELECT clause, I decided to create SAVEPOINT to make sure that the application can always return to this point inside the transaction. As you can see, SAVEPOINT has a name, which is referred to later.
After returning to the savepoint called a, the transaction can proceed normally. The code has jumped back to before the error, so everything is fine.
The number of savepoints inside a transaction is practically unlimited. We have seen customers with over 250,000 savepoints in a single operation. PostgreSQL can easily handle this.
Many people ask what will happen if you try to reach a savepoint after a transaction has ended. The answer is that the life of a savepoint ends as soon as the transaction ends. In other words, there is no way to return to a certain point in time after the transaction has been completed.


Transactional DDLs
PostgreSQL has a very nice feature that is unfortunately not present in many commercial database systems. In PostgreSQL, it is possible to run DDLs (commands that change the data structure) inside a transaction block. In a typical commercial system, a DDL will implicitly commit the current transaction. Not so in PostgreSQL.
Apart from some minor exceptions (DROP DATABASE, CREATE TABLESPACE, DROP TABLESPACE, and so on), all DDLs in PostgreSQL are transactional, which is a huge plus and a real benefit to end users.
Here is an example:
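A minimal sketch, using a hypothetical table called t_ddl_demo:

```sql
BEGIN;
CREATE TABLE t_ddl_demo (id int);
ALTER TABLE t_ddl_demo ALTER COLUMN id TYPE int8;
ROLLBACK;

-- The table never became visible: to_regclass returns NULL
SELECT to_regclass('t_ddl_demo');
```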
In this example, a table has been created and modified, and the entire transaction is aborted instantly. As you can see, there is no implicit COMMIT or any other strange behavior. PostgreSQL simply acts as expected.
Transactional DDLs are especially important if you want to deploy software. Just imagine running a Content Management System (CMS). If a new version is released, you'll want to upgrade. Running the old version would still be OK; running the new version is also OK, but you really don't want a mixture of old and new. Therefore, deploying an upgrade in a single transaction is definitely highly beneficial, as it makes the upgrade an atomic operation.
In order to facilitate good software practices, we can include several separately coded modules from our source control system into a single deployment transaction.
Understanding basic locking
In this section, you will learn basic locking mechanisms. The goal is to understand how locking works in general and how to get simple applications right.
To show how things work, a simple table can be created. For demonstration purposes, I will add one row to the table:
The first important thing is that tables can be read concurrently. Many users reading the same data at the same time won't block each other. This allows PostgreSQL to handle thousands of users without problems.
Multiple users can read the same data at the same time without blocking each other.
The question now is: what happens if reads and writes occur at the same time? Here is an example. Let us assume that the table contains one row and its id = 0:
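The following two-session sketch illustrates the scenario. The table name t_test and the concrete UPDATE statement are assumptions; the comments mark which session runs which command.

```sql
-- Setup (any session)
CREATE TABLE t_test (id int);
INSERT INTO t_test VALUES (0);

-- Session 1
BEGIN;
UPDATE t_test SET id = 2 RETURNING *;   -- session 1 sees 2

-- Session 2, while session 1 is still open
SELECT * FROM t_test;                   -- not blocked, still sees 0

-- Session 1
COMMIT;

-- Session 2, after the COMMIT
SELECT * FROM t_test;                   -- now sees 2
```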
Two transactions are opened. The first one will change a row. However, this is not a problem, as the second transaction can proceed. It will return the old row as it was before the UPDATE. This behavior is called Multi-Version Concurrency Control (MVCC).
A transaction will see data only if it has been committed by the writing transaction prior to the initiation of the read transaction. One transaction cannot inspect the changes made by another active connection. A transaction can see only those changes that have already been committed.

There is also a second important aspect: many commercial or open source databases are still (as of 2017) unable to handle concurrent reads and writes. In PostgreSQL, this is absolutely not a problem. Reads and writes can coexist.
Writing transactions won't block reading transactions.
After the transaction has been committed, the table will contain 2.
What will happen if two people change data at the same time? Here is an example:
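A two-session sketch of a simple hit counter. The table t_hits is a made-up example; the comments describe the behavior under the default READ COMMITTED isolation level.

```sql
-- Setup
CREATE TABLE t_hits (hits int);
INSERT INTO t_hits VALUES (0);

-- Session 1
BEGIN;
UPDATE t_hits SET hits = hits + 1 RETURNING *;   -- returns 1

-- Session 2
BEGIN;
UPDATE t_hits SET hits = hits + 1 RETURNING *;   -- waits for session 1

-- Session 1
COMMIT;   -- session 2 wakes up and its UPDATE returns 2

-- Session 2
COMMIT;
```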
Suppose you want to count the number of hits on a website. If you run the code as outlined just now, no hit can be lost because PostgreSQL guarantees that one UPDATE is performed after the other.
 PostgreSQL will only lock rows affected by UPDATE. So, if you have 1,000 rows, you can theoretically run 1,000 concurrent changes on the same table.
It is also noteworthy that you can always run concurrent reads. Our two writes will not block reads.
Avoiding typical mistakes and explicit locking
In my life as a professional PostgreSQL consultant (https://www.cybertec-postgresql.com), I have seen a couple of mistakes that are made again and again. If there are constants in life, these typical mistakes are definitely some of the things that never change.
Here is my favorite:
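The pattern typically looks like the following sketch, in which two sessions run the same code at the same time. The table t_product and the values are assumptions made for illustration.

```sql
-- Setup
CREATE TABLE t_product (id int PRIMARY KEY, name text);
INSERT INTO t_product VALUES (17, 'existing product');

-- Session 1 and session 2 both run this concurrently
BEGIN;
SELECT max(id) FROM t_product;     -- both sessions see 17
-- the application decides to use 18 as the next id
INSERT INTO t_product VALUES (18, 'new product');
COMMIT;
-- With the primary key in place, the slower session fails with a
-- duplicate key violation; without it, the value 18 is stored twice.
```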
In this case, there will be either a duplicate key violation or two identical entries. Neither variation of the problem is all that appealing.

One way to fix the problem is to use explicit table locking:
LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]
where lockmode is one of the following:
ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE | SHARE |
SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE

As you can see, PostgreSQL offers eight types of locks to lock an entire table. In PostgreSQL, a lock can be as light as an ACCESS SHARE lock or as heavy as an ACCESS EXCLUSIVE lock. The following list shows what these locks do:

ACCESS SHARE: This type of lock is taken by reads and conflicts only with ACCESS EXCLUSIVE, which is set by DROP TABLE and the like. Practically, this means that SELECT cannot start if a table is about to be dropped. This also implies that DROP TABLE has to wait until a reading transaction is completed.
ROW SHARE: PostgreSQL takes this kind of lock in the case of SELECT FOR UPDATE / SELECT FOR SHARE. It conflicts with EXCLUSIVE and ACCESS EXCLUSIVE.
ROW EXCLUSIVE: This lock is taken by INSERT, UPDATE, and DELETE. It conflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.
SHARE UPDATE EXCLUSIVE: This kind of lock is taken by CREATE INDEX CONCURRENTLY, ANALYZE, ALTER TABLE VALIDATE, and some other flavors of ALTER TABLE, as well as by VACUUM (not VACUUM FULL). It conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.
SHARE: When an index is created (without CONCURRENTLY), SHARE locks will be set. It conflicts with ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.
SHARE ROW EXCLUSIVE: This one is set by CREATE TRIGGER and some forms of ALTER TABLE and conflicts with everything but ACCESS SHARE and ROW SHARE.
EXCLUSIVE: This lock allows only concurrent reads. It conflicts with every lock mode except ACCESS SHARE, so nobody else can write to the table affected, while plain SELECT statements can still proceed.
ACCESS EXCLUSIVE: This type of lock is by far the most restrictive one. It conflicts with all lock modes and, therefore, protects against reads and writes alike. If this lock is taken by a transaction, nobody else can read or write to the table affected.

Given the PostgreSQL locking infrastructure, one solution to the max problem outlined previously would be as follows:
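A minimal sketch, again assuming a hypothetical t_product table:

```sql
-- Setup (hypothetical table)
CREATE TABLE t_product (id int PRIMARY KEY, name text);

BEGIN;
-- Nobody else can read or write t_product from here on
LOCK TABLE t_product IN ACCESS EXCLUSIVE MODE;
INSERT INTO t_product
    SELECT coalesce(max(id), 0) + 1, 'new product' FROM t_product;
COMMIT;   -- the table lock is released here
```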
Keep in mind that this is a pretty nasty way of doing this kind of operation because nobody else can read or write to the table during your operation. Therefore, ACCESS EXCLUSIVE should be avoided at all costs.

Considering alternative solutions
There is an alternative solution to the problem. Consider the following example: you are asked to write an application generating invoice numbers. The tax office might require you to create invoice numbers without gaps and without duplicates. How would you do it? Of course, one solution would be a table lock. However, you can really do better. Here is what I would do:
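A sketch of this approach. The name t_invoice for the invoice table and the column layout are assumptions.

```sql
CREATE TABLE t_invoice (id int PRIMARY KEY);
CREATE TABLE t_watermark (id int);
INSERT INTO t_watermark VALUES (0);

-- Increment the watermark and use the new value as invoice number
WITH x AS (
    UPDATE t_watermark SET id = id + 1 RETURNING *
)
INSERT INTO t_invoice
    SELECT * FROM x
    RETURNING *;
```

If the surrounding transaction is rolled back, the increment is rolled back as well, which is what keeps the numbers free of gaps.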

In this case, I introduced a table called t_watermark. It contains just one row. The WITH will be executed first. The row will be locked and incremented, and the new value will be returned. Only one person can do this at a time. The value returned by the CTE is then used in the invoice table. It is guaranteed to be unique. The beauty is that there is only a simple row lock on the watermark table; no reads will be blocked in the invoice table. Overall, this way is more scalable.
 
Making use of FOR SHARE and FOR UPDATE
Sometimes, data is selected from the database; then some processing happens in the application and, finally, some changes are made back on the database side. This is a classic example of SELECT FOR UPDATE.

Here is an example:

The problem here is that two people might select the same unprocessed data. Changes made to these processed rows will then be overwritten. In short, a race condition will occur.

To solve this problem, developers can make use of SELECT FOR UPDATE. Here is how it works:
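A minimal sketch, assuming a hypothetical queue table called t_invoice_queue with a processed flag:

```sql
-- Setup
CREATE TABLE t_invoice_queue (
    id        int PRIMARY KEY,
    processed boolean DEFAULT false
);
INSERT INTO t_invoice_queue (id) SELECT generate_series(1, 10);

BEGIN;
-- Lock three unprocessed rows; a concurrent SELECT ... FOR UPDATE on
-- the same rows has to wait until this transaction ends
SELECT * FROM t_invoice_queue
WHERE processed = false
ORDER BY id
LIMIT 3
FOR UPDATE;                -- returns ids 1, 2, and 3

-- ... the application processes the rows ...

UPDATE t_invoice_queue SET processed = true WHERE id IN (1, 2, 3);
COMMIT;
```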
SELECT FOR UPDATE will lock rows just like an UPDATE would. This means that no changes can happen concurrently. All locks will be released on COMMIT as usual.

If one SELECT FOR UPDATE is waiting for some other SELECT FOR UPDATE, it has to wait until the other one completes (COMMIT or ROLLBACK). If the first transaction does not want to end, for whatever reason, the second transaction might potentially wait forever. To avoid this, it is possible to use SELECT FOR UPDATE NOWAIT.

Here is how it works:
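A two-session sketch, using a made-up table called t_nowait_demo:

```sql
-- Setup
CREATE TABLE t_nowait_demo (id int);
INSERT INTO t_nowait_demo VALUES (1);

-- Session 1
BEGIN;
SELECT * FROM t_nowait_demo WHERE id = 1 FOR UPDATE;

-- Session 2
BEGIN;
SELECT * FROM t_nowait_demo WHERE id = 1 FOR UPDATE NOWAIT;
-- ERROR:  could not obtain lock on row in relation "t_nowait_demo"
ROLLBACK;
```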

If NOWAIT is not flexible enough for you, consider using lock_timeout. It will contain the amount of time you want to wait on locks. You can set this on a per-session level:

test=# SET lock_timeout TO 5000;
SET

In this case, the value is set to 5 seconds.

While SELECT does basically no locking, SELECT FOR UPDATE can be pretty harsh. Just imagine the following business process: we want to fill up an airplane providing 200 seats. Many people want to book seats concurrently. In this case, the following might happen:

The trouble is that only one seat can be booked at a time. There are potentially 200 seats available, but everybody has to wait for the first person. While the first seat is blocked, nobody else can book a seat, even if people don't care which seat they get in the end.
SELECT FOR UPDATE SKIP LOCKED will fix the problem. Let's create some sample data first:
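A sketch of the idea, assuming a table called t_flight with one row per seat. Which seats each session receives is not guaranteed; the comments only show a typical outcome.

```sql
CREATE TABLE t_flight AS
    SELECT * FROM generate_series(1, 200) AS id;

-- Session 1
BEGIN;
SELECT * FROM t_flight LIMIT 2 FOR UPDATE SKIP LOCKED;   -- e.g. seats 1 and 2

-- Session 2, at the same time
BEGIN;
SELECT * FROM t_flight LIMIT 2 FOR UPDATE SKIP LOCKED;   -- e.g. seats 3 and 4
```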
If everybody wants to fetch two rows, we can serve 100 concurrent transactions at a time without having to worry about blocking transactions.
Keep in mind that waiting is the slowest form of execution. If only one transaction can be active at a time, it is pointless to buy ever bigger servers.
However, there is more. In some cases, a FOR UPDATE can have unintended consequences. Most people are not aware of the fact that FOR UPDATE will have an impact on foreign keys. Let's assume that we have two tables: one to store currencies and the other to store accounts:
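A sketch of such a setup. The table names t_currency and t_account and their columns are assumptions.

```sql
CREATE TABLE t_currency (id int PRIMARY KEY, name text);
INSERT INTO t_currency VALUES (1, 'EUR'), (2, 'USD');

CREATE TABLE t_account (
    id          int,
    currency_id int REFERENCES t_currency (id)
                    ON UPDATE CASCADE ON DELETE CASCADE,
    balance     numeric
);
INSERT INTO t_account VALUES (1, 1, 100), (2, 1, 200);

-- Session 1
BEGIN;
SELECT * FROM t_account FOR UPDATE;

-- Session 2
-- The key change has to be propagated to the locked account rows,
-- so this statement waits until session 1 ends
UPDATE t_currency SET id = id * 10;
```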
Although there is a SELECT FOR UPDATE on accounts, the UPDATE on the currency table will be blocked. This is necessary because otherwise, there is a chance of breaking the foreign key constraint altogether. In a fairly complex data structure, you can therefore easily end up with contention in an area where it is least expected (some highly important lookup tables).

On top of FOR UPDATE, there are FOR SHARE, FOR NO KEY UPDATE, and FOR KEY SHARE. The following list describes what these modes actually mean:

FOR NO KEY UPDATE: This one is pretty similar to FOR UPDATE. However, the lock is weaker and, therefore, it can coexist with SELECT FOR KEY SHARE.
FOR SHARE: FOR UPDATE is pretty strong and works on the assumption that you are definitely going to change rows. FOR SHARE is different because more than one transaction can hold a FOR SHARE lock at the same time.
FOR KEY SHARE: This behaves similarly to FOR SHARE, except that the lock is weaker. It will block FOR UPDATE but will not block FOR NO KEY UPDATE.
The important thing here is to simply try things out and observe what happens. Improving locking behavior is really important as it can dramatically improve the scalability of your application.
Understanding transaction isolation levels
Up until now, you have seen how to handle locking as well as some basic concurrency. In this section, you will learn about transaction isolation. To me, this is one of the most neglected topics in modern software development. Only a small fraction of software developers are actually aware of this issue, which in turn leads to mind-boggling bugs.
Here is an example of what can happen:
Most users would actually expect the left transaction to always return 300 regardless of the second transaction. However, this is not true. By default, PostgreSQL runs in the READ COMMITTED transaction isolation mode. This means that every statement inside a transaction will get a new snapshot of the data, which will be constant throughout the query.
An SQL statement will operate on the same snapshot and will ignore changes by concurrent transactions while it is running.
If you want to avoid this, you can use TRANSACTION ISOLATION LEVEL REPEATABLE READ. In this transaction isolation level, a transaction will use the same snapshot throughout the entire transaction. Here is what will happen:
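The following two-session sketch contrasts both isolation levels. The table t_balance and its content are assumptions chosen so that the initial sum is 300.

```sql
-- Setup
CREATE TABLE t_balance (id int, balance numeric);
INSERT INTO t_balance VALUES (1, 100), (2, 200);

-- Session 1: default isolation level (READ COMMITTED)
BEGIN;
SELECT sum(balance) FROM t_balance;        -- 300

-- Session 2
INSERT INTO t_balance VALUES (3, 100);     -- committed immediately

-- Session 1
SELECT sum(balance) FROM t_balance;        -- 400: new snapshot per statement
COMMIT;

-- Session 1: the same experiment with REPEATABLE READ
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT sum(balance) FROM t_balance;        -- 400

-- Session 2
INSERT INTO t_balance VALUES (4, 100);     -- committed immediately

-- Session 1
SELECT sum(balance) FROM t_balance;        -- still 400: snapshot is frozen
COMMIT;
```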
As just outlined, the first transaction will freeze its snapshot of the data and provide us with constant results throughout the entire transaction. This feature is especially important if you want to run reports. The first and the last page of a report should always be consistent and operate on the same data. Therefore, the repeatable read is key to consistent reports.

Note that isolation-related errors won't always pop up instantly. It can happen that trouble is noticed years after an application has been moved to production.

Repeatable read is not more expensive than read committed. There is no need to worry about performance penalties. For normal Online Transaction Processing (OLTP), read committed has various advantages because changes can be seen much earlier and the odds of unexpected errors are usually lower.
Considering SSI transactions
On top of read committed and repeatable read, PostgreSQL offers Serializable Snapshot Isolation (SSI) transactions. So, in all, PostgreSQL supports three isolation levels. Note that read uncommitted (which still happens to be the default in some commercial databases) is not supported: if you try to start a read uncommitted transaction, PostgreSQL will silently map to read committed. Let us get back to the serializable isolation level.
The idea behind serializable is simple; if a transaction is known to work correctly when there is only a single user, it will also work in the case of concurrency when this isolation level is chosen. However, users have to be prepared; transactions may fail (by design) and error out. In addition to this, a performance penalty has to be paid.
If you want to know more about this isolation level, consider checking out https://wiki.postgresql.org/wiki/Serializable.
Consider using serializable only when you have a decent understanding of what is going on inside the database engine.
Observing deadlocks and similar issues
Deadlocks are an important issue and can happen in every database I am aware of. Basically, a deadlock will happen if two transactions have to wait on each other.
In this section, you will see how this can happen. Let's suppose we have a table containing two rows:
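A two-session sketch of how such a deadlock can be provoked. The UPDATE statements are my own choice; which of the two transactions is aborted in the end is up to PostgreSQL.

```sql
CREATE TABLE t_deadlock (id int);
INSERT INTO t_deadlock VALUES (1), (2);

-- Session 1
BEGIN;
UPDATE t_deadlock SET id = id * 10 WHERE id = 1;

-- Session 2
BEGIN;
UPDATE t_deadlock SET id = id * 10 WHERE id = 2;
UPDATE t_deadlock SET id = id * 10 WHERE id = 1;   -- waits for session 1

-- Session 1
UPDATE t_deadlock SET id = id * 10 WHERE id = 2;   -- waits for session 2
-- After deadlock_timeout (1 second by default) has passed, the deadlock
-- is detected and one of the two transactions is aborted
```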
As soon as the deadlock is detected, the following error message will show up:

ERROR:  deadlock detected
DETAIL:  Process 91521 waits for ShareLock on transaction 903; blocked by process 77185.
Process 77185 waits for ShareLock on transaction 905; blocked by process 91521.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (0,1) in relation "t_deadlock"
PostgreSQL is even kind enough to tell us which row has caused the conflict. In my example, the root of all evil is the tuple (0,1). What you can see here is the ctid, which is a unique identifier of a row in a table. It tells us about the physical position of a row inside the table. In this example, it is the first row in the first block (0).
It is even possible to query this row if it is still visible to your transaction:
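A one-line sketch, assuming that the t_deadlock table named in the error message exists:

```sql
-- ctid is a system column and has to be selected explicitly
SELECT ctid, * FROM t_deadlock WHERE ctid = '(0,1)';
```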
Keep in mind that this query might not return a row if it has already been deleted or modified.
However, deadlocks are not the only thing that can lead to failing transactions. It can also happen that transactions cannot be serialized for various reasons. The following example shows what can happen. To make the example work, I assume that you've still got two rows, id = 1 and id = 2.
In this example, two concurrent transactions are at work. As long as transaction 1 is just selecting data, everything is fine because PostgreSQL can easily preserve the illusion of static data. But what happens if the second transaction commits a DELETE? As long as there are only reads, there is still no problem. The trouble begins when transaction 1 tries to delete or modify data that is, at this point, already really dead. The only solution here for PostgreSQL is to error out:
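A two-session sketch of this situation. To keep it independent of earlier experiments, it uses a separate, made-up table called t_serial holding the same two rows.

```sql
-- Setup
CREATE TABLE t_serial (id int);
INSERT INTO t_serial VALUES (1), (2);

-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM t_serial;      -- two rows

-- Session 2
DELETE FROM t_serial;        -- committed immediately

-- Session 1
SELECT * FROM t_serial;      -- still two rows: the snapshot is frozen
DELETE FROM t_serial;        -- fails with a serialization failure
                             -- (SQLSTATE 40001)
ROLLBACK;
```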
Practically, this means that end users have to be prepared to handle erroneous transactions. If something goes wrong, properly written applications must be able to try again.
Utilizing advisory locks
PostgreSQL has a highly efficient and sophisticated transaction machinery that is capable of handling locks in a really fine-grained and efficient way. Some years ago, some people came up with the idea of using this code to synchronize applications with each other. Thus, advisory locks were born.
When using advisory locks, it is important to mention that they won't go away on COMMIT as normal locks do. Therefore, it is really important to make sure that unlocking is done properly and in a totally reliable way.
If you decide to use an advisory lock, what you really lock is a number. So, this is not about rows or data; it is really just a number. Here is how it works:
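A two-session sketch using the number 15. The call to pg_advisory_unlock_all() at the end is my own addition.

```sql
-- Session 1
BEGIN;
SELECT pg_advisory_lock(15);

-- Session 2
SELECT pg_advisory_lock(15);     -- has to wait

-- Session 1
COMMIT;                          -- session 2 keeps waiting
SELECT pg_advisory_unlock(15);   -- now session 2 gets the lock

-- To release everything held by the current session in one go:
SELECT pg_advisory_unlock_all();
```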
The first transaction will lock 15. The second transaction has to wait until this number has been unlocked again. The second session will even wait after the first one has committed. This is highly important as you cannot rely on the fact that the end of the transaction will nicely and miraculously solve things for you.
Optimizing storage and managing cleanup
Transactions are an integral part of the PostgreSQL system. However, transactions come with a small price tag attached. As already shown in this chapter, it can happen that concurrent users will be presented with different data. Not everybody will get the same data returned by a query. In addition to this, DELETE and UPDATE are not allowed to actually overwrite data as ROLLBACK would not work. If you happen to be in the middle of a large DELETE operation, you cannot be sure whether you will be able to COMMIT or not. In addition to this, data is still visible while you perform DELETE, and sometimes data is even visible once your modification has long since finished.
Consequently, this means that cleanup has to happen asynchronously. A transaction cannot clean up its own mess, and COMMIT / ROLLBACK might be too early to take care of dead rows.
The solution to this problem is VACUUM.
VACUUM will visit all pages that potentially contain modifications and find all the dead space. The free space found is then tracked by the Free Space Map (FSM) of the relation.
Note that VACUUM will, in most cases, not shrink the size of a table. Instead, it will track and find free space inside existing storage files.
Tables will usually have the same size after VACUUM. If there are no valid rows at the end of a table, file sizes can go down in some rare cases. This is not the rule but rather the exception.


What this means to the end users will be outlined in the Watching VACUUM at work section of this chapter.
Configuring VACUUM and autovacuum
Back in the early days of PostgreSQL projects, people had to run VACUUM manually. Fortunately, those days are long gone. Nowadays, administrators can rely on a tool called autovacuum, which is part of the PostgreSQL server infrastructure. It automatically takes care of cleanup and works in the background. It wakes up once per minute (see autovacuum_naptime = 1min in postgresql.conf) and checks if there is work to do. If there is work, autovacuum will fork up to three worker processes (see autovacuum_max_workers in postgresql.conf).
The main question is, when does autovacuum trigger the creation of a worker process?
Actually, the autovacuum process does not fork processes itself. Instead, it tells the main process to do so. This is done to avoid zombie processes in the case of failure and to improve robustness.
The answer to this question can again be found in postgresql.conf, in the autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor, autovacuum_analyze_threshold, and autovacuum_analyze_scale_factor settings.
autovacuum_vacuum_scale_factor tells PostgreSQL that a table is worth vacuuming if 20% of data has been changed. The trouble is that if a table consists of one row, one change is already 100%. It makes absolutely no sense to fork a complete process to clean up just one row. Therefore, autovacuum_vacuum_threshold adds a fixed minimum of 50 rows on top of this 20%: a table is vacuumed once the number of dead rows exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of rows. Otherwise, VACUUM won't kick in. The same mechanism is used when it comes to optimizer stats creation: 10% plus at least 50 changed rows are needed to justify new optimizer stats. Ideally, autovacuum creates new statistics during a normal VACUUM to avoid unnecessary trips to the table.
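To get a feeling for these numbers, the trigger point can be estimated per table. The following query is only a sketch: it uses the global settings and ignores per-table storage parameters, which can override them, and the column alias dead_rows_needed is made up for illustration.

```sql
-- Rough estimate of how many dead rows a table needs before autovacuum
-- considers it (global settings only; per-table overrides are ignored).
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::int
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
         AS dead_rows_needed
FROM   pg_stat_user_tables AS s
       JOIN pg_class AS c ON c.oid = s.relid
ORDER  BY s.n_dead_tup DESC
LIMIT  10;
```

With the default values of 50 and 0.2, a table with 1,000 rows would need more than 50 + 0.2 * 1000 = 250 dead rows.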


Digging into transaction wraparound related issues
There are two more settings in postgresql.conf that are quite important to understand:

autovacuum_freeze_max_age = 200000000
autovacuum_multixact_freeze_max_age = 400000000

To understand the overall problem, it is important to understand how PostgreSQL handles concurrency. The PostgreSQL transaction machinery is based on the comparison of transaction IDs and the states transactions are in.

Let's look at an example. If I am transaction ID 4711 and if you happen to be 4712, I won't see you because you are still running. If I am transaction ID 4711 but you are transaction ID 3900, I will see you provided you have committed, and I will ignore you if you failed.

The trouble is as follows: transaction IDs are finite, not unlimited. At some point, they will start to wrap around. In reality, this means that transaction number 5 might actually be after transaction number 800,000,000. How does PostgreSQL know what was first? It does so by storing a watermark. At some point, those watermarks will be adjusted, and this is exactly when VACUUM starts to be relevant. By running VACUUM (or autovacuum), you can ensure that the watermark is adjusted in a way that there are always enough future transaction IDs left to work with.
 


Not every transaction will increase the transaction ID counter. As long as a transaction is still reading, it will only have a virtual transaction ID. This ensures that transaction IDs are not burned too quickly.


autovacuum_freeze_max_age defines the maximum number of transactions (age) that a table's pg_class.relfrozenxid field can attain before a VACUUM operation is forced to prevent transaction ID wraparound within the table. This value is fairly low because it also has an impact on clog cleanup (the clog or commit log is a data structure that stores two bits per transaction, which indicate whether a transaction is running, aborted, committed, or still in a subtransaction).
autovacuum_multixact_freeze_max_age configures the maximum age (in multixacts) that a table's pg_class.relminmxid field can attain before a VACUUM operation is forced to prevent multixact ID wraparound within the table. Freezing tuples is an important performance issue and there will be more about this process in Chapter 6, Optimizing Queries for Good Performance, where we will discuss query optimization.
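How close tables are to these limits can be checked in the system catalog. The following query is a sketch that compares the age of both watermarks with the configured limits; the column aliases are chosen for illustration only.

```sql
-- Tables with the oldest transaction ID watermark in the current database.
SELECT c.oid::regclass        AS table_name,
       age(c.relfrozenxid)    AS xid_age,
       mxid_age(c.relminmxid) AS multixact_age,
       current_setting('autovacuum_freeze_max_age')           AS xid_limit,
       current_setting('autovacuum_multixact_freeze_max_age') AS multixact_limit
FROM   pg_class AS c
WHERE  c.relkind = 'r'
ORDER  BY age(c.relfrozenxid) DESC
LIMIT  10;
```

A table whose xid_age approaches xid_limit will be vacuumed by autovacuum to prevent wraparound, even if it has hardly changed.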
In general, trying to reduce the VACUUM load while maintaining operational security is a good idea. A VACUUM instance on large tables can be expensive, and therefore keeping an eye on these settings makes perfect sense.
A word on VACUUM FULL
Instead of normal VACUUM, you can also use VACUUM FULL. However, I really want to point out that VACUUM FULL actually locks the table and rewrites the entire relation. In the case of a small table, this might not be an issue. However, if your tables are large, the table lock can really kill you in minutes! VACUUM FULL blocks upcoming writes (and, because the lock is exclusive, reads as well) and therefore some people talking to your database might have the feeling that it is actually down. Hence, a lot of caution is advised.
To avoid VACUUM FULL, I recommend that you check out pg_squeeze (http://www.cybertec.at/introducing-pg_squeeze-a-postgresql-extension-to-auto-rebuild-bloated-tables/), which can rewrite a table without blocking writes.
Watching VACUUM at work
After this introduction, it is time to see VACUUM in action. I have included this section here because my practical work as a PostgreSQL consultant and supporter (http://postgresql-support.de/) indicates that most people only have a very vague understanding of what happens on the storage side.
To stress this point again, in most cases, VACUUM will not shrink your tables; space is usually not returned to the filesystem.
Here is my example:
The idea is to create a simple table containing 100,000 rows. Note that it is possible to turn autovacuum off for specific tables. Usually, this is not a good idea for most applications. However, there are corner cases where autovacuum_enabled = off makes sense. Just consider a table whose life cycle is very short. It does not make sense to clean out tuples if the developer already knows that the entire table will be dropped within seconds. In data warehousing, this can be the case if you use tables as staging areas. Autovacuum is turned off in this example to ensure that nothing happens in the background; all you see is triggered by me and not by some process.
First of all, the size of the table is checked:
pg_relation_size returns the size of a table in bytes. pg_size_pretty will take this number and turn it into something human readable.
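The setup described above could look like the following sketch. The table name t_vacuum_demo and its single integer column are assumptions made for illustration.

```sql
-- Hypothetical table; autovacuum is disabled for this table only.
DROP TABLE IF EXISTS t_vacuum_demo;
CREATE TABLE t_vacuum_demo (id int) WITH (autovacuum_enabled = off);
INSERT INTO t_vacuum_demo SELECT * FROM generate_series(1, 100000);

-- Size in bytes and in human-readable form.
SELECT pg_relation_size('t_vacuum_demo')                 AS size_in_bytes,
       pg_size_pretty(pg_relation_size('t_vacuum_demo')) AS size_readable;
```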
What happens when all rows are updated is highly important for understanding PostgreSQL; the database engine has to copy all the rows. Why? First of all, we don't know whether the transaction will be successful, so the data cannot be overwritten. The second important aspect is that a concurrent transaction might still be seeing the old version of the data.
The UPDATE operation will copy rows.
Logically, the size of the table will be larger after the change has been made:
ctid is the physical position of a row on a disk. Using ORDER BY ctid DESC, you will basically read the table backwards in the physical order. Why should you care? The reason is that there are some very small values and some very big values at the end of the table. What happens if they are deleted?
Although only 2% of the data has been deleted, the size of the table has gone down by two thirds. The reason is that if VACUUM only finds dead rows after a certain position in the table, it can return space to the filesystem. This is the only case in which you will actually see the table size go down. Of course, normal users have no control over the physical position of data on the disk. Therefore, storage consumption will most likely stay somewhat the same unless all rows are deleted.
Why are there so many small and big values at the end of the table anyway? After the table is initially populated with 100,000 rows, the last block is not completely full, so the first UPDATE will fill up the last block with changes. This naturally shuffles the end of the table a bit. In this carefully crafted example, this is the reason for the strange layout at the end of the table.
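The whole sequence can be replayed with a sketch like the following. The table name t_vacuum_demo and the boundaries used in the DELETE are assumptions, and the exact sizes you see will depend on your PostgreSQL version and on the physical layout that results.

```sql
-- Hypothetical table; autovacuum is disabled so nothing happens in the background.
DROP TABLE IF EXISTS t_vacuum_demo;
CREATE TABLE t_vacuum_demo (id int) WITH (autovacuum_enabled = off);
INSERT INTO t_vacuum_demo SELECT * FROM generate_series(1, 100000);

UPDATE t_vacuum_demo SET id = id + 1;   -- writes a new version of every row
SELECT pg_size_pretty(pg_relation_size('t_vacuum_demo'));   -- larger than before

VACUUM t_vacuum_demo;                   -- dead versions become free space
UPDATE t_vacuum_demo SET id = id + 1;   -- new versions can reuse that space
VACUUM t_vacuum_demo;

-- Inspect the physical end of the table.
SELECT ctid, id FROM t_vacuum_demo ORDER BY ctid DESC LIMIT 10;

-- Delete the smallest and the largest values, then clean up again.
DELETE FROM t_vacuum_demo WHERE id > 99000 OR id < 1000;
VACUUM t_vacuum_demo;
SELECT pg_size_pretty(pg_relation_size('t_vacuum_demo'));
```

Comparing the size after each step shows when space is merely reused and when it is actually handed back to the filesystem.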
In real world applications, the impact of this observation cannot be stressed enough. There is no performance tuning without really understanding storage.
Making use of snapshot too old
VACUUM does a good job and it will reclaim free space as needed. However, when can VACUUM actually clean out rows and turn them into free space? The rule is this: if a row cannot be seen by anybody anymore, it can be reclaimed. In reality, this means that everything that is no longer seen even by the oldest active transaction can be considered to be really dead.
This also implies that really long transactions can postpone cleanup for quite some time. The logical consequence is table bloat. Tables will grow beyond proportion and performance will tend to go downhill. Fortunately, starting with PostgreSQL 9.6, the database has a nice feature that allows the administrator to intelligently limit the duration of a transaction. Oracle administrators will be familiar with the snapshot too old error; since PostgreSQL 9.6, this error message is also available. However, it is more of a feature than an unintended side effect of bad configuration (which it actually is in Oracle).
To limit the lifetime of snapshots, you can make use of the old_snapshot_threshold setting in postgresql.conf.
If this variable is set, transactions will fail after a certain amount of time. Note that this setting is on an instance level and it cannot be set inside a session. By limiting the age of a transaction, the risk of insanely long transactions will decrease drastically.
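As a sketch, the setting can be inspected and changed as follows on versions that ship this parameter. The value of six hours is an arbitrary assumption for illustration, and ALTER SYSTEM requires superuser rights.

```sql
SHOW old_snapshot_threshold;   -- -1 means that the feature is disabled

-- Assumed value for illustration: snapshots older than six hours may
-- run into the "snapshot too old" error.
ALTER SYSTEM SET old_snapshot_threshold = '6h';

-- The parameter is only read at server start, so the change takes
-- effect after the next restart.
```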
Summary
In this chapter, you learned about transactions, locking and its logical implications, and the general consequences the PostgreSQL transaction machinery can have for storage, concurrency, and administration. You saw how rows are locked and which features are available in PostgreSQL.
In Chapter 3, Making use of Indexes, you will learn one of the most important topics in database work: indexing. You will learn about the PostgreSQL query optimizer as well as various types of indexes and their behavior.
Making use of Indexes
In Chapter 2, Understanding Transactions and Locking, you learned about concurrency and locking. In this chapter, it is time to attack indexing head on. The importance of this topic cannot be stressed enough: indexing is (and will most likely remain) one of the most important topics in the life of every database engineer.
After 18 years of professional, full-time PostgreSQL consulting and PostgreSQL 24x7 support (www.cybertec-postgresql.com), I can say one thing for sure: bad indexing is the main source of bad performance. Of course, it is important to adjust memory parameters and all that. However, it is all in vain if indexes are not used properly. There is simply no replacement for a missing index.
Therefore, I have dedicated an entire chapter to indexing alone to give you as many insights as possible.
In this chapter, you will learn these topics:
When does PostgreSQL use indexes?
How does an optimizer handle things?
What types of indexes are there and how do they work?
Using your own indexing strategies
At the end of the chapter, you will be able to understand how indexes can be used beneficially in PostgreSQL.
Understanding simple queries and the cost model
In this section, we will get started with indexes. To show how things work, some test data is needed. The following code snippet shows how data can be created easily:
In the first line, a simple table is created. Two columns are used: an autoincrement column that just keeps creating numbers and a column that will be filled with static values.
The generate_series function will generate numbers from 1 to 2 million. So, in this example, 2 million static values for hans and 2 million static values for paul are created.
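The test data described here can be created with a sketch like the following. The table name t_test is the one used later in this chapter; the column definitions are assumptions derived from the description (an autoincrement id and a name column).

```sql
-- Two columns: an autoincrement id and a name filled with static values.
CREATE TABLE t_test (id serial, name text);

-- 2 million rows for hans, followed by 2 million rows for paul.
INSERT INTO t_test (name) SELECT 'hans' FROM generate_series(1, 2000000);
INSERT INTO t_test (name) SELECT 'paul' FROM generate_series(1, 2000000);
```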
In all, 4 million rows have been added:
In this case, the \timing command will tell psql to show the runtime of a query. Note that this is not the real execution time on the server, but the time measured by psql. In case of very short queries, network latency can be a substantial part of the total time, so this has to be taken into account.
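A short psql session along these lines checks the row counts and measures a simple lookup. It assumes the t_test table of this chapter exists; the id value is an arbitrary example.

```sql
-- Assumes the t_test table of this chapter exists.
SELECT name, count(*) FROM t_test GROUP BY name;

-- The next line is a psql meta-command, not SQL.
\timing on

-- The id value is an arbitrary example.
SELECT * FROM t_test WHERE id = 432332;
```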


Making use of EXPLAIN
In this example, reading 4 million rows has taken more than 100 milliseconds. From a performance point of view, it is a total disaster. To figure out what goes wrong, PostgreSQL offers the EXPLAIN command:
When you have a feeling that a query is not performing well, EXPLAIN will help you to reveal the real performance problem.
Here is how it works:
What you see in this listing is an execution plan. In PostgreSQL, a SQL statement will be executed in four stages. The following components are at work:
The parser will check for syntax errors and obvious problems
The rewrite system takes care of rules (views and other things)
The optimizer will figure out how to execute a query in the most efficient way and work out a plan
The plan provided by the optimizer will be used by the executor to finally create the result
The purpose of EXPLAIN is to see what the planner has come up with to run the query efficiently. In my example, PostgreSQL will use a parallel sequential scan. This means that two workers will cooperate and work on the filter condition together. The partial results are then united through a thing called a gather node, which was introduced in PostgreSQL 9.6 (it is a part of the parallel query infrastructure). If you look at the plan more precisely, you will see how many rows PostgreSQL expects at each stage of the plan (in this example, rows=1, that is, one row will be returned).
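To look at such a plan yourself, prefix the query with EXPLAIN. The statement below is a sketch that assumes the t_test table of this chapter and an arbitrary id; the plan you get depends on your version, settings, and hardware, so it is not reproduced here.

```sql
-- Shows the plan only; the query itself is not executed.
EXPLAIN SELECT * FROM t_test WHERE id = 432332;
```

On a table of this size without an index, you would typically expect a Gather node on top of a Parallel Seq Scan, each annotated with cost, rows, and width estimates.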
In PostgreSQL 9.6 and 10.0, the number of parallel workers will be determined by the size of the table. The larger an operation is, the more parallel workers PostgreSQL will fire up. For a very small table, parallelism is not used as it would create too much overhead.
Parallelism is not a must. It is always possible to reduce the number of parallel workers to mimic pre-PostgreSQL 9.6 behavior by setting the max_parallel_workers_per_gather variable to 0.
Note that this change has no side effect as it is only in your session. Of course, you can also make the change in the postgresql.conf file, but I would not advise you to do this, as you might lose quite a lot of performance provided by the parallel queries.
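A sketch of how this looks in a session (again assuming the t_test table and an arbitrary id):

```sql
-- Disable parallel workers for the current session only.
SET max_parallel_workers_per_gather TO 0;

-- The plan should now contain a plain sequential scan without a Gather node.
EXPLAIN SELECT * FROM t_test WHERE id = 432332;

-- Return to the configured value.
RESET max_parallel_workers_per_gather;
```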
Digging into the PostgreSQL cost model
If only one CPU is used, the execution plan will look like this:
The pg_relation_size function will return the size of the table in bytes. Given the example, you can see that the relation consists of 21622 blocks (8k each). According to the cost model, PostgreSQL will add costs of one for each block it has to read sequentially.
The configuration parameter to influence that is seq_page_cost.
However, reading a couple of blocks from a disk is not everything we have to do. It is also necessary to apply the filter and to send these rows through a CPU. Two parameters are here to account for these costs: cpu_tuple_cost and cpu_operator_cost.

As you can see, this is exactly the number seen in the plan. Costs will consist of a CPU part and an I/O part, which will all be turned into a single number. The important thing here is that costs have nothing to do with real execution time, so it is impossible to translate costs to milliseconds. The number the planner comes up with is really just an estimate.
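The calculation can be sketched directly in SQL. The block count (21622) and the row count (4 million) are the figures from this example; the three parameters are read from the current session, so the result only matches a plan produced under the same settings.

```sql
-- Estimated cost of a sequential scan with one filter condition.
SELECT 21622   * current_setting('seq_page_cost')::numeric      -- reading the blocks
     + 4000000 * current_setting('cpu_tuple_cost')::numeric     -- processing each row
     + 4000000 * current_setting('cpu_operator_cost')::numeric  -- applying the filter
       AS estimated_seq_scan_cost;
```

With the default values (seq_page_cost = 1, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025), this works out to 21622 + 40000 + 10000 = 71622.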

Of course, there are more parameters than those outlined in this brief example. PostgreSQL also has special parameters for index-related operations, as follows:
random_page_cost = 4: If PostgreSQL uses an index, there is usually a lot of random I/O involved. On traditional spinning disks, random reads are much more expensive than sequential reads, so PostgreSQL will account for them accordingly. Note that on SSDs, the difference between random and sequential reads largely disappears, so it can make sense to set random_page_cost = 1 in the postgresql.conf file.
cpu_index_tuple_cost = 0.005: If indexes are used, PostgreSQL will also consider that there is some CPU cost involved.
If you are utilizing parallel queries, there are even more cost parameters:
parallel_tuple_cost = 0.1: This defines the cost of transferring one tuple from a parallel worker process to another process. It basically accounts for the overhead of moving rows around inside the infrastructure.
parallel_setup_cost = 1000.0: This adjusts the costs of firing up a worker process. Of course, starting processes to run queries in parallel is not free, and so, this parameter tries to model those costs associated with process management.
min_parallel_relation_size = 8 MB: This defines the minimum size of a table considered for parallel queries (in PostgreSQL 10, this setting has been replaced by min_parallel_table_scan_size). The larger a table grows, the more CPUs PostgreSQL will use. The size of the table has to triple to allow for one more worker process.
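The values that are currently active can be read from the pg_settings view. The following is a small sketch; the session-level change of random_page_cost at the end assumes SSD storage and is only meant as an illustration.

```sql
-- Current values of the planner cost parameters discussed above.
SELECT name, setting, unit
FROM   pg_settings
WHERE  name IN ('seq_page_cost', 'random_page_cost',
                'cpu_tuple_cost', 'cpu_index_tuple_cost', 'cpu_operator_cost',
                'parallel_tuple_cost', 'parallel_setup_cost')
ORDER  BY name;

-- Session-local override, assuming SSD storage; RESET restores the default.
SET random_page_cost = 1;
RESET random_page_cost;
```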
Deploying simple indexes
Firing up more worker processes to scan ever larger tables is sometimes not the solution. Reading entire tables to find just a single row is usually not a good idea.
Therefore, it makes sense to create indexes:

PostgreSQL uses Lehman-Yao's high-concurrency B-tree for standard indexes. Along with some PostgreSQL-specific optimizations, these trees provide end users with excellent performance. The most important thing is that Lehman-Yao allows you to run many operations (reading and writing) on the very same index at the same time, which helps to improve throughput dramatically.
As you can see, our index containing 4 million rows will eat up 86 MB of disk space. In addition to this, writes to the table will be slower because the index has to be kept in sync all the time.
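A sketch of the steps involved, assuming the t_test table of this chapter; the index name idx_id and the id value are made up for illustration.

```sql
-- Create a B-tree index on the id column (B-tree is the default index type).
CREATE INDEX idx_id ON t_test (id);

-- The lookup can now use the index instead of scanning the whole table.
SELECT * FROM t_test WHERE id = 43242;

-- Disk space used by the index.
SELECT pg_size_pretty(pg_relation_size('idx_id'));
```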

In other words, if you insert into a table featuring 20 indexes, you also have to keep in mind that we have to write to all those indexes on INSERT, which seriously slows down the writing.
 
Making use of sorted output
B-tree indexes are not only used to find rows; they are also used to feed sorted data to the next stage in the process:
In this case, the index already returns data in the right sort order and therefore there is no need to sort the entire set of data. Reading the last 10 rows of the index will be enough to answer this query. Practically, this means that it is possible to find the top N rows of a table in a fraction of a millisecond.
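A query of this kind could look as follows. It is a sketch that assumes the t_test table with an index on id.

```sql
-- Top 10 rows by id; with an index on id, no sort step should be needed.
EXPLAIN SELECT * FROM t_test ORDER BY id DESC LIMIT 10;
```

In the resulting plan, you would typically expect a Limit node on top of a backward index scan rather than a Sort node.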

However, ORDER BY is not the only operation requiring sorted output. The min and max functions are also all about sorted output, so an index can be used to speed up these two operations as well. Here is an example:
In PostgreSQL, an index (a B-tree, to be more precise) can be read in normal order or backwards. The thing now is that a B-tree can be seen as a sorted list. So, naturally, the lowest value is at the beginning and the highest value is at the end. Therefore, min and max are perfect candidates for a speed up. What is also worth noticing is that in this case, the main table need not be referenced at all.
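As a sketch, again assuming the t_test table with an index on id:

```sql
-- Both aggregates can be answered from the two ends of the index.
EXPLAIN SELECT min(id), max(id) FROM t_test;
```

PostgreSQL usually rewrites each aggregate into a small subquery that reads a single entry from one end of the index.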

In SQL, many operations rely on sorted input; therefore, understanding these operations is essential because there are serious implications on the indexing side.

Using more than one index at a time
Up until now, you have seen that one index at a time has been used. However, in many real-world situations, this is, by far, not sufficient. There are cases demanding more logic in the database.
PostgreSQL allows the use of multiple indexes in a single query. Of course, this makes sense if many columns are queried at the same time. However, that's not always the case. It can also happen that a single index is used multiple times to process the very same column.
Here is an example:


The point here is that the id column is needed twice. First, the query looks for 30 and then, for 50. As you can see, PostgreSQL will go for a bitmap scan.
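The query in question can be sketched like this, assuming the t_test table with an index on id:

```sql
-- The same index is consulted twice, once per condition.
EXPLAIN SELECT * FROM t_test WHERE id = 30 OR id = 50;
```

A typical plan consists of two Bitmap Index Scan nodes combined by a BitmapOr node, which then feeds a Bitmap Heap Scan on the table.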

A bitmap scan is not the same as a bitmap index, which people from an Oracle background might know. They are two totally distinct things and have nothing in common. Bitmap indexes are an index type in Oracle, while bitmap scans are a scan method.


The idea behind a bitmap scan is that PostgreSQL will scan the first index, collecting a list of blocks containing the data. Then, the next index will be scanned to again compile a list of blocks. This works for as many indexes as desired. In the case of OR, these lists will then be unified, leaving us with a large list of blocks containing the data. Using this list, the table will be scanned to retrieve these blocks.
 


The trouble now is that PostgreSQL has retrieved a lot more data than needed. In our case, the query will look for two rows; however, a couple of blocks might have been returned by the bitmap scan. Therefore, the executor will do a recheck to filter out the rows that do not satisfy our conditions.

Bitmap scans will also work for AND conditions or a mixture of AND and OR. However, if PostgreSQL sees an AND condition, it does not necessarily force itself into a bitmap scan. Let's suppose that we have a query looking for everybody living in Austria and a person with a certain ID. It really makes no sense to use two indexes here because after searching for the ID, there is really not much data left. Scanning both indexes would be much more expensive because there are 8 million people (including me) living in Austria, and reading so many rows to find just one person is pretty pointless from a performance standpoint. The good news is that the PostgreSQL optimizer will make all these decisions for you by comparing the costs of different options and potential indexes, so there is no need to worry.


Using bitmap scans effectively
The question naturally arising now is, when is a bitmap scan most beneficial and when is it chosen by the optimizer? From my point of view, there are really only two use cases:

Avoiding using the same block over and over again
Combining relatively bad conditions

The first case is quite common. Suppose you are looking for everybody who speaks a certain language. For the sake of the example, we can assume that 10% of all people speak the required language. Scanning the index would mean that a block in the table has to be scanned all over again as many skilled speakers might be stored in the same block. By applying a bitmap scan, it is ensured that a specific block is only used once, which of course leads to better performance.

The second common use case is to use relatively weak criteria together. Let's suppose we are looking for everybody between 20 and 30 years of age owning a yellow shirt. Now, maybe 15% of all people are between 20 and 30 and maybe 15% of all people actually own a yellow shirt. Scanning a table sequentially is expensive, and so PostgreSQL might decide to choose two indexes because the final result might consist of just about 2% of the data (0.15 x 0.15, assuming the two conditions are independent). Scanning both indexes might be cheaper than reading all of the data.

In PostgreSQL 10.0, parallel bitmap heap scans are supported. Usually, bitmap scans are used by comparatively expensive queries. Added parallelism in this area is, therefore, a huge step forward and definitely beneficial.
 


Using indexes in an intelligent way
So far, applying an index feels like the Holy Grail, which always improves performance magically. However, this is not the case. Indexes can also be pretty pointless in some cases.

Before digging into things more deeply, here is the data structure we have used for this example. Remember that there are only two distinct names and unique IDs:

 
At this point, one index has been defined, which covers the id column. In the next step, the name column will be queried. Before doing this, an index on the name will be created:
Now, it is time to see if the index is used correctly:
As expected, PostgreSQL will decide on using the index. Most users would expect this. But note that my query says hans2. Remember, hans2 does not exist in the table and the query plan perfectly reflects this. rows=1 indicates that the planner only expects a very small subset of data being returned by the query.
There is not a single matching row in the table, but PostgreSQL will never estimate zero rows because doing so would make subsequent estimations a lot harder: useful cost calculations of other nodes in the plan would be close to impossible.
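These two steps can be sketched as follows, assuming the t_test table of this chapter; the index name idx_name is made up for illustration.

```sql
-- Index on the name column, which holds only two distinct values.
CREATE INDEX idx_name ON t_test (name);

-- hans2 does not exist, so the planner expects (almost) no rows
-- and an index scan is the cheapest option.
EXPLAIN SELECT * FROM t_test WHERE name = 'hans2';
```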
Let's see what happens if we look for more data:
In this case, PostgreSQL will go for a straight sequential scan. Why is that? Why is the system ignoring all indexes? The reason is simple; hans and paul make up the entire dataset because there are no other values (PostgreSQL knows that by checking the system statistics). Therefore, PostgreSQL figures that the entire table has to be read anyway. There is no reason to read all of the index and the full table if reading just the table is sufficient.
In other words, PostgreSQL will not use an index just because there is one. PostgreSQL will use indexes when they make sense. If the number of rows is smaller, PostgreSQL will again consider bitmap scans and normal index scans:
The most important point to learn here is that execution plans depend on input values. They are not static and not independent of the data inside the table. This is a very important observation, which has to be kept in mind all the time. In real-world examples, the fact that plans change can often be the reason for unpredictable runtimes.
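The effect can be observed by comparing two queries that differ only in their input values. This is a sketch that assumes the t_test table with an index on the name column; the plans you get depend on the optimizer statistics.

```sql
-- Both values together cover the whole table: expect a sequential scan.
EXPLAIN SELECT * FROM t_test WHERE name = 'hans' OR name = 'paul';

-- Neither value exists: expect the index to be used, possibly via a bitmap scan.
EXPLAIN SELECT * FROM t_test WHERE name = 'hans2' OR name = 'paul2';
```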
Improving speed using clustered tables
In this section, you will learn about the power of correlation and the power of clustered tables. What is the whole idea? Suppose you want to read a whole area of data. This might be a certain time range, a block of IDs, or similar.

The runtime of such queries will vary depending on the amount of data and the physical arrangement of data on the disk. So, even if you are running queries that return the same number of rows, two systems might not provide the answer within the same time span, as the physical disk layout might make a difference.
Here is an example:
As you might remember, the data has been loaded in an organized and sequential way. Data has been added ID after ID, and so it can be expected that the data will be on the disk in a sequential order. This holds true if data is loaded into an empty table using some autoincrement column.

You have already seen EXPLAIN in action. In this example, EXPLAIN (analyze true, buffers true, and timing true) has been utilized. The idea is that analyze will not just show the plan but also execute the query and show us what has happened.

EXPLAIN analyze is perfect for comparing planner estimates with what really happened.

It is the best way to figure out whether the planner was correct or way off. The buffers true parameter will tell us how many 8k blocks were touched by the query. In this example, a total of 85 blocks were touched. Shared hit means that data was coming from the PostgreSQL I/O cache (shared buffers). Altogether, it took PostgreSQL around four milliseconds to retrieve the data.
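A statement of this kind could look like the following sketch. It assumes the t_test table with an index on id; the range condition is an assumption chosen for illustration, and block counts and runtimes will differ from system to system.

```sql
-- Executes the query and reports real runtimes and buffer usage.
EXPLAIN (analyze true, buffers true, timing true)
SELECT * FROM t_test WHERE id < 10000;
```

In the output, compare the estimated rows with the actual rows, and look at the Buffers line to see how many blocks were read.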
What happens if the data in your table is somewhat random? Will things change?

To create a table containing the same data but in random order, you can simply use ORDER BY random(). It will make sure that the data is indeed shuffled on disk:
To function properly, PostgreSQL will need optimizer statistics. These statistics will tell PostgreSQL how much data there is, how values are distributed, and whether the data is correlated on disk. To speed things up even more, I have added a VACUUM call. Please mind that VACUUM will be discussed later in this book in more detail:
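The steps described here can be sketched as follows. The table name t_random is the one used in this chapter; the index name idx_random and the range condition are assumptions.

```sql
-- Same data as t_test, but physically stored in random order.
CREATE TABLE t_random AS SELECT * FROM t_test ORDER BY random();
CREATE INDEX idx_random ON t_random (id);

-- Clean up and create optimizer statistics in one go.
VACUUM ANALYZE t_random;

-- Same query as before, now against the shuffled table.
EXPLAIN (analyze true, buffers true, timing true)
SELECT * FROM t_random WHERE id < 10000;
```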
There are a couple of things to observe here. First of all, a staggering total of 8,057 blocks were needed and the runtime has skyrocketed to over 14 milliseconds. The only thing that somewhat rescued performance here is the fact that data was again coming from the memory and not from the disk. Just imagine what it would mean if you had to access the disk 8,057 times just to answer this query. It would be a total disaster because disk wait would certainly slow down things dramatically.

However, there is more to see. You can even see that the plan has changed. PostgreSQL now uses a bitmap scan instead of a normal index scan. This is done to reduce the number of blocks needed in the query to prevent even worse behavior.

How does the planner know how data is stored on the disk? pg_stats is a system view containing all the statistics about the content of the columns. The following query reveals the relevant content:
You can see that PostgreSQL takes care of every single column. The content of the view is created by a command called ANALYZE, which is vital to performance:
Usually, ANALYZE is automatically executed in the background using the autovacuum daemon, which will be covered later in this book.
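A sketch of how the statistics can be refreshed and inspected, assuming the t_test and t_random tables of this chapter exist:

```sql
-- Recreate the optimizer statistics manually.
ANALYZE t_test;
ANALYZE t_random;

-- Physical correlation of every column of both tables.
SELECT tablename, attname, correlation
FROM   pg_stats
WHERE  tablename IN ('t_test', 't_random')
ORDER  BY tablename, attname;
```

A correlation of 1 or -1 means that the values are stored in perfectly ascending or descending order on disk, while values near 0 indicate no physical ordering.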
Back to our query. As you can see, both tables have two columns (id and name). In the case of t_test.id, the correlation is 1, which means that the next value somewhat depends on the previous one. In my example, numbers are simply ascending. The same applies to t_test.name.
In t_random, the situation is quite different; a correlation close to zero (here slightly negative) means that data is shuffled. You can also see that the correlation for the name column is around 0.5. In reality, it means that there is usually no straight sequence of identical names in the table, but it rather means that names keep switching all the time when the table is read in the physical order.
Why does this lead to so many blocks being hit by the query? The answer is relatively simple. If the data we need is not packed together tightly but spread out over the table evenly, more blocks are needed to extract the same amount of information, which in turn leads to worse performance.
Clustering tables
In PostgreSQL, there is a command called CLUSTER that allows us to rewrite a table in the desired order. It is possible to point to an index and store data in the same order as the index 
The CLUSTER command has been around for many years and serves its purpose well. But, there are some things to consider before blindly running it on a production system 

The CLUSTER command will lock the table while it is running. You cannot insert or modify data while CLUSTER is running. This might not be acceptable on a production system.
Data can only be organized according to one index. You cannot order a table by postal code, name, ID, birthday, and so on, at the same time. It means
that CLUSTER will make sense if there is a search criteria, which is used most of
the time.
Keep in mind that the example outlined in this book is more of a worst case scenario. In reality, the performance difference between a clustered and a non  clustered table will depend on the workload, the amount of data retrieved, cache hit rates, and a lot more.
The clustered state of a table will not be maintained as changes are made to a table during normal operations. Correlation will usually deteriorate as time goes by.
Here is an example of how to run the CLUSTER command:
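A minimal sketch of such a call. The table t_cluster_demo and the index idx_cluster_demo_id are hypothetical names created only for this illustration:

```sql
-- Hypothetical demo table: rows are stored in shuffled physical order.
CREATE TABLE t_cluster_demo AS
    SELECT id, md5(id::text) AS name
    FROM generate_series(1, 100000) AS id
    ORDER BY random();

CREATE INDEX idx_cluster_demo_id ON t_cluster_demo (id);

-- Rewrite the table in the order of the index.
-- The table is locked exclusively until the command finishes.
CLUSTER t_cluster_demo USING idx_cluster_demo_id;

-- Refresh the statistics and inspect the physical correlation of id.
ANALYZE t_cluster_demo;
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 't_cluster_demo'
  AND attname = 'id';
```

After the rewrite, the correlation reported for id should be close to 1; it will drift away again as rows are inserted and updated.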
Depending on the size of the table, the time needed to cluster will vary.
Making use of index only scans
So far, you have seen when an index is used and when it is not. In addition to this, bitmap scans have been discussed.
However, there is more to indexing. The following two examples will only differ slightly although the performance difference might be fairly large. Here is the first query:
There is nothing unusual here. PostgreSQL uses an index to find a single row. What happens if only a single column is selected?

As you can see, the plan has changed from an index scan to an index only scan. In our example, the id column has been indexed, so its content is naturally in the index. There is no need to go to the table in most cases if all the data can already be taken out of the index. Going to the table is (almost) only required if additional fields are queried, which is not the case here. Therefore, the index only scan will promise significantly better performance than a normal index scan.
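The difference between the two plans can be reproduced with a sketch like the following. The table t_ios_demo and its index are hypothetical; the plans the planner actually picks depend on statistics and on the state of the visibility map:

```sql
-- Hypothetical demo table with an index on id only.
CREATE TABLE t_ios_demo AS
    SELECT id, 'name_' || id::text AS name
    FROM generate_series(1, 100000) AS id;

CREATE INDEX idx_ios_demo_id ON t_ios_demo (id);

-- VACUUM marks pages as all-visible, which is what allows an
-- index only scan to skip the table.
VACUUM ANALYZE t_ios_demo;

-- All columns requested: the name has to be fetched from the table,
-- so a normal index scan is the expected plan.
EXPLAIN SELECT * FROM t_ios_demo WHERE id = 34234;

-- Only the indexed column requested: an index only scan is expected.
EXPLAIN SELECT id FROM t_ios_demo WHERE id = 34234;
```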

Practically, it can even make sense to include an additional column into an index here and there to enjoy the benefit of this feature. In MS SQL, adding additional columns is known as covering indexes. Similar behavior can be achieved in PostgreSQL as well.


Understanding additional b-tree features
In PostgreSQL, indexing is a large field and covers many aspects of database work. As I have outlined in this book already, indexing is the key to performance. There is no good performance without proper indexing. Therefore, it is worth inspecting these indexing-related features in more detail.

Combined indexes
In my job as a professional PostgreSQL support vendor, I am often asked about the difference between combined and individual indexes. In this section, I will try to shed some light on this question.
The general rule is this: if a single index can answer your question, it is usually the best choice. However, you cannot index all possible combinations of fields people are filtering on. What you can do is use the properties of combined indexes to achieve as much gain as possible.
Let's suppose we have a table containing three columns: postal_code, last_name, and first_name. A telephone book would make use of a combined index like this. You will see that data is ordered by location. Within the same location, data will be sorted by last name and first name.
The following table will show which operations are possible given the three-column index:
 
If columns are indexed separately, you will most likely end up seeing bitmap scans. Of course, a single hand-tailored index is better.
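The telephone book idea can be sketched in SQL as follows. The table t_phonebook, its phone column, and the search values are assumptions made for illustration. The comments describe what the sort order of the index allows in principle; the plan chosen for a concrete table depends on its size and statistics:

```sql
-- Hypothetical telephone book table with one combined index.
CREATE TABLE t_phonebook (
    postal_code text,
    last_name   text,
    first_name  text,
    phone       text
);

CREATE INDEX idx_phonebook
    ON t_phonebook (postal_code, last_name, first_name);

-- Leading columns are constrained: the sort order of the index helps.
EXPLAIN SELECT * FROM t_phonebook
WHERE postal_code = '2700' AND last_name = 'Smith' AND first_name = 'Anna';

EXPLAIN SELECT * FROM t_phonebook
WHERE postal_code = '2700' AND last_name = 'Smith';

EXPLAIN SELECT * FROM t_phonebook
WHERE postal_code = '2700';

-- The leading column is missing: matching entries are scattered all over
-- the index, so its sort order cannot narrow down the search.
EXPLAIN SELECT * FROM t_phonebook
WHERE first_name = 'Anna';
```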
Adding functional indexes
So far, you have seen how to index the content of a column as it is. However, this might not always be what you really want. Therefore, PostgreSQL allows the creation of functional indexes. The basic idea is very simple; instead of indexing a value, the output of a function is stored in the index.
The following example shows how the cosine of the id column can be indexed:
All you have to do is put the function on the list of columns and you are done. Of course, this won't work for all kinds of functions. Functions can only be used if their output is immutable.
Functions such as age are not really suitable for indexing because their output is not constant. Time goes on and consequently, the output of age will change too. PostgreSQL will explicitly prohibit functions that have the potential to change their result given the same input. The cos function is fine in this respect because the cosine of a value will still be the same 1,000 years from now.
To test the index, I have written a simple query to show what will happen:
As expected, the functional index will be used just like any other index.
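A minimal sketch of a functional index on the cosine of a column, using a hypothetical table t_func_demo. The search value 0.5 is arbitrary:

```sql
-- Hypothetical demo table.
CREATE TABLE t_func_demo AS
    SELECT id FROM generate_series(1, 100000) AS id;

-- Index the output of cos(id) instead of id itself.
CREATE INDEX idx_func_demo_cos ON t_func_demo (cos(id));
ANALYZE t_func_demo;

-- The query has to use the same expression as the index definition.
EXPLAIN SELECT * FROM t_func_demo WHERE cos(id) = 0.5;

-- By contrast, an index on a non-immutable expression is rejected,
-- for example age(ts) on a timestamptz column ts, because the
-- single-argument form of age depends on the current date.
```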
Reducing space consumption
Indexing is nice and its main purpose is to speed up things as much as possible. As with all the good stuff, indexing comes with a price tag: space consumption. To do its magic, an index has to store values in an organized fashion. If your table contains 10 million integer values, the index belonging to the table will logically contain these 10 million integer values plus additional overhead.
A b-tree will contain a pointer to each row in the table, and so it is certainly not free of charge. To figure out how much space an index will need, you can ask psql using the \di+ command:
In my database, the staggering amount of 344 MB has been burned to store these indexes. Now, compare this to the amount of storage burned by the underlying tables:

The size of both tables combined is just 338 MB. In other words, our indexing needs more space than the actual data. In the real world, this is common and actually pretty likely. Recently, I visited a Cybertec customer in Germany and I saw a database in which 64% of the database size was made up of indexes that were never used (not a single time over the period of months). So, over-indexing can be an issue just like under-indexing. Remember, these indexes don't just consume space. Every INSERT or UPDATE must maintain the values in the indexes as well. In extreme cases, like our example, this vastly decreases write throughput.

Note that it only makes sense to exclude very frequent values that make up a large part of the table (at least 25% or so). Ideal candidates for partial indexes are gender (we assume that most people are male or female), nationality (assuming that most people in your country have the same nationality), and so on. Of course, applying this kind of trickery requires some deep knowledge of your data, but it certainly pays off.
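The effect of excluding a very frequent value can be sketched as follows. The table t_partial_demo and its value distribution (99% of the rows share the value common) are assumptions chosen to make the size difference visible:

```sql
-- Hypothetical demo table: one value dominates the status column.
CREATE TABLE t_partial_demo AS
    SELECT id,
           CASE WHEN id % 100 = 0 THEN 'rare' ELSE 'common' END AS status
    FROM generate_series(1, 100000) AS id;

-- A full index stores an entry for every row.
CREATE INDEX idx_partial_full ON t_partial_demo (status);

-- A partial index leaves out the value nobody searches for via an index.
CREATE INDEX idx_partial_rare ON t_partial_demo (status)
    WHERE status <> 'common';

-- Compare the space consumed by both indexes.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('idx_partial_full', 'idx_partial_rare');
```

A query can only use the partial index if its WHERE clause implies the index predicate, for example WHERE status = 'rare'.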

Adding data while indexing
Creating an index is easy. However, keep in mind that you cannot modify a table while an index is being built. The CREATE INDEX command will lock up the table using a SHARE lock to ensure that no changes happen. While this is clearly no problem for small tables, it will cause issues on large ones on production systems. Indexing a terabyte of data or so will take some time and therefore, blocking a table for too long can become an issue.

The solution to the problem is the CREATE INDEX CONCURRENTLY command. Building the index will take a lot longer (usually at least twice as long), but you can use the table normally during index creation.

Here is how it works:

Note that PostgreSQL does not guarantee success if you are using the CREATE INDEX CONCURRENTLY command. An index can end up being marked as invalid if the operations going on on your system somehow conflict with the index creation.
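A minimal sketch of a concurrent index build, followed by a check for indexes that were left behind as invalid. The table and index names are hypothetical:

```sql
-- Hypothetical demo table.
CREATE TABLE t_conc_demo AS
    SELECT id FROM generate_series(1, 100000) AS id;

-- Build the index without blocking writes.
-- This statement cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_conc_demo_id ON t_conc_demo (id);

-- List indexes that are marked as invalid, for example after a
-- failed concurrent build.
SELECT c.relname AS index_name
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid;

-- An invalid index still has to be maintained on writes, so it should be
-- dropped and built again:
-- DROP INDEX CONCURRENTLY idx_conc_demo_id;
```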



Introducing operator classes
So far, the goal was to figure out what to index and to blindly apply an index on this column or on a group of columns. There is one assumption, however, that we have silently accepted to make this work. Up until now, we have worked on the assumption that the order in which the data has to be sorted is a somewhat fixed constant. In reality, this assumption might not hold true. Sure, numbers will always be in the same order, but other kinds of data will most likely not have a predefined, fixed sort order.

To prove my point, I have compiled a real-world example. Take a look at the following two records:

My question now is, are those two rows ordered properly? They might be because one comes before another. However, this is wrong because these two rows do have some hidden semantics. What you see here are two Austrian social security numbers. 09 08 78 actually means August 9, 1978, and 01 05 77 actually means May 1, 1977. The first four numbers consist of a checksum and some sort of auto-incremented three-digit number. So in reality, 1977 comes before 1978 and we might consider swapping those two lines to achieve the desired sort order.

The problem is that PostgreSQL has no idea what these two rows actually mean. If a column is marked as text, PostgreSQL will apply the standard rules to sort the text. If the column is marked as a number, PostgreSQL will apply the standard rules to sort numbers. Under no circumstances will it ever use something as odd as I've described. If you think that the facts I outlined previously are the only things to consider when processing those numbers, you are wrong. How many months does a year have? 12? Far from true. In the Austrian social security system, these numbers can hold up to 14 months. Why? Remember, ... three digits are simply an auto-increment value. The trouble is that if an immigrant or a refugee has no valid paperwork and if his birthday is not known, he will be assigned an artificial birthday in the 13th month. During the Balkan wars in the 1990s, Austria offered asylum to over 115,000 refugees. Naturally, this three-digit number was not enough, and a 14th month was added. Now, which standard data type can handle this kind of COBOL leftover from the early 1970s (that was when the layout of the social security number was introduced)? The answer is, none.
To handle special-purpose fields in a sane way, PostgreSQL offers operator classes:

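An operator class will tell an index how to behave. Let's take a look at a standard b-tree. It can perform five operations: less than (<, strategy 1), less than or equal (<=, strategy 2), equal (=, strategy 3), greater than or equal (>=, strategy 4), and greater than (>, strategy 5).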

The standard operator classes support the standard data types and standard operators we have used throughout this book. If you want to handle social security numbers, it is necessary to come up with your own operators capable of providing you with the logic you need. Those custom operators can then be used to form an operator class, which is nothing more than a strategy passed to the index to configure how it should behave.
Hacking up an operator class for a b-tree
To give you a practical example of what an operator class looks like, I have hacked up some code to handle social security numbers. To keep it simple, I have paid no attention to details such as checksums.
Creating new operators
The first thing that has to be done is come up with the desired operators. Note that five operators are needed. There is one operator for each strategy. A strategy of an index is really like a plugin that allows you to put in your own code.
Before getting started, I have compiled some test data:

Basically, the concept is as follows: an operator calls a function, which gets one or two parameters, one for the left argument and one for the right argument of the operator.

As you can see, an operator is nothing more than a function call. So, consequently, it is necessary to implement the logic needed into those functions hidden by the operators. In order to fix the sort order, I have written a function called normalize_si:
 
As you can see, all we did is swap some digits. It is now possible to just use the normal string sort order. In the next step, this function can already be used to compare social security numbers directly.
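A sketch of what such a normalization function could look like. It assumes that the numbers are stored as ten digits without blanks (four leading digits followed by day, month, and year); the two sample values are made up for illustration:

```sql
-- Move year, month, and day to the front so that the plain string
-- sort order becomes chronological.
CREATE OR REPLACE FUNCTION normalize_si(text) RETURNS text AS $$
BEGIN
    RETURN substring($1, 9, 2) || substring($1, 7, 2)
        || substring($1, 5, 2) || substring($1, 1, 4);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- 1234 09 08 78 becomes 7808091234
SELECT normalize_si('1234090878');

-- August 9, 1978 does not sort before May 1, 1977: returns false
SELECT normalize_si('1234090878') < normalize_si('5678010577');
```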
The first function needed is the less than function, which is needed by the first strategy:
Given this knowledge, the next functions needed by our future operators can be defined:
So far, four functions have been defined. A fifth function for the equals operator is not necessary. We can simply take the existing operator because equality does not depend on sort order anyway.
These functions must not be written in SQL; this only works in a procedural or in a compiled language. The reason is that SQL functions can be inlined under some circumstances, which would cripple the entire endeavor.
The second issue is that you should stick to the naming convention used in this chapter, which is widely accepted by the community. Less than functions should be called _lt, less or equal to functions should be called _le, and so on.
Now that all functions are in place, it is time to define these operators:
Depending on the type of index you are using, a couple of support functions are needed. In the case of standard b-trees, there is only one support function needed, which is used to speed things up internally:

The si_same function will return -1 if the first parameter is smaller, 0 if both parameters are equal, and 1 if the first parameter is greater. Internally, the _same function is the workhorse, so you should make sure that your code is optimized.
Creating operator classes
Finally, all components are in place and it is possible to create the operator class needed by the index:
Note that the operator class has a name and that it has been explicitly defined to work with b-trees. The operator class can already be used during index creation:
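The pieces described in this section can be put together roughly as follows. This is a sketch under stated assumptions, not the exact code of the book: the table t_sva, its sample values, the operator names (<#, <=#, >=#, >#), and the names sva_special_ops and idx_special are chosen for illustration, and numbers are assumed to be stored as ten digits without blanks. Creating an operator class requires superuser privileges.

```sql
-- Hypothetical test data: four leading digits followed by DDMMYY.
CREATE TABLE t_sva (sva text);
INSERT INTO t_sva VALUES ('1234090878'), ('5678010577');

-- Put year, month, and day first so that string order is chronological.
CREATE OR REPLACE FUNCTION normalize_si(text) RETURNS text AS $$
BEGIN
    RETURN substring($1, 9, 2) || substring($1, 7, 2)
        || substring($1, 5, 2) || substring($1, 1, 4);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- One comparison function per operator, written in PL/pgSQL so that
-- they cannot be inlined.
CREATE OR REPLACE FUNCTION si_lt(text, text) RETURNS boolean AS $$
BEGIN RETURN normalize_si($1) < normalize_si($2); END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION si_le(text, text) RETURNS boolean AS $$
BEGIN RETURN normalize_si($1) <= normalize_si($2); END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION si_ge(text, text) RETURNS boolean AS $$
BEGIN RETURN normalize_si($1) >= normalize_si($2); END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION si_gt(text, text) RETURNS boolean AS $$
BEGIN RETURN normalize_si($1) > normalize_si($2); END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Operators are thin wrappers around those functions.
CREATE OPERATOR <#  (PROCEDURE = si_lt, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR <=# (PROCEDURE = si_le, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR >=# (PROCEDURE = si_ge, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR >#  (PROCEDURE = si_gt, LEFTARG = text, RIGHTARG = text);

-- Support function: -1 (smaller), 0 (equal), or 1 (greater).
CREATE OR REPLACE FUNCTION si_same(text, text) RETURNS int AS $$
BEGIN
    IF normalize_si($1) < normalize_si($2) THEN
        RETURN -1;
    ELSIF normalize_si($1) > normalize_si($2) THEN
        RETURN 1;
    ELSE
        RETURN 0;
    END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Tie the five strategies and the support function together.
-- Strategy 3 reuses the existing = operator for text.
CREATE OPERATOR CLASS sva_special_ops
FOR TYPE text USING btree AS
    OPERATOR 1 <# ,
    OPERATOR 2 <=# ,
    OPERATOR 3 = ,
    OPERATOR 4 >=# ,
    OPERATOR 5 ># ,
    FUNCTION 1 si_same(text, text);

-- Use the operator class when creating the index.
CREATE INDEX idx_special ON t_sva (sva sva_special_ops);
```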
Testing custom operator classes
In our example, the test data consists of just two rows. Therefore, PostgreSQL will never use an index because the table is just too small to justify the overhead of even opening the index. To be able to still test without having to load too much data, you can advise the optimizer to make sequential scans more expensive.
Making operations more expensive can be done in your session using the following instruction:
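A minimal sketch of this technique on a hypothetical two-row table. The setting enable_seqscan only affects the current session, and the plans shown by EXPLAIN may differ between versions:

```sql
-- Hypothetical tiny table with an index.
CREATE TABLE t_tiny_demo (id int);
INSERT INTO t_tiny_demo VALUES (1), (2);
CREATE INDEX idx_tiny_demo ON t_tiny_demo (id);
ANALYZE t_tiny_demo;

-- With default settings, a sequential scan is the likely choice.
EXPLAIN SELECT * FROM t_tiny_demo WHERE id = 1;

-- Penalize sequential scans so that the planner prefers the index.
SET enable_seqscan TO off;
EXPLAIN SELECT * FROM t_tiny_demo WHERE id = 1;

-- Go back to the default behavior.
RESET enable_seqscan;
```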
Understanding PostgreSQL index types
So far, only b-trees have been discussed. However, in many cases, b-trees are just not enough. Why is that the case? As discussed in this chapter, b-trees are basically based on sorting. The <, <=, =, >=, and > operators can be handled using b-trees. The trouble is, not all data types can be sorted in a useful way. Just imagine a polygon. How would you sort these objects in a useful way? Sure, you can sort by the area covered, its length or so, but doing this won't allow you to actually find them using a geometric search.

The solution to the problem is to provide more than just one index type. Each index will serve a special purpose and do exactly what is needed. The following index types are available (as of PostgreSQL 10.0):
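The access methods known to a given installation can be listed from the pg_am system catalog. On PostgreSQL 10, the index types shipped with the core are btree, hash, GiST, GIN, SP-GiST, and BRIN; extensions and newer versions may add further entries:

```sql
-- List the access methods registered in this database.
SELECT amname
FROM pg_am
ORDER BY amname;
```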
Take a look at the tree. You will see that R1 and R2 are on top. R1 and R2 are the bounding boxes containing everything else. R3, R4, and R5 are contained by R1. R8, R9, and R10 are contained by R3, and so on. A GiST index is therefore hierarchically organized. What you can see in the diagram is that some operations, which are not available in b-trees, are supported. Some of those operations are overlaps, left of, right of, and so on. The layout of a GiST tree is ideal for geometric indexing.
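Operations such as overlaps and left of can be tried out with a small sketch. The table t_gist_demo and the generated boxes are assumptions made for illustration:

```sql
-- Hypothetical table holding small boxes along the diagonal.
CREATE TABLE t_gist_demo (id int, area box);

INSERT INTO t_gist_demo
    SELECT i, box(point(i, i), point(i + 1, i + 1))
    FROM generate_series(1, 10000) AS i;

CREATE INDEX idx_gist_demo ON t_gist_demo USING gist (area);
ANALYZE t_gist_demo;

-- Overlaps (&&): boxes sharing at least one point with the search box.
EXPLAIN SELECT * FROM t_gist_demo
WHERE area && box(point(10, 10), point(12, 12));

-- Strictly left of (<<): boxes entirely to the left of the search box.
EXPLAIN SELECT * FROM t_gist_demo
WHERE area << box(point(10, 10), point(12, 12));
```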
 


Extending GiST
Of course, it is also possible to come up with your own operator classes. The following strategies are supported:
If you want to write operator classes for GiST, a couple of support functions have to be provided. In the case of a b-tree, there is only the same function; GiST indexes provide a lot more:
Implementing operator classes for GiST indexes is usually done in C. If you are interested in a good example, I advise you to check out the btree_gist module in the contrib directory. It shows how to index standard data types using GiST and is a good source of information as well as inspiration.


GIN indexes
Generalized inverted (GIN) indexes are a good way to index text. Suppose you want to index a million text documents. A certain word may occur millions of times. In a normal b-tree, this would mean that the key is stored millions of times. Not so in a GIN. Each key (or word) is stored once and assigned to a document list. Keys are organized in a standard b-tree. Each entry will have a document list pointing to all entries in the table having the same key. A GIN index is very small and compact. However, it lacks an important feature found in b-trees: sorted data. In a GIN, the list of item pointers associated with a certain key is sorted by the position of the row in the table and not by some arbitrary criteria.

Extending GIN
Just like any other index, GIN can be extended. The following strategies are available:

On top of this, the following support functions are available:

If you are looking for a good example of how to extend GIN, consider looking at the btree_gin module in the PostgreSQL contrib directory. It is a valuable source of information and a good way to start your own implementation.

If you are interested in full text search, more information will be provided later on in this chapter.

SP-GiST indexes
Space-partitioned GiST (SP-GiST) has mainly been designed for in-memory use. The reason for this is that an SP-GiST stored on disk needs a fairly high number of disk hits to function. Disk hits are way more expensive than just following a couple of pointers in RAM.

The beauty is that SP-GiST can be used to implement various types of trees such as quadtrees, k-d trees, and radix trees (tries).

The following strategies are provided:

To write your own operator classes for SP-GiST, a couple of functions have to be provided:

BRIN indexes
Block range indexes (BRIN) are of great practical use. All indexes discussed until now need quite a lot of disk space. Although a lot of work has gone into shrinking GIN indexes and the like, they still need quite a lot because an index pointer is needed for each entry. So, if there are 10 million entries, there will be 10 million index pointers. Space is the main concern addressed by the BRIN indexes. A BRIN index does not keep an index entry for each tuple but will store the minimum and the maximum value of 128 (default) blocks of data (1 MB). The index is therefore very small but lossy. Scanning the index will return more data than we asked for. PostgreSQL has to filter out these additional rows in a later step.

The following example demonstrates how small a BRIN index really is:

In my example, the BRIN index is 2,000 times smaller than a standard b-tree. The question naturally arising now is, why don't we always use BRIN indexes? To answer this kind of question, it is important to reflect on the layout of BRIN; the minimum and maximum value for 1 MB are stored. If the data is sorted (high correlation), BRIN is pretty efficient because we can fetch 1 MB of data, scan it, and we are done. However, what if the data is shuffled? In this case, BRIN won't be able to exclude chunks of data anymore because it is very likely that something close to the overall high and the overall low is within 1 MB of data. Therefore, BRIN is mostly made for highly correlated data. In reality, correlated data is quite likely in data warehousing applications. Often, data is loaded every day and therefore dates can be highly correlated.
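A size comparison along these lines can be sketched as follows. The table t_brin_demo and the index names are hypothetical, and the exact ratio will differ from the figure quoted in the text depending on the data and the PostgreSQL version:

```sql
-- Hypothetical table with perfectly correlated (ascending) data.
CREATE TABLE t_brin_demo AS
    SELECT id FROM generate_series(1, 1000000) AS id;

-- A conventional b-tree and a BRIN index on the same column.
CREATE INDEX idx_brin_demo_btree ON t_brin_demo (id);
CREATE INDEX idx_brin_demo_brin  ON t_brin_demo USING brin (id);

-- Compare the space consumed by both indexes.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('idx_brin_demo_btree', 'idx_brin_demo_brin');

-- The number of blocks summarized per index entry can be tuned.
-- Smaller ranges make the index larger but less lossy.
CREATE INDEX idx_brin_demo_fine ON t_brin_demo
    USING brin (id) WITH (pages_per_range = 32);
```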

Extending BRIN indexes
BRIN supports the same strategies as a b-tree and therefore needs the same set of operators. The code can be reused nicely:

Adding additional indexes
Since PostgreSQL 9.6, there has been an easy way to deploy entirely new index types as extensions. This is pretty cool because if those index types provided by PostgreSQL are not enough, it is possible to add additional ones serving precisely your purpose. The instruction to do this is CREATE ACCESS METHOD:

Don't worry too much about this command. Should you ever deploy your own index type, it will come as a ready-to-use extension.

One of these extensions implements bloom filters. Bloom filters are probabilistic data structures. They sometimes return too many rows but never too few. Therefore, a bloom filter is a good method to pre-filter data.
How does it work? A bloom filter is defined on a couple of columns. A bitmask is calculated based on the input values, which is then compared to your query. The upside of a bloom filter is that you can index as many columns as you want. The downside is that the entire bloom filter has to be read. Of course, the bloom filter is smaller than the underlying data and so it is, in many cases, very beneficial.
To use bloom filters, just activate the extension, which is a part of the PostgreSQL contrib package:
As stated previously, the idea behind a bloom filter is that it allows you to index as many columns as you want. In many real-world applications, the challenge is to index many columns without knowing which combinations the user will actually need at runtime. In the case of a large table, it is totally impossible to create standard b-tree indexes on, say, 80 fields or more. A bloom filter might be an alternative in this case:
Note that I have queried a combination of random columns; they are not related to the actual order in the index. The bloom filter will still be beneficial.
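A minimal sketch of a bloom index on several columns. The table t_bloom, its seven integer columns, and the random test data are assumptions made for illustration; the contrib package has to be installed for CREATE EXTENSION to succeed:

```sql
CREATE EXTENSION IF NOT EXISTS bloom;

-- Hypothetical table with many columns that may be filtered on.
CREATE TABLE t_bloom (
    x1 int, x2 int, x3 int, x4 int, x5 int, x6 int, x7 int
);

INSERT INTO t_bloom
    SELECT (random() * 100)::int, (random() * 100)::int,
           (random() * 100)::int, (random() * 100)::int,
           (random() * 100)::int, (random() * 100)::int,
           (random() * 100)::int
    FROM generate_series(1, 100000);

-- One bloom index covers all columns.
CREATE INDEX idx_bloom ON t_bloom
    USING bloom (x1, x2, x3, x4, x5, x6, x7);
ANALYZE t_bloom;

-- Query an arbitrary combination of columns (equality only).
SET enable_seqscan TO off;   -- favor the index on this small demo table
EXPLAIN SELECT * FROM t_bloom WHERE x5 = 9 AND x3 = 7;
RESET enable_seqscan;
```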
If you are interested in bloom filters, consider checking out the website 
Achieving better answers with fuzzy searching
Performing precise searching is not the only thing expected by users these days. Modern websites have educated users in a way that they always expect a result, regardless of the user input. If you search on Google, there will always be an answer even if the user input is wrong, full of typos, or simply pointless. People expect good results regardless of the input data.
Taking advantage of pg_trgm
To do fuzzy searching with PostgreSQL, you can add the pg_trgm extension. To activate the extension, just run the following instruction:
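A short sketch of activating the extension and looking at what it provides. The sample strings are arbitrary:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Show the trigrams a string is broken into.
SELECT show_trgm('abcdef');

-- Similarity is a number between 0 (nothing in common) and 1 (identical).
SELECT similarity('abcde', 'abdeacb');

-- The distance operator is defined as 1 minus the similarity.
SELECT 'abcde'::text <-> 'abdeacb'::text AS distance;
```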

Speeding up LIKE queries
LIKE queries definitely cause some of the worst performance problems faced by people around the globe these days. In most database systems, LIKE is pretty slow and requires a sequential scan. In addition to that, end users quickly figure out that a fuzzy search will, in many cases, return better results than precise queries. A single type of LIKE query on a large table can, therefore, often cripple the performance of an entire database server if it is called often enough.

Fortunately, PostgreSQL offers a solution to the problem and the solution happens to be installed already:
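A sketch of how a trigram index can serve a LIKE query with wildcards on both sides. The table t_like_demo and its generated content are hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical table filled with pseudo-random strings.
CREATE TABLE t_like_demo (name text);
INSERT INTO t_like_demo
    SELECT md5(i::text) FROM generate_series(1, 100000) AS i;

-- A trigram index; gin_trgm_ops with USING gin is an alternative.
CREATE INDEX idx_like_demo_trgm ON t_like_demo
    USING gist (name gist_trgm_ops);
ANALYZE t_like_demo;

-- A pattern without a fixed prefix can still make use of the index.
EXPLAIN SELECT * FROM t_like_demo WHERE name LIKE '%abc12%';
```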

Handling regular expressions
However, this is still not everything. Trigram indexes are even capable of speeding up simple regular expressions. The following example shows how this can be done:

PostgreSQL will inspect the regular expression and use the index to answer the question.

Internally, PostgreSQL can transform the regular expression into a graph and traverse the index accordingly.
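The same idea can be sketched for a regular expression. The table t_regex_demo, its content, and the pattern are assumptions made for illustration:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical table filled with pseudo-random strings.
CREATE TABLE t_regex_demo (name text);
INSERT INTO t_regex_demo
    SELECT md5(i::text) FROM generate_series(1, 100000) AS i;

CREATE INDEX idx_regex_demo_trgm ON t_regex_demo
    USING gin (name gin_trgm_ops);
ANALYZE t_regex_demo;

-- The trigrams that must be present (here the ones taken from abc)
-- are extracted from the expression and looked up in the index.
EXPLAIN SELECT * FROM t_regex_demo WHERE name ~ '[a-c].*abc.*';
```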


Understanding full-text search - FTS
If you are looking up names or for simple strings, you are usually querying the entire content of a field. In Full Text Search (FTS), this is different. The purpose of the full text search is to look for words or groups of words, which can be found in a text. Therefore, FTS is more of a contains operation as you are basically never looking for an exact string.

In PostgreSQL, FTS can be done using GIN indexes. The idea is to dissect a text, extract valuable lexemes (= "preprocessed tokens of words"), and index those elements rather than the underlying text. To make your search even more successful, those words are preprocessed.

The example shows a simple sentence. The to_tsvector function will take the string, apply English rules, and perform a stemming process. Based on the configuration (english), PostgreSQL will parse the string, throw away stop words, and stem individual words. For example, car and cars will both be transformed to car. Note that this is not about finding the word stem. In the case of many, PostgreSQL will simply transform the string to mani by applying standard rules working nicely with the English language.
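The stemming process can be observed with a call like the following. The sentence is an arbitrary example:

```sql
-- Stop words such as a and I are dropped; the remaining words are
-- reduced to lexemes (car and cars end up as car, many as mani),
-- each with the positions at which it occurred.
SELECT to_tsvector('english',
    'A car, I want a car. I would not even mind having many cars');
```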

Comparing strings
After taking a brief look at the stemming process, it is time to figure out how a stemmed text can be compared to a user query. The following code snippet checks for the word wanted:
In this case, false is returned because bmw cannot be found in our input string. In the to_tsquery function, & means and and | means or. It is therefore easily possible to build complex search strings.
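The comparison is done with the @@ operator. A sketch on an arbitrary sample sentence:

```sql
-- wanted is stemmed to want, which is part of the text: true
SELECT to_tsvector('english',
           'A car, I want a car. I would not even mind having many cars')
       @@ to_tsquery('english', 'wanted');

-- Both words are required, but bmw is missing: false
SELECT to_tsvector('english',
           'A car, I want a car. I would not even mind having many cars')
       @@ to_tsquery('english', 'wanted & bmw');

-- One of the two words is sufficient: true
SELECT to_tsvector('english',
           'A car, I want a car. I would not even mind having many cars')
       @@ to_tsquery('english', 'wanted | bmw');
```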
Defining GIN indexes
If you want to apply text search to a column or a group of columns, there are basically two choices:
Create a functional index using GIN
Add a column containing ready-to-use tsvectors and a trigger to keep them in sync
In this section, both options will be outlined. To show how things work, I have created some sample data:
Deploying an index on the function is easy, but it can lead to some overhead. Adding a materialized column needs more space, but will lead to a better runtime behavior:
Fortunately, PostgreSQL already provides a C function that can be used by a trigger to sync the tsvector column. Just pass a name, the desired language, as well as a couple of columns to the function, and you are already done. The trigger function will take care of all that is needed. Note that a trigger will always operate within the same transaction as the statement making the modification. Therefore, there is no risk of being inconsistent.
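Both options can be sketched as follows. The table t_fts, the column ts, and the index and trigger names are hypothetical; the sample data is simply taken from the descriptions in pg_available_extensions:

```sql
-- Hypothetical sample data.
CREATE TABLE t_fts (id serial, comment text);
INSERT INTO t_fts (comment)
    SELECT comment FROM pg_available_extensions;

-- Option 1: a functional GIN index on the output of to_tsvector.
CREATE INDEX idx_fts_func ON t_fts
    USING gin (to_tsvector('english', comment));

-- Option 2: a materialized tsvector column kept in sync by a trigger.
ALTER TABLE t_fts ADD COLUMN ts tsvector;

CREATE TRIGGER tsvectorupdate
    BEFORE INSERT OR UPDATE ON t_fts
    FOR EACH ROW
    EXECUTE PROCEDURE
        tsvector_update_trigger('ts', 'pg_catalog.english', 'comment');

-- Fire the trigger once for the rows that already exist.
UPDATE t_fts SET comment = comment;

CREATE INDEX idx_fts_col ON t_fts USING gin (ts);

-- Search the materialized column.
SELECT id, comment
FROM t_fts
WHERE ts @@ to_tsquery('english', 'index');
```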

Debugging your search
Sometimes, it is not quite clear why a query matches a given search string. To debug your query, PostgreSQL offers the ts_debug function. From a user's point of view, it can be used just like
ts_debug will list every token found and display information about the token. You will see which token the parser found, the dictionary used, as well as the type of object. In my example, blanks, words, and hosts have been found. You might also see numbers, email addresses, and a lot more. Depending on the type of string, PostgreSQL will handle things differently. For example, it makes absolutely no sense to stem hostnames and email addresses.
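A sketch of a ts_debug call. The input string, including the host name www.example.com, is made up for illustration:

```sql
-- One row per token: the token type (alias), a description,
-- the token itself, and the lexemes produced by the dictionary.
SELECT alias, description, token, lexemes
FROM ts_debug('english', 'go to www.example.com');
```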

Gathering word statistics
Full-text search can handle a lot of data. To give end users more insights into their texts, PostgreSQL offers the ts_stat function, which returns a list of words:
The word column contains the stemmed word, and ndoc tells us the number of documents in which a certain word occurs. nentry indicates how often a word was found altogether.
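A sketch of gathering word statistics. ts_stat takes a query returning a single tsvector column, passed as a string; the descriptions in pg_available_extensions are used as sample documents here:

```sql
-- The five words occurring in the largest number of documents.
SELECT word, ndoc, nentry
FROM ts_stat('SELECT to_tsvector(''english'', comment)
              FROM pg_available_extensions')
ORDER BY ndoc DESC, word
LIMIT 5;
```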

Taking advantage of exclusion operators
So far, indexes have been used to speed things up and to ensure uniqueness. However, a couple of years ago, somebody came up with the idea of using indexes for even more. As you have seen in this chapter, GiST supports operations such as intersects, overlaps, contains, and a lot more. So, why not use those operations to manage data integrity?

Here is an example:

The use of exclusion operators is very useful and can provide you with highly advanced means to handle integrity.
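A minimal sketch of an exclusion constraint that prevents double bookings. The table t_reservation and the sample reservations are assumptions made for illustration; the btree_gist extension is needed so that the plain integer column can take part in a GiST index:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- No two rows may have the same room AND overlapping time ranges.
CREATE TABLE t_reservation (
    room    int,
    from_to tsrange,
    EXCLUDE USING gist (room WITH =, from_to WITH &&)
);

-- First reservation for room 10.
INSERT INTO t_reservation VALUES (10, '[2017-01-01, 2017-03-03]');

-- Same period, different room: accepted.
INSERT INTO t_reservation VALUES (13, '[2017-01-01, 2017-03-03]');

-- Room 13 again with an overlapping period: this statement is expected
-- to fail with an exclusion constraint violation.
INSERT INTO t_reservation VALUES (13, '[2017-02-02, 2017-08-14]');
```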


Summary
This chapter was all about indexes. You learned when PostgreSQL will decide on an index and which types of indexes exist. On top of just using indexes, it is also possible to implement your own strategies to speed up your applications with custom operators and indexing strategies.

For those of you who really want to take things to the limit, PostgreSQL offers custom access methods.

Chapter 4, Handling Advanced SQL, is all about advanced SQL. Many people are not aware of what SQL is really capable of, and therefore, I am going to show people some efficient, more advanced SQL stuff.

In Chapter 3, Making Use of Indexes, you learned about indexing as well as about PostgreSQL's ability to run custom indexing code to speed up queries. In this chapter, you will learn about advanced SQL. Most readers of this book will have some experience of using SQL. However, experience has shown that the advanced features outlined in this book are not widely known and therefore it makes sense to cover them in this context to help people to achieve their goals faster and more efficiently. There has been a long discussion on whether the database is just a simple data store or whether business logic should be in the database or not. Maybe this chapter will shed some light and show how capable a modern relational database really is.

This chapter is about modern SQL and its features. A variety of different and sophisticated SQL features are included and presented in detail. The topics covered are:

Grouping sets
Ordered sets
Hypothetical aggregates
Windowing functions and analytics

At the end of the chapter, you will be able to understand and use advanced SQL.

Introducing grouping sets
Every advanced user of SQL should be familiar with GROUP BY and HAVING clauses. But are you also aware of CUBE, ROLLUP, and GROUPING SETS? If not, you might find this chapter worth reading.
 
Loading some sample data
To make this chapter a pleasant experience for you, I have compiled some sample data, which has been taken from the BP energy report: http://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy/downloads.html.

Here is the data structure that will be used:

As in the previous chapter, you can download the file before importing it. On some operating systems, curl is not there by default or has not been installed, so downloading the file before might be an easier option for many people.

There is data for the years between 1965 and 2010, for 14 nations in two regions of the world:
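A sketch of a table that fits this description. The table name t_oil and the exact column set (in particular production and consumption) are assumptions inferred from the kind of queries used in this chapter, and the file path is a placeholder:

```sql
-- Hypothetical structure for the energy data.
CREATE TABLE t_oil (
    region      text,
    country     text,
    year        int,
    production  int,
    consumption int
);

-- After downloading the data file, it could be imported with COPY
-- (placeholder path, run as a user allowed to read server-side files):
-- COPY t_oil FROM '/path/to/downloaded_file.txt';

-- Overview: number of nations and the range of years per region.
SELECT region,
       count(DISTINCT country) AS nations,
       min(year) AS first_year,
       max(year) AS last_year
FROM t_oil
GROUP BY region;
```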

Applying grouping sets
The GROUP BY clause will turn many rows into one row per group. However, if you do reporting in real life, you might also be interested in the overall average. One additional line might be needed.

ROLLUP will inject an additional line, which will contain the overall average. If you do reporting, it is highly likely that a summary line will be needed. Instead of running two queries, PostgreSQL can provide the data by running just a single query. There is also a second thing you might notice here: different versions of PostgreSQL might return data in a different order. The reason for that is that in PostgreSQL 10.0 the way those grouping sets are implemented has improved significantly. Back in 9.6 and before, PostgreSQL had to do a lot of sorting. Starting with 10.0, it is already possible to use hashing for those operations, which will speed things up dramatically in many cases:
In this example, PostgreSQL will inject three lines into the result set. One line will be injected for the Middle East and one for North America. On top of that, we will get a line for the overall averages. If you are building a web application, the current result is ideal because you can easily build a GUI to drill into the result set by filtering out the null values.
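A self-contained sketch of such a ROLLUP query. To keep it runnable anywhere, the data comes from an inline list of rows named t_oil; the countries and production figures are invented placeholders, not the BP data, and the production column is an assumption:

```sql
WITH t_oil (region, country, year, production) AS (
    VALUES ('Middle East',   'Oman',   1989,  600),
           ('Middle East',   'Oman',   1995,  900),
           ('Middle East',   'Iran',   1989, 2800),
           ('North America', 'USA',    1989, 9000),
           ('North America', 'Canada', 1995, 2400)
)
SELECT region, country, avg(production)
FROM t_oil
GROUP BY ROLLUP (region, country)
ORDER BY region, country;
```

Besides one row per region and country, the result contains one summary row per region (country is NULL) and one row for the overall average (both columns are NULL). With ROLLUP (region) alone, only the overall line is added to the per-region rows.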
ROLLUP is suitable when you instantly want to display a result. I have always used it to display final results to end users. However, if you are doing reporting, you might want to pre-calculate more data to ensure more flexibility. The CUBE keyword is what you might have been looking for:
Note that even more rows have been added to the result. CUBE will create the same data as GROUP BY region, country + GROUP BY region + GROUP BY country + the overall average. So, the whole idea is to extract many results and various levels of aggregation at once. The resulting cube contains all possible combinations of groups.
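A sketch of the CUBE variant, again on an inline list of invented placeholder rows named t_oil (the production column is an assumption):

```sql
WITH t_oil (region, country, year, production) AS (
    VALUES ('Middle East',   'Oman',   1989,  600),
           ('Middle East',   'Oman',   1995,  900),
           ('Middle East',   'Iran',   1989, 2800),
           ('North America', 'USA',    1989, 9000),
           ('North America', 'Canada', 1995, 2400)
)
SELECT region, country, avg(production)
FROM t_oil
GROUP BY CUBE (region, country)
ORDER BY region, country;

-- CUBE (region, country) is shorthand for
-- GROUPING SETS ((region, country), (region), (country), ())
```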

Investigating performance
Grouping sets are a powerful feature; they help to reduce the number of expensive queries. Internally, PostgreSQL will basically turn to traditional GroupAggregates to make things work. A GroupAggregate node requires sorted data, so be prepared that PostgreSQL might do a lot of temporary sorting:

Before PostgreSQL 10.0, hash aggregates were only supported for normal GROUP BY clauses involving no grouping sets. In PostgreSQL 10.0, the planner already has more options than in PostgreSQL 9.6. Expect grouping sets to be faster in the new version.

Combining grouping sets with the FILTER clause
In real-world applications, grouping sets can often be combined with FILTER clauses. The idea behind the FILTER clause is to be able to run partial aggregates.

Here is an example:

The idea here is that not all columns will use the same data for aggregation. The FILTER clauses allow you to selectively pass data to those aggregates. In my example, the second aggregate will only consider data before 1990, while the third aggregate will take care of more recent data.

If it is possible to move conditions to a WHERE clause, it is always more desirable, as less data has to be fetched from the table. FILTER is only useful if the data left by the WHERE clause is not needed by each aggregate.
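A sketch of three aggregates working on different subsets of the same rows. The inline rows named t_oil are invented placeholders, and the production column is an assumption:

```sql
WITH t_oil (region, country, year, production) AS (
    VALUES ('Middle East',   'Oman',   1989,  600),
           ('Middle East',   'Oman',   1995,  900),
           ('Middle East',   'Iran',   1989, 2800),
           ('North America', 'USA',    1989, 9000),
           ('North America', 'Canada', 1995, 2400)
)
SELECT region,
       avg(production)                            AS all_years,
       avg(production) FILTER (WHERE year < 1990)  AS before_1990,
       avg(production) FILTER (WHERE year >= 1990) AS since_1990
FROM t_oil
GROUP BY ROLLUP (region)
ORDER BY region;
```

The first aggregate sees every row of its group, while the other two only receive the rows matching their FILTER condition.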

Making use of ordered sets
Ordered sets are a powerful feature, but they are not widely known in the developer community. The idea is actually quite simple: data is grouped normally, and then the data inside each group is ordered given a certain condition. The calculation is then performed on this sorted data.

A classic example would be the calculation of the median.
The median is the middle value. If you are, for example, earning the median income, the number of people earning less and more than you is identical. 50% of people do better and 50% of people do worse.

One way to get the median is to take sorted data and move 50% into the dataset. This is an example of what the WITHIN GROUP clause will ask PostgreSQL to do:
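As a sketch, assuming a hypothetical table t_oil with the columns region and production (assumed names), the median per group can be requested like this:

```sql
-- t_oil, region, and production are assumed names
SELECT region,
       percentile_disc(0.5) WITHIN GROUP (ORDER BY production) AS median
FROM   t_oil
GROUP  BY region;
```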

The percentile_disc function will skip 50% of the group and return the desired value. Note that the median can significantly deviate from the average. In economics, the deviation between median and average income can even be used as an indicator of social equality or inequality. The lower the median compared to the average, the greater the income inequality. To provide more flexibility, the ANSI standard does not just propose a median function. Instead, percentile_disc allows you to use any value between 0 and 1.


In this case, PostgreSQL will again inject additional lines into the result set.

As proposed by the ANSI SQL standard, PostgreSQL provides you with two percentile functions. The percentile_disc function will return a value that is really contained in the dataset. The percentile_cont function will interpolate a value if no exact match is found. The following example shows how this works:
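A self-contained way to compare the two functions is to run them on the numbers 1 to 5 produced by generate_series. The fraction 0.62 is simply a value chosen for illustration:

```sql
-- percentile_disc picks an existing value, percentile_cont interpolates
SELECT percentile_disc(0.62) WITHIN GROUP (ORDER BY id) AS disc,
       percentile_cont(0.62) WITHIN GROUP (ORDER BY id) AS cont
FROM   generate_series(1, 5) AS id;
```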
4 is a value that really exists; 3.48 has been interpolated. The percentile functions are not the only ones provided by PostgreSQL. To find the most frequent value within a group, the mode function is available. Before showing an example of how to use the mode function, I have compiled a query telling us a bit more about the content of the table:

Understanding hypothetical aggregates
Hypothetical aggregates are pretty similar to standard ordered sets. However, they help to answer a different kind of question: what would be the result if a value was there? As you can see, this is not about values inside the database but about the result if a certain value was actually there.

PostgreSQL provides a small number of hypothetical aggregates (rank, dense_rank, percent_rank, and cume_dist). The one used here is rank:
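A minimal sketch, assuming a hypothetical table t_oil with the columns region and production (assumed names), asks where a production of 9,000 would be ranked within each region:

```sql
-- t_oil, region, and production are assumed names
SELECT region,
       rank(9000) WITHIN GROUP (ORDER BY production DESC NULLS LAST)
FROM   t_oil
GROUP  BY ROLLUP (region);
```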

It tells us: if somebody produced 9,000 barrels per day, it would be ranked the 27th best year in North America and the 21st in the Middle East.

In my example, I used NULLS LAST. When data is sorted, nulls are usually at the end. However, if the sort order is reversed, nulls should still be at the end of the list. NULLS LAST ensures exactly that.
In the previous chapter, you learned a lot about advanced SQL and ways to see SQL in a different light. However, database work does not only consist of hacking up fancy SQL. Sometimes, it is also about keeping things running in a professional manner. To do that, it is highly important to keep an eye on system statistics, log files, and so on. Monitoring is the key to running databases professionally.
In this chapter, you will learn these topics:
Gathering runtime statistics
Creating log files
Gathering important information
Making sense of database statistics
At the end of this chapter, you will be able to configure PostgreSQL's logging infrastructure properly and take care of log files in the most professional way possible.
Gathering runtime statistics
The first thing you really have to learn is to use and understand what PostgreSQL's onboard statistics have to offer. In my personal opinion, there is no way to improve performance and reliability without first collecting the data needed to make prudent decisions.
This section will guide you through PostgreSQL's runtime statistics and explain in detail how you can extract more data from your database setups.
Working with PostgreSQL system views
PostgreSQL offers a large set of system views that allow administrators and developers alike to take a deep look into what is really going on in their system. The trouble is that many people actually collect all this data but cannot make real sense out of it. The general rule is this: there is no point in drawing a graph for something you don't understand anyway. The goal in this section, therefore, is to shed some light on what PostgreSQL has to offer to hopefully make it easier for people to take full advantage of what is there to serve them.
Checking live traffic
Whenever I inspect a system, there is a system view I like to inspect first before digging deeper. I am, of course, talking about pg_stat_activity. The idea behind the view is to give you a chance to figure out what is going on right now.
Here is how it works:
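A simple way to look at the view is to select the columns discussed in this section (column names as of PostgreSQL 10). In psql, expanded output makes such wide rows easier to read:

```sql
SELECT datid, datname, pid, usesysid, usename,
       application_name,
       client_addr, client_hostname, client_port,
       backend_start, xact_start, query_start, state_change,
       state, query
FROM   pg_stat_activity;
```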
pg_stat_activity will provide you with one line per active connection. You will see the internal object ID of the database (datid), the name of the database somebody is connected to (datname), as well as the process ID serving this connection (pid). On top of that, PostgreSQL will tell you who is connected (usename; note the missing r) and that user's internal object ID (usesysid).
Then there is a field called application_name, which is worth commenting on a bit more extensively. In general, application_name can be set freely by the end user:
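As a short sketch, the parameter can be set inside a session like any other runtime setting. The name used here is a made-up placeholder:

```sql
-- 'www.example.com' is a hypothetical value chosen for illustration
SET application_name TO 'www.example.com';
SHOW application_name;
```

Many client libraries also accept application_name as part of the connection string, so the name can be set right when the connection is opened.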
The point is this: assume thousands of connections are coming from a single IP. Can you, as the administrator, tell what a specific connection is really doing right now? You might not know all the SQL by heart. If the client is kind enough to set an application_name parameter, it is a lot easier to see what the purpose of a connection really is. In my example, I have set the name to the domain the connection belongs to. This makes it easy to find similar connections, which might cause similar problems.
The next three columns (client_addr, client_hostname, and client_port) will tell you where a connection comes from. PostgreSQL will show IP addresses and (if it has been configured) even hostnames.
backend_start will tell you when a certain connection has started. xact_start indicates when a transaction has started. Then there are query_start and state_change. Back in the dark old days, PostgreSQL would only show active queries. During a time when queries took a lot longer than today, this made sense of course. On modern hardware, OLTP queries might only consume a fraction of a millisecond, and therefore it is hard to catch such queries doing potential harm. The solution was to either show the active query or the previous query executed by the connection you are looking at.
Here is what you might see:
The query is now marked as idle. The difference between state_change and query_start is the time the query needed to execute.
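The calculation described above can be sketched as a query. It lists idle connections together with the runtime of the statement they executed last:

```sql
SELECT pid, state,
       state_change - query_start AS runtime,
       query
FROM   pg_stat_activity
WHERE  state = 'idle'
ORDER  BY runtime DESC;
```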
pg_stat_activity will therefore give you a great overview of what is going on in your system right now. The state_change field makes it a lot more likely to spot expensive queries.
The question now is this: once you have found bad queries, how can you actually get rid of them? PostgreSQL provides two functions to take care of these things: pg_cancel_backend and pg_terminate_backend. The pg_cancel_backend function will terminate the query but will leave the connection in place.
The pg_terminate_backend function is a bit more radical and will kill the entire database connection along with the query.
If you want to disconnect all other users but yourself, here is how you can do that:
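The following sketch shows both functions. The process ID 12345 is a placeholder, and the backend_type column used in the second statement is available as of PostgreSQL 10:

```sql
-- cancel the running query of one connection (12345 is a placeholder pid)
SELECT pg_cancel_backend(12345);

-- terminate every client connection except your own
SELECT pg_terminate_backend(pid)
FROM   pg_stat_activity
WHERE  pid <> pg_backend_pid()
  AND  backend_type = 'client backend';
```

Filtering on backend_type is a design choice that keeps system processes out of the picture.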
Inspecting databases
Once you have inspected active database connections, you can dig deeper and inspect database-level statistics. pg_stat_database will return one line per database inside your PostgreSQL instance.
This is what you can find there:
Next to the database ID and the database name, there is a column called numbackends that shows the number of database connections that are currently open.
Then there are xact_commit and xact_rollback. These two columns indicate whether your application tends to commit or roll back. blks_hit and blks_read will tell you about cache hits and cache misses. When inspecting these two columns, keep in mind that we are mostly talking about shared buffer hits and shared buffer misses. There is no reasonable way on the database level to distinguish filesystem cache hits and real disk hits. At Cybertec (https://www.cybertec-postgresql.com), we like to correlate disk wait with cache misses in pg_stat_database to get an idea of what really goes on in the system.
The tup_ columns will tell you whether there is a lot of reading or a lot of writing going on in your system.
Then we have temp_files and temp_bytes. These two columns are of incredible importance because they will tell you whether your database has to write temporary files to disk, which will inevitably slow down operations. What can be the reasons for high temporary file usage? The major reasons are as follows:
Poor settings: If your work_mem settings are too low, there is no way to do anything in RAM, and therefore PostgreSQL will go to disk.
Stupid operations: It happens quite frequently that people torture their system with fairly expensive, pointless queries. If you see many temporary files on an OLTP system, consider checking for expensive queries.
Indexing and other administrative tasks: Once in a while, indexes might be created or people might run DDLs. These operations can lead to temporary file I/O but are not necessarily considered a problem (in many cases).
In short, temporary files can happen even if your system is perfectly fine. However, it definitely makes sense to keep an eye on them and ensure that temp files are not needed frequently.
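To put the columns discussed so far side by side, a query along the following lines can be used. The hit percentage is a derived value added for illustration and only covers shared buffers:

```sql
SELECT datname, numbackends, xact_commit, xact_rollback,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_size
FROM   pg_stat_database
ORDER  BY temp_bytes DESC;
```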
Finally, there are two more important fields: blk_read_time and blk_write_time. By default, these two fields are empty and no data is collected. The idea behind these fields is to give you a way to see how much time was spent on I/O. The reason these fields are empty is that track_io_timing is off by default. This is for good reasons. Imagine you want to check how long it takes to read 1 million blocks. To do that, you have to call the time function in your C library twice per block, which leads to 2 million additional function calls just to read 8 GB of data. It really depends on the speed of your system whether this will lead to a lot of overhead or not.
Fortunately, there is a tool, pg_test_timing, that helps you to determine how expensive the timing is.
In my case, the overhead of turning track_io_timing on for a session or in the postgresql.conf file is around 23 nanoseconds, which is fine. Professional high-end servers can provide you with numbers as low as 14 nanoseconds, while really bad virtualization can return values up to 1,400 nanoseconds or even 1,900 nanoseconds. If you are using some cloud service, you can expect around 100 to 120 nanoseconds (in most cases). In case you are confronted with four-digit values, measuring the I/O timing might lead to real measurable overhead, which will slow down your system. The general rule is this: on real hardware, timing is not an issue; on virtual systems, check it out before you turn it on.
It is also possible to turn things on selectively by using ALTER DATABASE, ALTER USER, or the like.
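A minimal sketch of such selective settings is shown here. The database name test and the role name reporting are hypothetical, and changing track_io_timing requires superuser privileges:

```sql
-- test and reporting are assumed names
ALTER DATABASE test SET track_io_timing = on;
ALTER USER reporting SET track_io_timing = on;

-- or only for the current session
SET track_io_timing = on;
```

The per-database and per-user settings take effect for new connections only.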
Inspecting tables
Once you have gained an overview of what is going on in your databases, it might be a good idea to dig deeper and see what is going on in individual tables. Two system views are here to help you: pg_stat_user_tables and pg_statio_user_tables. Here is the first one:
By my judgment, pg_stat_user_tables is one of the most important but also one of the most misunderstood or even ignored system views. I have a feeling that many people read it but fail to extract the full potential of what can really be seen here. When used properly, pg_stat_user_tables can, in some cases, be nothing short of a revelation.
Before we dig into the interpretation of data, it is important to understand which fields are actually there. First of all, there is one entry for each table, which will show us the number of sequential scans that happened on the table (seq_scan). Then we have seq_tup_read, which tells us how many tuples the system had to read during those sequential scans.
Remember the seq_tup_read column; it contains vital information, which can help to find performance problems.
idx_scan is next on the list. It will show us how often an index was used for this table. PostgreSQL will also show how many rows those scans returned. Then there are a couple of columns starting with n_tup_. Those will tell us how much we inserted, updated, and deleted. The most important thing here is related to HOT UPDATE. When running an UPDATE, PostgreSQL has to copy a row to ensure that ROLLBACK will work correctly. HOT UPDATE is pretty good because it allows PostgreSQL to ensure that a row does not have to leave a block. The copy of the row stays inside the same block, which is beneficial for performance in general. A fair amount of HOT UPDATE indicates that you are on the right track in the case of an UPDATE-intense workload. The perfect ratio between normal and HOT UPDATE cannot be stated here for all use cases. People really have to think for themselves to figure out which workload benefits from many in-place operations. The general rule is this: the more UPDATE-intense your workload is, the better it is to have many HOT UPDATE operations.
Finally, there are some VACUUM statistics, which mostly speak for themselves.
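To get a feeling for the share of HOT UPDATE operations per table, the two counters can be compared directly. The percentage column is a derived value added for illustration:

```sql
SELECT schemaname, relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 2) AS hot_pct
FROM   pg_stat_user_tables
ORDER  BY n_tup_upd DESC;
```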
Making sense of pg_stat_user_tables
Reading all this data might be interesting; however, unless you are able to make sense out of it, it is pretty pointless. One way to use pg_stat_user_tables is to detect which tables might need an index. One way to get a clue of the right direction is to use the following query, which has served me well over the years:
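A sketch of such a query is shown here. It sorts tables by the number of tuples read in sequential scans and adds the average number of tuples per scan. The LIMIT of 25 is an arbitrary choice:

```sql
SELECT schemaname, relname, seq_scan, seq_tup_read,
       seq_tup_read / seq_scan AS avg_tup_per_scan,
       idx_scan
FROM   pg_stat_user_tables
WHERE  seq_scan > 0
ORDER  BY seq_tup_read DESC
LIMIT  25;
```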
The idea is to find large tables that have been used frequently in a sequential scan. Those tables will naturally come out on top of the list to bless us with enormously high seq_tup_read values, which can be mind-blowing.
Work your way from top to bottom and look for expensive scans. Keep in mind that sequential scans are not necessarily bad. They appear naturally in backups, analytical statements, and so on without causing any harm. However, if you are running large sequential scans all the time, your performance will go down the drain.
Note that this query is really golden: it will help you to spot tables with missing indexes. Practical experience of close to two decades has shown again and again that missing indexes are the single most important reason for bad performance. Therefore, the query you are looking at is literally gold.
Once you are done looking for potentially missing indexes, consider taking a brief look at the caching behavior of your tables. pg_statio_user_tables will contain information about all kinds of things such as the caching behavior of the table (heap_blks_), of your indexes (idx_blks_), as well as of The Oversized-Attribute Storage Technique (TOAST) tables (toast_blks_). Finally, the tidx_blks_ columns cover the indexes on TOAST tables, which are usually not relevant to the overall performance of the system:
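The columns mentioned above can be selected directly. Each pair consists of a _read counter (blocks not found in shared buffers) and a _hit counter (blocks found in shared buffers):

```sql
SELECT schemaname, relname,
       heap_blks_read,  heap_blks_hit,
       idx_blks_read,   idx_blks_hit,
       toast_blks_read, toast_blks_hit,
       tidx_blks_read,  tidx_blks_hit
FROM   pg_statio_user_tables
ORDER  BY heap_blks_read DESC;
```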
Although pg_statio_user_tables contains important information, it is usually the case that pg_stat_user_tables is more likely to provide you with a really relevant insight (such as a missing index or so).
Digging into indexes
While pg_stat_user_tables is important for spotting missing indexes, it is sometimes necessary to find indexes that should really not exist. Recently, I was on a business trip to Germany and discovered a system that contained mostly pointless indexes (74% of the total storage consumption). While this might not be a problem if your database is really small, it does make a difference in the case of large systems: having hundreds of gigabytes of pointless indexes can seriously harm your overall performance.
pg_stat_user_indexes can be inspected to find those pointless indexes:
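One possible way to write such a query is sketched below. It lists the least-used indexes first, shows the size of each index, and keeps a running total of the space consumed in the sixth column:

```sql
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS idx_size,
       pg_size_pretty(sum(pg_relation_size(indexrelid))
                      OVER (ORDER BY idx_scan, indexrelid)) AS total
FROM   pg_stat_user_indexes
ORDER  BY idx_scan, indexrelid;
```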
The output of this statement is highly useful. It doesn't only contain information about how often an index was used; it also tells us how much space each index consumes. Finally, it adds up all the space consumption in column 6. You can now go through the table and rethink all those indexes that have rarely been used. It is hard to come up with a general rule for when to drop an index, so some manual checking makes a lot of sense.
Do not just blindly drop indexes. In some cases, indexes are simply not used because end users use the application differently than expected. In case end users change (a new secretary is hired and so on), an index might very well turn into a useful object again.
There is also a view called pg_statio_user_indexes, which contains caching information about an index. Although it is interesting, it usually does not contain information leading to big leaps forward.
Tracking the background writer
In this section, it is time to take a look at the background writer statistics. As you might know, database connections will in many cases not write blocks to disk directly. Instead, data is written by the background writer process or by the checkpointer.
To see how data is written, the pg_stat_bgwriter view can be inspected:
The first thing that should catch your attention here is the first two columns. You will learn later in this book that PostgreSQL will perform regular checkpoints, which are necessary to ensure that data has really made it to disk. If your checkpoints are too close to each other, checkpoints_req might point you in the right direction. If requested checkpoints are high, it can mean that a lot of data is written and that checkpoints are always triggered because of high throughput. In addition to that, PostgreSQL will tell you about the time needed to write data during a checkpoint and the time needed to sync. On top of that, buffers_checkpoint indicates how many buffers were written during checkpoints, and buffers_clean shows how many were written by the background writer.
But there is more: maxwritten_clean tells us about the number of times the background writer stopped a cleaning scan because it had written too many buffers.
Finally, there are buffers_backend (number of buffers directly written by a backend database connection), buffers_backend_fsync (number of times a database connection had to issue its own fsync call), and buffers_alloc, which contains the number of buffers allocated. In general, it is not a good thing if database connections start to write their own stuff themselves.
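The fields described above can be read with a query like the following sketch (column names as of PostgreSQL 10). The percentage of requested checkpoints is a derived value added for illustration:

```sql
SELECT checkpoints_timed, checkpoints_req,
       round(100.0 * checkpoints_req
             / nullif(checkpoints_timed + checkpoints_req, 0), 2) AS req_pct,
       checkpoint_write_time, checkpoint_sync_time,
       buffers_checkpoint, buffers_clean, maxwritten_clean,
       buffers_backend, buffers_backend_fsync, buffers_alloc
FROM   pg_stat_bgwriter;
```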

Tracking, archiving, and streaming
In this section, we will take a look at some features related to replication and transaction log archiving. The first thing to inspect is pg_stat_archiver, which tells us about the archiver process moving the transaction log (WAL) from the main server to some backup device:
pg_stat_archiver contains important information about your archiving process. First of all, it will inform you about the number of transaction log files that have been archived (archived_count). It will also know the last file that was archived and when that happened (last_archived_wal and last_archived_time).
While knowing the number of WAL files is certainly interesting, it is not really that important. Therefore, consider taking a look at failed_count, last_failed_wal, and last_failed_time. If your transaction log archiving failed, these fields will tell you the latest file that failed and when that happened. It is recommended to keep an eye on those fields because otherwise archiving might stop working without you even noticing.
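A monitoring check can be as simple as the following sketch, which reads the fields discussed above:

```sql
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count,   last_failed_wal,   last_failed_time
FROM   pg_stat_archiver;
```

A failed_count that keeps growing, or a last_failed_time newer than last_archived_time, is a hint that archiving needs attention.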
If you are running streaming replication, the following two views will be really important for you. The first one is called pg_stat_replication and will provide information about the streaming process from the master to the slave. One entry per WAL sender process will be visible. If there is not a single entry, there is no transaction log streaming going on, which might not be what you want.
Let us take a look at pg_stat_replication:
You will find columns to indicate the username connected via streaming replication. Then there is the application name along with connection data (client_addr, client_hostname, and client_port). Then, PostgreSQL will tell us when the streaming connection has started. In production, a young connection can point to a network problem or to something even worse (reliability issues and so on). The state column shows in which state the other side of the stream is. Note that there will be more information on this in Chapter 10, Making Sense of Backups and Replication.
There are fields telling us how much transaction log has been sent over the network connection (sent_lsn), how much has been sent to the kernel (write_lsn), how much has been flushed to disk (flush_lsn), and how much has already been replayed (replay_lsn). Before PostgreSQL 10.0, these columns were called sent_location, write_location, flush_location, and replay_location. Finally, the sync status is listed. Since PostgreSQL 10.0, there are also additional fields, which already contain the time difference between the master and the slave. The write_lag, flush_lag, and replay_lag fields contain intervals, which give some indication of the actual time difference between your servers.
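As a sketch for PostgreSQL 10, the positions and lag fields can be combined with pg_wal_lsn_diff to express the replay delay in bytes. The query is assumed to run on the sending server:

```sql
SELECT application_name, client_addr, state, sync_state,
       sent_lsn, write_lsn, flush_lsn, replay_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_gap_bytes,
       write_lag, flush_lag, replay_lag
FROM   pg_stat_replication;
```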
While pg_stat_replication can be queried on the sending server of a replication setup, pg_stat_wal_receiver can be consulted on the receiving end. It provides similar information and allows this information to be extracted on the replica (or alike).
First of all, PostgreSQL will tell us the process ID of the WAL receiver process. Then the view shows us the status of the connection in use. receive_start_lsn will tell us the transaction log position used when the WAL receiver was started. receive_start_tli contains the timeline in use when the WAL receiver was started. At some point, you might want to know the latest WAL position and timeline. To get those two numbers, use received_lsn and received_tli.
In the next two columns, there are two timestamps: last_msg_send_time and last_msg_receipt_time. The first one says when a message was last sent, and the second one says when it was received.
latest_end_lsn contains the last transaction log position reported to the WAL sender process at latest_end_time. Finally, there are the slot_name and an obfuscated version of the connection information.
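On the receiving side, the fields described above can be inspected as follows (column names as of PostgreSQL 10; later releases renamed some of them):

```sql
SELECT pid, status,
       receive_start_lsn, receive_start_tli,
       received_lsn, received_tli,
       last_msg_send_time, last_msg_receipt_time,
       latest_end_lsn, latest_end_time,
       slot_name, conninfo
FROM   pg_stat_wal_receiver;
```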
Checking SSL connections
Many people running PostgreSQL use SSL to encrypt connections from the server to the client. More recent versions of PostgreSQL provide a view to gain an overview of those encrypted connections, which is pg_stat_ssl:
Every process is represented by the process ID. If a connection uses SSL, the second column is set to true. The third and fourth columns will define the version as well as the cipher. Finally, there are the number of bits used by the encryption algorithm, an indicator of whether compression is used or not, as well as the distinguished name (DN) field from the client certificate.
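Since pg_stat_ssl only knows process IDs, it is often joined with pg_stat_activity to see who is behind a connection. The following sketch uses the column names of PostgreSQL 10:

```sql
SELECT a.pid, a.usename, a.client_addr,
       s.ssl, s.version, s.cipher, s.bits, s.compression, s.clientdn
FROM   pg_stat_ssl AS s
       JOIN pg_stat_activity AS a USING (pid);
```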
Inspecting transactions in real time
Up to now, a couple of statistics tables have already been discussed. The idea behind all of them is to see what is going on in the entire system. But what if you are a developer who wants to inspect an individual transaction? pg_stat_xact_user_tables is here to help. It does not contain system-wide data but only data about your current transaction:
Developers can therefore look into a transaction just before it commits to see whether it has caused any performance issues. It helps to distinguish overall data from what has just been caused by your application.
The ideal way for application developers to use this view is to add a function call in the application before commit to track what the transaction has done. This data can then be inspected so that the output of the current transaction can be distinguished from the overall workload.
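A minimal sketch of that pattern looks as follows. The statements of the transaction itself are left out, and the WHERE clause is just one possible way to hide tables the transaction has not touched:

```sql
BEGIN;
-- ... the statements of the transaction go here ...

-- inspect what this transaction has done so far
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch,
       n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd
FROM   pg_stat_xact_user_tables
WHERE  seq_scan > 0
   OR  idx_scan > 0
   OR  n_tup_ins + n_tup_upd + n_tup_del > 0;

COMMIT;
```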
Tracking vacuum progress
In PostgreSQL 9.6, the community introduced a system view many people have been waiting for. For many years, people wanted to track the progress of a vacuum process to see how long things might still take.
pg_stat_progress_vacuum has been invented to answer those questions:
Most of the columns speak for themselves, and therefore I won't go into too much detail. There are just a couple of things that should be kept in mind. First of all, the process is not linear; it can jump quite a bit. In addition to that, vacuum is usually pretty fast, so progress can be rapid and hard to track.
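While a vacuum is running, its progress can be watched with a query like this sketch. The percentage is a derived value and, as noted above, will not grow linearly:

```sql
SELECT pid, datname, relid::regclass AS table_name, phase,
       heap_blks_total, heap_blks_scanned, heap_blks_vacuumed,
       round(100.0 * heap_blks_scanned
             / nullif(heap_blks_total, 0), 2) AS scanned_pct
FROM   pg_stat_progress_vacuum;
```

The cast to regclass only resolves table names for tables in the database you are connected to.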
Using pg_stat_statements
After discussing the first couple of views, it is time to turn our attention to one of the most important views, which can be used to spot performance problems. I am, of course, speaking about pg_stat_statements. The idea is to have information about queries on your system. It helps to figure out which types of queries are slow and how often queries are called.
To use the module, three steps are necessary:
Add pg_stat_statements to shared_preload_libraries in the postgresql.conf file.
Restart the database server.
Run CREATE EXTENSION pg_stat_statements in the database(s) of your choice.
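The steps above can be sketched in SQL. Using ALTER SYSTEM instead of editing postgresql.conf by hand is a substitution made here for illustration; note that it replaces any libraries already listed in shared_preload_libraries:

```sql
-- Step 1: register the library (overwrites an existing list)
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- Step 2: restart the database server (done outside of SQL)

-- Step 3: create the extension in the database of your choice
CREATE EXTENSION pg_stat_statements;
```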

Let's inspect what the view contains.
pg_stat_statements provides one line per query. By default, it tracks 5,000 queries (this can be changed by setting pg_stat_statements.max).
Queries and parameters are separated. PostgreSQL will put placeholders into the query. This allows identical queries, which just use different parameters, to be aggregated. SELECT ... FROM x WHERE y = 10 will be turned into SELECT ... FROM x WHERE y = $1 (versions before PostgreSQL 10 use ? as the placeholder).
For each query, PostgreSQL will tell us the total time it has consumed along with the number of calls. In more recent versions, min_time, max_time, mean_time, and stddev_time have been added. The standard deviation is especially noteworthy because it will tell us whether a query has stable or fluctuating runtimes. Unstable runtimes can happen for various reasons:
If the data is not fully cached in RAM, queries that have to go to disk will take a lot longer than their cached counterparts
Different parameters can lead to different plans and totally different result sets
Concurrency and locking can have an impact
PostgreSQL will also tell us about the caching behavior of a query. The shared_blks_ columns show how many blocks came from the cache (_hit) or from the operating system (_read). If many blocks come from the operating system, the runtime of a query might fluctuate.
The next block of columns is all about local buffers. Local buffers are memory blocks allocated by the database connection directly.
On top of all this information, PostgreSQL provides information about temporary file I/O. Note that temporary file I/O will naturally happen in case a large index is built or in case some other large DDL is executed. However, temporary files are usually a very bad thing to have in OLTP as they will slow down the entire system by potentially blocking the disk. A high amount of temporary file I/O can point to a couple of undesirable things. The following list contains my top three:
Undesirable work_mem settings (OLTP)
Suboptimal maintenance_work_mem settings (DDLs)
Queries that should not be run in the first place
Finally, there are two fields containing information about I/O timing. By default, those two fields are empty. The reason for this is that measuring timing can cause quite a lot of overhead on some systems. Therefore, the default value for track_io_timing is false; remember to turn it on if you need this data.
Once the module has been enabled, PostgreSQL is already collecting data and you can use the view.
Never run SELECT * FROM pg_stat_statements in front of a customer. More than once, people have started pointing at queries they happened to know and started to explain why, who, what, when, and so on. When you use this view, always create a sorted output so that the most relevant information can be seen instantly.
Here at Cybertec, we have found the following query very helpful to gain an overview of what is happening on the database server:
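A sketch of such an overview query is shown below. It uses the column names of PostgreSQL 10 (total_time and mean_time; newer releases call them total_exec_time and mean_exec_time), and the substring length of 40 characters is an arbitrary choice:

```sql
SELECT round((100 * total_time
              / sum(total_time) OVER ())::numeric, 2) AS percent,
       round(total_time::numeric, 2) AS total,
       calls,
       round(mean_time::numeric, 2) AS mean,
       substring(query, 1, 40) AS query
FROM   pg_stat_statements
ORDER  BY total_time DESC
LIMIT  10;
```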
It shows the top 10 queries and their runtime including a percentage. It also makes sense to display the average execution time of the query so that you can decide whether the runtime of those queries is too high or not.
Work your way down the list and inspect all queries that seem to run too long on average.
Keep in mind that working through the top 1,000 queries is usually not worth it. In most cases, the first queries are already responsible for most of the load on the system.
In my example, I have used a substring to shorten the query to fit on a page. This makes no sense if you really want to see what is going on.
Remember that the length of the query text is limited by track_activity_query_size, which defaults to 1,024 bytes:
Consider increasing this value to, say, 16,384. If your clients are running Java applications based on Hibernate, a larger value of track_activity_query_size will ensure that queries are not cut off before the interesting part is shown.
At this point, I want to use the situation to point out how important pg_stat_statements really is. It is by far the easiest way to track down performance problems. A slow query log can never be as useful as pg_stat_statements, because a slow query log will only point to individual slow queries; it won't show us problems caused by tons of medium queries. Therefore, it is recommended to always turn this module on. The overhead is really small and in no way harms the overall performance of the system.
By default, 5,000 types of queries are tracked (as of PostgreSQL 9.6). In most reasonably sane applications, this will be enough.
To reset the data, consider using the following instruction:
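The extension ships with a function for exactly this purpose. Calling it discards all statistics gathered so far:

```sql
SELECT pg_stat_statements_reset();
```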
Creating log files
After taking a deep look at the system views provided by PostgreSQL, it is time to configure logging. Fortunately, PostgreSQL provides easy means to work with log files and helps people to set up a good configuration easily. Collecting logs is important because it can point to errors and potential database problems.
The postgresql.conf file has all the parameters you need to provide you with all the information.
Configuring the postgresql.conf file
In this section, we will go through some of the most important entries in the postgresql.conf file to configure logging and see how logging can be used in the most beneficial way.
Before we get started, I want to say a few words about logging in PostgreSQL in general. On Unix systems, PostgreSQL will send log information to stderr by default. However, stderr is not a good place for logs to go because you will surely want to inspect the log stream at some point. Therefore, it really makes sense to work through this chapter to adjust things to your needs.
Defining log destination and rotation
Let us go through the postgresql.conf file and see what can be done:
The first configuration option, log_destination, defines how the log is processed. By default, it will go to stderr (on Unix). On Windows, the default is eventlog, which is the Windows onboard tool to handle logging. Alternatively, you can choose to go with csvlog or syslog.
In case you want to make PostgreSQL write log files, you should go for stderr and turn logging_collector on. PostgreSQL will then create log files.
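The same two settings can also be written without editing the file by hand. Using ALTER SYSTEM is a substitution made here for illustration; the parameters themselves are the ones described above:

```sql
ALTER SYSTEM SET log_destination = 'stderr';
ALTER SYSTEM SET logging_collector = on;  -- takes effect after a server restart
```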
The logical question now is: what  will the names of those log files be and where will those files be stored? postgresql.conf has the answer:
log directory will tell the system where to store the log. If you are using  an absolute path,  you can explicitly configure where logs will go. If you prefer  the logs to be in the PostgreSQL data  directly, simply  go for a relative  path.  The advantage is that the data directory will be self contained and you can move it without having to worry.
In the next step, you can define the filename PostgreSQL is supposed to use. PostgreSQL is very flexible and allows you to use all the shortcuts provided by strftime. To give you an impression of how powerful this feature is, a quick count on my platform reveals that strftime provides 43 (!) placeholders to create the filename. Everything people usually need is certainly possible.
Once the filename has been defined, it makes sense to briefly think about cleanup. The following settings will be available:
By default, PostgreSQL will start a new log file once the current one is older than one day or larger than 10 MB. log_truncate_on_rotation specifies whether an existing log file is appended to or overwritten. Sometimes a log filename is defined in a way that makes it cyclic, so that a file with the same name already exists when rotation happens. In that case, the parameter decides whether to overwrite that file or to append to it. Given the default log filename, this will of course not happen.
One way to handle auto rotation is to use something like postgresql_%a.log along with log_truncate_on_rotation = on. %a means that the abbreviated day of the week will be used inside the log filename. The advantage here is that the day of the week tends to repeat itself every seven days. Therefore, the log will be kept for a week and then recycled. If you are aiming for weekly rotation, a 10 MB file size might not be enough. Consider turning the maximum file size off (log_rotation_size = 0).
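The cyclic behavior of a weekday-based filename can be illustrated outside of PostgreSQL. The following is a minimal Python sketch, assuming the pattern postgresql_%a.log and an arbitrary start date; it only shows how the strftime placeholder expands and is not how the server itself rotates files:

```python
from datetime import date, timedelta

# Assumed file name pattern; %a expands to the abbreviated weekday name.
LOG_FILENAME_PATTERN = "postgresql_%a.log"

def log_file_for(day: date) -> str:
    """Return the log file name that the pattern yields for the given day."""
    return day.strftime(LOG_FILENAME_PATTERN)

start = date(2018, 1, 1)  # arbitrary example date
names = [log_file_for(start + timedelta(days=i)) for i in range(14)]

# Only seven distinct names exist, so the eighth day reuses the first file.
print(sorted(set(names)))
print(names[0] == names[7])
```

Two weeks of dates produce only seven distinct names, which is why truncating on rotation keeps roughly one week of history.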
Configuring syslog
Some people prefer to use syslog to collect log files. PostgreSQL offers the following configuration parameters:
syslog is pretty popular among sysadmins. Fortunately, it is easy to configure. Basically, you set a facility and an identifier. If log_destination is set to syslog, there is nothing more to do.
Logging slow queries
The log can also be used to track down individual slow queries. Back in the old days, this was pretty much the only way to spot performance problems.
How does it work? postgresql.conf has a variable called log_min_duration_statement. If this is set to a value greater than zero (in milliseconds), every query exceeding our chosen setting will make it to the log:
Most people see the slow query log as the ultimate source of wisdom. However, I would like to add a word of caution. There are many slow queries that just happen to eat up a lot of CPU: index creation, data exports, analytics, and so on.
Those long-running queries are totally expected and are in many cases not the root cause of all evil. It happens frequently that many shorter queries are to blame. Here is an example:
1,000 queries x 500 milliseconds is worse than 2 queries x 5 seconds: the first group adds up to 500 seconds of work, the second to only 10 seconds. The slow query log can be misleading in some cases.
Still, it does not mean that it is pointless; it just means that it is a source of information and not the source of information.
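The arithmetic behind this word of caution can be spelled out in a few lines of Python. This is a small sketch using the figures from the comparison above; the one-second logging threshold is an assumption chosen purely for illustration:

```python
# Workload figures taken from the comparison above.
short_queries = {"count": 1000, "seconds_each": 0.5}
long_queries = {"count": 2, "seconds_each": 5.0}

def total_seconds(workload: dict) -> float:
    """Total time consumed by a group of identical queries."""
    return workload["count"] * workload["seconds_each"]

# Assumed logging threshold of one second (hypothetical setting).
threshold = 1.0

for name, workload in (("short", short_queries), ("long", long_queries)):
    logged = workload["seconds_each"] > threshold
    print(name, total_seconds(workload), "logged" if logged else "not logged")
```

With this threshold, only the two long queries show up in the log, although the short ones consume fifty times more time in total.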
Defining what and how to log
After taking a look at some basic settings, it is time to decide what to log. By default, only errors will be logged. However, this might not be enough. In this section, you will learn what can be logged and what a log line will look like.
By default, PostgreSQL does not log information about checkpoints. The following setting is here to change exactly that:
In most cases, it does not make sense to log connections as extensive logging significantly slows down the systems. Analytical systems won't suffer much. However, OLTP might be seriously impacted.
If you want to see how long statements take, consider switching the following setting to on:
Let us move on to one of the most important settings. So far, we have not defined the layout of the messages yet. And so far, the log files contain errors in the following form:
The log will state ERROR along with the error message. Before PostgreSQL 10.0, there was no timestamp, no username, and so on. You had to change the value right away to make any sense of the logs. In PostgreSQL 10.0, the default value has changed to something way more reasonable. To change it, take a look at log_line_prefix:
log_line_prefix is pretty flexible and allows you to configure the log line to exactly match your needs. In general, it is a good idea to log a timestamp. Otherwise, it is close to impossible to see when something bad has happened. Personally, I also like to know the username, the transaction ID, and the database. However, it is up to you to decide on what you really need.
Sometimes slowness is caused by bad locking behavior. In general, locking-related issues can be hard to track down. log_lock_waits can help to detect such issues. If a session has to wait for a lock longer than deadlock_timeout, a line will be sent to the log, provided the following configuration variable is turned on:
Finally, it is time to tell PostgreSQL what to actually log. So far, only errors, slow queries, and the like have been sent to the log. log_statement has four possible settings:
none means that only errors will be logged. ddl means that errors as well as DDLs (CREATE TABLE, ALTER TABLE, and so on) will be logged. mod will additionally include data changes, and all will send every statement to the log.
Note that all can lead to a lot of logging information, which can slow down your system. To give you an impression of how much impact there can be, I have compiled a blog post. It can be found here: https://www.cybertec-postgresql.com/en/logging-the-hidden-speedbrakes/
If you want to inspect replication in more detail, consider turning the following setting to on:
It will send replication-related commands to the log (for more information, visit the following website: https://www.postgresql.org/docs/current/static/protocol-replication.html).
It can happen quite frequently that performance problems are caused by temporary file I/O. To see which queries cause the problem, the following setting can be used:
While pg_stat_statements contains aggregated information, log_temp_files will point to specific queries causing issues. It usually makes sense to set this one to a reasonably low value. The correct value depends on your workload, but maybe 4 MB is already a good start.
By default, PostgreSQL will write log files in the time zone where the server is located. However, if you are running a system that is spread all over the world, it can make sense to adjust the time zone in a way that you can go and compare log entries:
Keep in mind that on the SQL side, you will still see the time in your local time zone. However, if this variable is set, log entries will be in a different time zone.
Summary
This chapter was all about system statistics. You learned how to extract information from PostgreSQL and how to use system statistics in a beneficial way. The most important views were discussed in detail.
The next chapter is all about query optimization. You will learn to inspect queries and how they are optimized.
In previous chapters, you have learned how to read system statistics and how to make use of what PostgreSQL provides. Armed with this knowledge, this chapter is all about good query performance. You will learn more about the following topics:
Optimizer internals 
Execution plans 
Partitioning data
Enabling and disabling optimizer settings
Parameters for good query  performance
At the end of the chapter, I hope that you will be able to write better and faster queries. And if your queries still happen to be bad, you should be able to understand why this is the case. You will also be able to use the new techniques to partition data.
Learning what the optimizer does
Before even attempting to think about query performance, it makes sense to familiarize yourself with what the query optimizer does. A deeper understanding of what is going on under the hood helps you to see what the database is really up to.
Optimizations by example
To demonstrate how the optimizer works, I have compiled an example, one which I have used over the years in PostgreSQL training. Suppose there are three tables:
Let us assume further that those tables contain millions or maybe hundreds of millions of rows. In addition to that, there are indexes:
Finally, there is a view joining the first two tables together.
Let us suppose now the end user wants to run the following query. What will the optimizer do with this query? Which choices are there?
Before looking at the real optimization process, I want to focus on some options the planner has.
Evaluating join options
The planner has a couple of options here and I want to use the chance to show what can go wrong if trivial approaches are used.
Suppose the planner just steamed ahead and calculated the output of the view. What is the best way to join 100 million with 200 million rows?
In this section, a couple of (not all) join options will be discussed to show what PostgreSQL is able to do.
Nested loops
One way to join two tables is to use a nested loop. The principle is simple. Here is some pseudo code:
Nested loops are often used if one of the sides is very small and contains only a limited set of data. In our example, a nested loop would lead to 100 million x 200 million iterations through the code. This is clearly not an option because runtime would simply explode.
A nested loop is generally O(n^2), so it is only efficient if one side of the join is very small. In my example, this is not the case, so a nested loop can be ruled out to calculate the view.
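The nested loop principle can be sketched in a few lines of Python. This is a minimal illustration on two small in-memory lists of keys (the function name and the sample data are made up for this example), not the executor code PostgreSQL actually runs:

```python
def nested_loop_join(outer, inner):
    """Join two lists of keys by comparing every outer row with every inner row."""
    result = []
    comparisons = 0
    for a in outer:          # one pass over the outer side
        for b in inner:      # a full pass over the inner side per outer row
            comparisons += 1
            if a == b:
                result.append((a, b))
    return result, comparisons

rows, comparisons = nested_loop_join([1, 2, 3, 4], [3, 4, 5])
print(rows)         # [(3, 3), (4, 4)]
print(comparisons)  # 12, that is len(outer) * len(inner)
```

The comparison counter makes the problem visible: the work grows with the product of both input sizes.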
Hash joins
The second option is a hash join. The following strategy could be applied to solve our little problem:
Both sides can be hashed and the hash keys could be compared, leaving us with the result of the join. The trouble here is that all the values have to be hashed and put somewhere.
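To make the idea more tangible, here is a minimal Python sketch of the common build and probe variant of a hash join, in which one side is loaded into a hash table and the other side is used to look up matches. The function name and sample data are assumptions for illustration only:

```python
from collections import defaultdict

def hash_join(build_side, probe_side):
    """Join two lists of keys by hashing one side and probing with the other."""
    buckets = defaultdict(list)
    for b in build_side:      # build phase: every row is hashed and stored
        buckets[b].append(b)
    result = []
    for a in probe_side:      # probe phase: one hash lookup per row
        for b in buckets.get(a, []):
            result.append((a, b))
    return result

print(hash_join([3, 4, 5], [1, 2, 3, 4]))  # [(3, 3), (4, 4)]
```

Each row is touched only once, but the whole build side has to be kept somewhere, which is exactly the cost mentioned above.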
Merge joins
Finally, there is a merge join. The idea here is to use sorted lists to join the results. If both sides of the join are sorted, the system can just take rows from the top, see if they match, and return them. The main requirement here is that the lists are sorted. Here is a sample plan:
To join, data has to be provided in sorted order. In many cases, PostgreSQL will just sort the data. However, there are other options to provide the join with sorted data. One way is to consult an index, as shown in the next example:
One side of the join or both sides can use sorted data coming from lower levels of the plan. If the table is accessed directly, an index is the obvious choice, but only if the returned result set is significantly smaller than the entire table. Otherwise, we encounter almost double the overhead because first we have to read the entire index, then the entire table. If the result set is a large portion of the table, a sequential scan is more efficient, especially if it is being accessed in primary key order.
The beauty of a merge join is that it can handle a lot of data. The downside is that data has to be sorted or taken from an index at some point.
Sorting is O(n * log(n)). Therefore, sorting 300 million rows to perform the join is not attractive either.
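The merging step itself is cheap once both inputs are sorted. The following Python sketch, with made-up sample data, walks two sorted lists in parallel; it illustrates the principle and is not PostgreSQL's implementation:

```python
def merge_join(left, right):
    """Join two lists of keys that are already sorted in ascending order."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1                      # left value too small, advance left
        elif left[i] > right[j]:
            j += 1                      # right value too small, advance right
        else:
            # Emit all right rows sharing this key, then move on in left.
            k = j
            while k < len(right) and right[k] == left[i]:
                result.append((left[i], right[k]))
                k += 1
            i += 1
    return result

left = sorted([4, 1, 3, 2])     # the sort is the expensive O(n * log(n)) part
right = sorted([5, 3, 4, 4])
print(merge_join(left, right))  # [(3, 3), (4, 4), (4, 4)]
```

The two calls to sorted() stand in for the Sort nodes or index scans that have to deliver ordered input.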
Note that since the introduction of PostgreSQL 10.0 all join options described here are also available in a “parallel version”. The optimizer will therefore not just consider those standard join options but also evaluate if it makes sense to do parallel queries or not.
Applying transformations
Obviously, doing the obvious thing (joining the view first) makes no sense at all. A nested loop would send execution times through the roof. A hash join has to hash millions of rows and a merge join has to sort 300 million rows. All three options are clearly not suitable here. The way out is to apply logical transformations to make the query fast. In this section, you will learn what the planner does to speed up the query. A couple of steps will be performed.
Inlining the view
The first transformation done by the optimizer is to inline views. Here is what happens:
The view is inlined and transformed to a subselect. What does this buy us? Actually, nothing. All it does is open the door for further optimization, which will really be a game changer for this query.
Flattening subselects
The next thing is to flatten subselects. By getting rid of subselects, a couple more options to optimize the query will appear.
Here is what the query will look like after flattening the subselects:
It is now a normal join. We could have written the query that way ourselves, but the planner will take care of those transformations for us anyway. The door is open for a key optimization.
Applying equality constraints
The following process creates equality constraints. The idea is to detect additional constraints, join options, and filters. Let us take a deep breath and take a look at the query. If aid = cid and aid = bid, we know that bid = cid. If cid = 4 and all the others are equal too, we know that aid and bid have to be 4 as well, which leads us to the following query:
The importance of this optimization cannot be stressed enough. What the planner did here was to open the door for two additional indexes, which were not clearly visible in the original query.
By being able to use indexes on all three columns, there is no need to calculate this expensive horror view anymore. PostgreSQL has the option to just retrieve a couple of rows from the index and use whatever join option makes sense.
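The reasoning behind equality constraints is plain transitivity. As a rough Python sketch (the function is hypothetical and only mimics the logic, it does not reflect the planner's internal data structures), all columns and constants connected by an equality can be collected into one group:

```python
def equivalence_classes(equalities):
    """Group names and constants that are known to be equal to each other."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for left, right in equalities:
        parent[find(left)] = find(right)   # merge the two groups

    classes = {}
    for member in parent:
        classes.setdefault(find(member), set()).add(member)
    return list(classes.values())

# Conditions of the example query: aid = bid, aid = cid and cid = 4.
classes = equivalence_classes([("aid", "bid"), ("aid", "cid"), ("cid", 4)])
print(classes)  # a single class containing aid, bid, cid and 4
```

Because the constant 4 ends up in the same group as all three columns, each column can be filtered by 4 individually.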
Exhaustive searching
Now that those formal transformations have been done, PostgreSQL will perform an exhaustive search. It will try out all possible plans and come up with the cheapest solution to your query. PostgreSQL knows which indexes are possible and just uses the cost model to determine how to do things in the best way possible.
During an exhaustive search, PostgreSQL will also try to determine the best join order. In the original query, the join order was fixed to A → B and A → C. However, using those equality constraints, we could join B → C and join A later. All options are open to the planner.

Trying it all out
Now that all those optimizations have been discussed, it is time to see which plan PostgreSQL might create for us:
As you can see, PostgreSQL will use three indexes. It is also interesting to see that PostgreSQL decides to go for a nested loop to join the data. This makes perfect sense because there is virtually no data coming back from the index scans. Therefore, using a loop to join things is perfectly feasible and highly efficient.
Making the process fail
So far, you have seen what PostgreSQL can do for you and how the optimizer helps to speed up queries. PostgreSQL is pretty smart, but it needs smart users. There are some cases in which the end user cripples the entire optimization process by doing stupid things. Let us drop the view:
While this view is logically equivalent to the example shown previously, the optimizer has to treat things differently. Every OFFSET other than 0 will change the result and therefore the view has to be calculated. The entire optimization process is crippled by adding things such as OFFSET.
The PostgreSQL community did not dare to optimize this case of having an OFFSET 0 in a view. People are simply not supposed to do that. I am using this just as an example to show that some operations can cripple performance and that developers should be aware of the underlying optimization process. However, if you are smart and if you happen to know how PostgreSQL works, this trick can be used as an “optimization”.

Here is the new plan:
Just take a look at the costs predicted by the planner. Costs have skyrocketed from a two-digit number to a staggering one. Clearly, this query is going to provide you with bad performance.
There are more ways to cripple performance and it makes sense to keep the optimization process in mind.
Constant folding
However, there are many more optimizations in PostgreSQL, which happen behind the scenes and which contribute to overall good performance. One of those features is called constant folding. The idea is to turn expressions into constants, as shown in the following example:
As you can see, PostgreSQL will try to look for 4. As aid is indexed, PostgreSQL will go for an index scan. Note that our table has just one column, so PostgreSQL even figured that all the data it needs can be found in the index.
In this case, the index lookup code will fail and PostgreSQL has to go for a sequential scan. Keep in mind that this is a single-core plan. In case the size of the table is large or in case your PostgreSQL configuration is different, you might see a multi-core plan. For the sake of simplicity, this chapter only contains single-core plans, to make reading easier.
Understanding function inlining
As outlined in this section already, there are many optimizations which help to speed up queries. One of them is called function inlining. PostgreSQL is able to inline immutable SQL functions. The main idea is to reduce the number of function calls which have to be made in order to speed things up.
Here is an example of a function:
To demonstrate how things work, I will recreate the table with less content to speed up the index creation:
Join pruning
PostgreSQL provides an optimization called join pruning. The idea is to remove joins if they are not needed by the query. This can come in handy in case queries are generated by some middleware or some ORM. If a join can be removed, it naturally speeds things up dramatically and leads to less overhead.
The question now is: how does join pruning work? Here is an example:
As you can see, PostgreSQL will join those tables directly. So far, there are no surprises. However, the following query is slightly modified. Instead of selecting all columns, it only selects those columns on the left-hand side of the join:
PostgreSQL will scan the left-hand table directly and skip the join completely. There are two reasons why this is actually possible and logically correct:

No columns are selected from the right side of the join; thus looking those columns up does not buy us anything
The right side is unique, which means that joining cannot increase the number of rows due to duplicates on the right side
 
If joins can be pruned automatically, it might happen that queries are an order of magnitude faster. The beauty here is that the speedup can be achieved by just removing columns which might not be needed by the application anyway.
Speedup set operations
Set operations allow the results of multiple queries to be combined into a single result set. Set operators include UNION, INTERSECT, and EXCEPT. PostgreSQL implements all of them and offers many important optimizations to speed them up.
The planner is able to push restrictions down into the set operation, opening the door for fancy indexing and speedups in general. Let us take a look at the following query, which shows how this works:
What you see here is that two relations are added to each other. The trouble is that the only restriction is outside the subselect. However, PostgreSQL figures that the filter can be pushed further down the plan. xid = 3 is therefore attached to aid and bid, opening the option to use indexes on both tables. By avoiding the sequential scan on both tables, the query will run a lot faster.
Note that there is a distinction between UNION and UNION ALL. UNION ALL will just blindly append the data and deliver the result of both tables. UNION is different: it will filter out duplicates. The following plan shows how that works:

PostgreSQL has to add a Sort node on top of the Append node to ensure that duplicates can be filtered later on.

Many people who are not fully aware of the difference between UNION and UNION ALL complain about bad performance because they are unaware that PostgreSQL has to filter out duplicates, which is especially painful in the case of large datasets.
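The extra work caused by UNION can be sketched in Python. The two functions below are simplified stand-ins for the behavior described above, assuming plain lists of integers as input; they are meant to illustrate the difference, not to mirror the executor:

```python
def union_all(left, right):
    """UNION ALL: append both inputs without looking at the values."""
    return left + right

def union(left, right):
    """UNION: append both inputs, sort them and drop adjacent duplicates."""
    result = []
    for row in sorted(left + right):   # sorting places duplicates side by side
        if not result or result[-1] != row:
            result.append(row)
    return result

a = [1, 2, 3]
b = [3, 4, 3]
print(union_all(a, b))  # [1, 2, 3, 3, 4, 3]
print(union(a, b))      # [1, 2, 3, 4]
```

If duplicates cannot occur or do not matter, the cheaper append-only variant is usually the better choice.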



Understanding execution plans
After digging into some important optimizations implemented into PostgreSQL, I want to shift your attention a bit more to execution plans. You have already seen some plans in this book. However, in order to make full use of plans, it is important to develop a systematic approach to reading this information. Reading plans systematically is exactly within the scope of this section.
Approaching plans systematically
The first thing you have to know is that the EXPLAIN command can do quite a lot for you, and I would highly recommend making full use of those features.
As many readers might already know, EXPLAIN ANALYZE will execute the query and return the plan including real runtime information. Here is an example:
The plan looks a bit scary, but don't panic, we will go through it step by step. When reading a plan, make sure that you read it from the inside to the outside. In our example, execution starts with a sequential scan on b. There are actually two blocks of information here: the cost block and the actual time block. While the cost block contains estimations, the actual time block is hard evidence. It shows real execution time. In this example, the sequential scan has taken 85.7 milliseconds.
Note that the costs shown on your system might not be identical. Small differences in the optimizer statistics can cause differences. The important thing here is really the way the plan has to be read.
Data is then passed on to the Limit node, which ensures that there is not too much data. Note that each stage of execution will also show us the number of rows involved. As you can see, PostgreSQL will only fetch 1 million rows from the table in the first place; the Limit node ensures that this will actually happen. However, there is a price tag: at this stage, the runtime has jumped to 169 milliseconds already. Finally, data is sorted, which takes a lot of time. The most important thing when looking at the plan is to figure out where time is actually lost. The best way to do that is to take a look at the actual time block and try to figure out where time jumps. In this example, the sequential scan takes some time but it cannot be sped up significantly. Instead, we see that time skyrockets as sorting starts.
Of course, sorting can be sped up, but more on that later in this chapter.
Making EXPLAIN more verbose
In PostgreSQL, the output of EXPLAIN can be beefed up a little to provide you with more information. To extract as much as possible out of a plan, consider turning the following options on:
analyze true will actually execute the query as shown previously. verbose true will add some more information to the plan (such as column information, and so on). costs true will show information about costs. timing true is equally important, as it will provide us with good runtime data so that we can see where in the plan time gets lost. Finally, there is buffers true, which can be very enlightening. In my example, it reveals that we needed to access 45 buffers to execute the query.
Spotting problems
Given all the information shown in the previous section, it is already possible to spot a couple of potential performance problems, which are highly important in real life.
Spotting changes in runtime
When looking at a plan, there are always two questions which you have got to ask yourself:
Is the runtime shown by EXPLAIN ANALYZE justified for the given query?
If the query is slow, where does the runtime jump?
In my case, the sequential scan is rated at 2.625 milliseconds. The sort is done after 7.199 milliseconds, so the sort takes roughly 4.5 milliseconds to complete and is therefore responsible for most of the runtime needed by the query.
Looking for jumps in the execution time of the query will reveal what is really going on. Depending on which type of operation burns too much time, you have to act accordingly. General advice is not possible here because there are simply too many things which can cause issues.
Inspecting estimates
However, there is something which should always be done: make sure that estimates and real numbers are reasonably close together. In some cases, the optimizer will make poor decisions because the estimates are way off for some reason. It can happen that estimates are off because the system statistics are not up to date. Running ANALYZE is therefore definitely a good thing to start with. However, optimizer stats are mostly taken care of by the autovacuum daemon, so it is definitely worth considering other options causing bad estimates. Take a look at the following example:
Apart from the fact that this plan will ensure significantly better performance, it will also fix statistics, even if the index is not used:
test=# EXPLAIN ANALYZE
       SELECT *
       FROM t_estimate
       WHERE cos(id) > 4;
                            QUERY PLAN
------------------------------------------------------------------
 Index Scan using idx_cosine on t_estimate
       (cost=0.29..8.30 rows=1 width=4)
       (actual time=0.002..0.002 rows=0 loops=1)
   Index Cond: (cos((id)::double precision) > '4'::double precision)
 Planning time: 0.095 ms
 Execution time: 0.011 ms
(4 rows)

However, there is more to wrong estimates than meets the eye. One problem which is often underestimated is called cross-column correlation. Consider a simple example involving two columns:
20% of people like to ski
20% of people are from Africa

If we want to count the number of skiers in Africa, mathematics says that the result will be 0.2 x 0.2 = 0.04, that is, 4% of the overall population. However, there is hardly any snow in Africa and incomes are comparatively low. Therefore, the real result will surely be lower. The observation Africa and the observation skiing are not statistically independent. In many cases, the fact that PostgreSQL keeps column statistics which do not span more than one column can lead to bad results.
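A short Python calculation shows how far off an estimate can be when independence is assumed. The population size and the real overlap of 5 people are invented numbers used purely for illustration:

```python
# Hypothetical population of 1,000 people used purely for illustration.
population = 1000
skiers = 200            # 20% like to ski
africans = 200          # 20% are from Africa
skiing_africans = 5     # assumed real overlap, far below the naive estimate

# Independence assumption: population * (skiers / population) * (africans / population),
# which simplifies to skiers * africans / population.
estimated = skiers * africans / population

print(estimated)                     # 40.0
print(estimated / skiing_africans)   # 8.0, the factor by which the estimate is off
```

An error of this size in a row estimate can easily be enough to push the planner toward an unsuitable join strategy.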
Of course, the planner does a lot to prevent these things from happening as often as possible. Still, it can be an issue.
Starting with PostgreSQL 10.0, we have multivariate statistics in PostgreSQL, which help to address the problem of cross-column correlation.
Inspecting buffer usage
However, the plan itself is not the only thing which can cause issues. In many cases, dangerous things are hidden on some other level. Memory and caching can lead to undesired behavior, which is often hard to understand for end users who are not trained to see the problem described in this section.
Here is an example:
Before inspecting the data, make sure that you have executed the query twice. Of course, it makes sense to use an index here. However, I want to point to something else. In my query, PostgreSQL has found 2,112 buffers inside the cache and 421,136 buffers had to be taken from the operating system. Now there are two things which can happen. If you are lucky, the operating system lands a couple of cache hits and the query is fast. If the filesystem cache is not lucky, those blocks have to be taken from disk. This might look obvious; however, it can lead to wild swings in execution time. A query which runs entirely in cache can be 100 times faster than a query which has to slowly collect random blocks from disk.
Let me try to outline the problem using a simple example. Suppose we have a phone system storing 10 billion rows (which is not uncommon at large phone carriers). Data flows in at a rapid rate and users want to query this data. If you have 10 billion rows, data will only partially fit into memory and therefore a lot of stuff will naturally end up coming from disk.
We can run a simple query now:
Even if you are on the phone, your data will be spread all over the place. If you end a phone call just to start the next call, thousands of people will do the same, so the odds that two of your calls will end up in the very same 8 kB block are naturally close to zero. Just imagine for the time being that there are 100,000 calls going on at the same time. On disk, data will be randomly distributed. In case your phone number shows up often, it means that for each row at least one block has to be fetched from disk (assuming a very low cache hit rate). Suppose 5,000 rows will be returned. Assuming you have to go to disk 5,000 times, it leads to something like 5,000 x 5 milliseconds = 25 seconds of execution time. Note that the execution time of this query might vary between milliseconds and, say, 30 seconds, depending on how much has been cached by the operating system or by PostgreSQL.
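The back-of-the-envelope calculation can be turned into a tiny Python model. It assumes one block per row and 5 milliseconds per random disk read, as in the text; the cache hit ratio parameter is an addition made for this sketch:

```python
rows_returned = 5000        # rows matching the phone number
random_read_ms = 5          # assumed cost of one random disk read

def execution_seconds(cache_hit_ratio: float) -> float:
    """Estimated runtime if each row needs one block and misses go to disk."""
    disk_reads = rows_returned * (1 - cache_hit_ratio)
    return disk_reads * random_read_ms / 1000

print(execution_seconds(0.0))   # 25.0 -> nothing cached
print(execution_seconds(1.0))   # 0.0  -> disk cost vanishes when all is cached
```

The same query therefore ranges from next to nothing to 25 seconds of disk wait, depending solely on what happens to be cached.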
Keep in mind that every server restart will naturally clean out the PostgreSQL and filesystem caches, which can lead to real trouble after a node failure.
Fixing high buffer usage
The question is now: how can the situation be improved? One way to do that is to run the CLUSTER command:
CLUSTER will rewrite the table in the same order as a (B-tree) index. If you are running an analytical workload, this can make sense. However, in an OLTP system, CLUSTER might not be feasible because a table lock is required while the table is rewritten.
Understanding and fixing joins
Joins are an important thing; everybody needs them on a regular basis. Consequently, joins are also relevant to maintaining or achieving good performance. To ensure that you can write good joins, I have decided to include a section about joining in this book.
 
Getting joins right
 
Before we dive into optimizing joins, it is important to take a look at some of the most common problems arising with joins, which should ring alarm bells.
Here is an example:
Most people assume that the average is calculated based on a single row. However, as stated earlier, this is not the case and therefore queries like that are often considered to be a performance problem because, for some reason, PostgreSQL does not index the table on the left-hand side of the join. Of course, we are not looking at a performance problem here; we are definitely looking at a semantic issue. It happens on a regular basis that people writing outer joins don't mean what they order PostgreSQL to do. So my personal advice is to always question the semantic correctness of an outer join before attacking the performance problem reported by the client.
I cannot stress enough how important this kind of work is to ensure that your queries are correct and do exactly what is needed.
Processing outer joins
After verifying that your queries are actually correct from a business point of view, it makes sense to check what the optimizer can do to speed up your outer joins. The most important thing is that PostgreSQL can in many cases reorder inner joins to speed things up dramatically. However, in the case of outer joins, this is not always possible. Only a handful of reordering operations are actually allowed:
(A leftjoin B on (Pab)) innerjoin C on (Pac) = (A innerjoin C on (Pac)) leftjoin B on (Pab)

Pac is a predicate referencing A and C, and so on (in this case, clearly Pac cannot reference B, or the transformation is nonsensical):
(A leftjoin B on (Pab)) leftjoin C on (Pac) = (A leftjoin C on (Pac)) leftjoin B on (Pab)
(A leftjoin B on (Pab)) leftjoin C on (Pbc) = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
The last rule only holds if predicate Pbc must fail for all null B rows (that is, Pbc is strict for at least one column of B). If Pbc is not strict, the first form might produce some rows with non-null C columns where the second form would make those entries null.
While some joins can be reordered, a typical type of query cannot benefit from join reordering:
The way to approach this is to check if all outer joins are really necessary. In many cases, it happens that people write outer joins without actually needing them. Often, the business case does not even contain the necessity to use outer joins.
Understanding the join collapse limit variable 
During the planning process,  PostgreSQL tries to check all possible join orders. In many cases, this can be pretty expensive because there  can be many  permutations, which naturally slows down the planning process.  The join collapse limit variable is here to give the developer a tool to actually work  around these problems and define,  in a more straightforward way, how a query  should be processed.
To show  what  this setting  is all about,  I have compiled a little example:
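The original listing is not reproduced in this text. As a stand-in, here is a minimal sketch of three equivalent ways to write the same join, assuming three hypothetical tables (tab1, tab2, and tab3) invented for illustration:

```sql
-- Hypothetical tables, used for illustration only
CREATE TEMP TABLE tab1 (id int);
CREATE TEMP TABLE tab2 (id int, ref int);
CREATE TEMP TABLE tab3 (id int);

-- 1. Implicit joins: tables listed in FROM, conditions in WHERE
SELECT *
FROM tab1, tab2, tab3
WHERE tab1.id = tab2.id AND tab2.ref = tab3.id;

-- 2. CROSS JOIN syntax, conditions still in WHERE
SELECT *
FROM tab1 CROSS JOIN tab2 CROSS JOIN tab3
WHERE tab1.id = tab2.id AND tab2.ref = tab3.id;

-- 3. Explicit joins only
SELECT *
FROM tab1
     JOIN (tab2 JOIN tab3 ON (tab2.ref = tab3.id))
     ON (tab1.id = tab2.id);
```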
Basically, these three queries are identical and treated by the planner in the same way. The first query consists of implicit joins. The last one consists only of explicit joins. Internally, the planner will inspect those requests and order joins accordingly to ensure the best runtime possible. The question now is: how many explicit joins will PostgreSQL plan implicitly? This is exactly what you can tell the planner by setting the join_collapse_limit variable. The default value is reasonably good for normal queries. However, if your query contains a very high number of joins, playing around with this setting can reduce planning time considerably. Reducing planning time can be essential to maintain good throughput.
To see how the join_collapse_limit variable changes the plan, I have written a simple query:
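The query itself is missing from this text. The following sketch shows one way to run such an experiment, assuming three hypothetical tables of different sizes (j_small, j_medium, and j_large). With join_collapse_limit set to 1, the planner is documented to keep explicit JOIN clauses in the order they were written:

```sql
-- Hypothetical demo tables of different sizes
CREATE TEMP TABLE j_small  AS SELECT generate_series(1, 10)     AS id;
CREATE TEMP TABLE j_medium AS SELECT generate_series(1, 1000)   AS id;
CREATE TEMP TABLE j_large  AS SELECT generate_series(1, 100000) AS id;
ANALYZE j_small;
ANALYZE j_medium;
ANALYZE j_large;

-- Inspect the current value (the documented default is 8)
SHOW join_collapse_limit;

-- Force the planner to follow the written join order
SET join_collapse_limit TO 1;
EXPLAIN SELECT *
FROM j_large
     JOIN j_medium ON (j_large.id = j_medium.id)
     JOIN j_small  ON (j_medium.id = j_small.id);

-- Let the planner reorder the joins again
RESET join_collapse_limit;
EXPLAIN SELECT *
FROM j_large
     JOIN j_medium ON (j_large.id = j_medium.id)
     JOIN j_small  ON (j_medium.id = j_small.id);
```

Whether the two plans actually differ depends on the data and on the PostgreSQL version, so compare the EXPLAIN output on your own system.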
Try to run the query with different settings and see how the plan changes. Unfortunately, the plan is too long to reproduce here, so I cannot include the actual changes in this section.
Enabling and disabling optimizer settings
So far, the most important optimizations performed by the planner have been discussed in more or less detail. PostgreSQL has become very smart over the years. Still, it can happen that something goes south and users have to convince the planner to do the right thing.
To modify plans, PostgreSQL offers a couple of runtime variables, which have a significant impact on planning. The idea is to give the end user the chance to make certain types of nodes in the plan more expensive than others. What does that mean in practice? Here is a simple plan:
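The plan listing is not included in this text. A query of the following shape, which joins the output of two set-returning function calls, can be used to produce a comparable plan. This is only a sketch; the join strategy chosen depends on the PostgreSQL version and its cost estimates:

```sql
-- Join the results of two function calls and look at the plan
EXPLAIN SELECT *
FROM generate_series(1, 100) AS a,
     generate_series(1, 100) AS b
WHERE a = b;
```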
The plan shows that PostgreSQL reads the data from the function and sorts both results. Then a merge join is performed.
However, what if a merge join is not the fastest way to run the query? In PostgreSQL, there is no way to put planner hints into comments as you could do in Oracle. Instead, you can ensure that certain operations are simply considered to be expensive. The SET enable_mergejoin TO off command will simply make merging too expensive:
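As a minimal sketch, assuming a hypothetical join of two generate_series() calls as the test query, the switch can be flipped for the current session and the plan inspected again:

```sql
-- Make merge joins look very expensive for this session
SET enable_mergejoin TO off;

EXPLAIN SELECT *
FROM generate_series(1, 100) AS a,
     generate_series(1, 100) AS b
WHERE a = b;

-- Return to the server default afterwards
RESET enable_mergejoin;
```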
PostgreSQL will still perform a nested loop. The important part here is that off does not really mean off; it just means treat as a very expensive thing. This is important because otherwise the query could not be performed.
Which settings are there to influence the planner? The following switches are available:
enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on
enable_indexonlyscan = on
enable_material = on
enable_mergejoin = on
enable_nestloop = on
enable_seqscan = on
enable_sort = on
enable_tidscan = on
While those settings can definitely be beneficial, I want to point out that those tweaks should be handled with care. Only use them to speed up individual queries and do not turn off things globally. Things can turn against you fairly quickly and destroy performance. Therefore, it really makes sense to think twice before changing these parameters.
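One way to limit such a tweak to a single query is SET LOCAL, which only lasts until the end of the current transaction. The following sketch uses a hypothetical test query; the choice of enable_nestloop is arbitrary:

```sql
BEGIN;

-- Only affects this transaction; reverts at COMMIT or ROLLBACK
SET LOCAL enable_nestloop TO off;

EXPLAIN SELECT *
FROM generate_series(1, 100) AS a,
     generate_series(1, 100) AS b
WHERE a = b;

COMMIT;

-- Back to the previous value once the transaction has ended
SHOW enable_nestloop;
```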
Understanding genetic query optimization
The result of the planning process is key to achieving superior performance. As shown in this chapter, planning is far from trivial and involves various complex calculations. The more tables are touched by a query, the more complicated planning will become. The more tables there are, the more choices the planner will have. Logically, planning time will increase. At some point, planning will take so long that performing the classical exhaustive search is not feasible anymore. On top of that, the errors made during planning are so great anyway that finding the theoretically best plan does not necessarily lead to the best plan in terms of runtime.

Genetic query optimization (GEQO) can come to the rescue. What is GEQO? The idea is actually stolen from nature and resembles the natural process of evolution.

PostgreSQL will approach the problem just like a traveling salesman problem and encode the possible joins as integer strings. For example, 4 1 3 2 means: first join 4 and 1, then 3, and then 2. The numbers represent the relations' IDs. At the beginning, the genetic optimizer will generate a random set of plans. Those plans are then inspected. The bad ones are discarded and new ones are generated based on the genes of the good ones. This way, potentially even better plans are generated. The process can be repeated as often as desired. At the end of the day, we are left with a plan that is expected to be a lot better than just using a random plan. GEQO can be turned on and off by adjusting the geqo variable:
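The original listing is not reproduced here. The relevant settings can be inspected and changed as in the following sketch. According to the PostgreSQL documentation, geqo defaults to on and geqo_threshold (the number of FROM items at which GEQO takes over) defaults to 12:

```sql
-- Is genetic query optimization enabled at all?
SHOW geqo;

-- Number of FROM items at which GEQO replaces the exhaustive search
SHOW geqo_threshold;

-- Turn GEQO off for the current session, then restore the default
SET geqo TO off;
RESET geqo;
```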
If your queries are so large that you start to reach the geqo_threshold setting, it certainly makes sense to play with this setting to see how plans are changed by the planner if you change those variables.
As a general rule, however, I would say that you should try to avoid GEQO as long as you can and first try to fix the join order using the join_collapse_limit variable. Note that every query is different, so it certainly helps to experiment and gain more experience by learning how the planner behaves under which circumstances.

If you want to see what a really crazy join is, consider checking out the following talk I have given in Madrid: http://de.slideshare.net/hansjurgenschonig/postgresql-joining-1-million-tables.
Partitioning data
Given default 8k blocks, PostgreSQL can store up to 32 TB of data inside a single table. If you compile PostgreSQL with 32k blocks, you can even put up to 128 TB into a single table. However, large tables like that are not necessarily too convenient anymore and it can make sense to partition tables to make processing easier and in some cases a bit faster. Starting with PostgreSQL 10.0, we have improved partitioning, which will offer end users significantly easier handling of data partitioning.
In this chapter, the old means of partitioning as well as the new features available as of PostgreSQL 10.0 will be covered.
Creating partitions
First, I want to focus your attention on the old method to partition data.
Before digging deeper into the advantages of partitioning, I want to show how partitions can be created. The entire thing starts with a parent table:
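The original listing is missing from this text. The following is a minimal sketch of inheritance-based partitioning. The t_data name appears later in this chapter; the column list and the yearly child tables are assumptions made for illustration:

```sql
-- Parent table (the column layout is an assumption)
CREATE TABLE IF NOT EXISTS t_data (id serial, t date, payload text);

-- One child table per year, each inheriting all columns of the parent
CREATE TABLE IF NOT EXISTS t_data_2016 () INHERITS (t_data);
CREATE TABLE IF NOT EXISTS t_data_2015 () INHERITS (t_data);
CREATE TABLE IF NOT EXISTS t_data_2014 () INHERITS (t_data);
CREATE TABLE IF NOT EXISTS t_data_2013 () INHERITS (t_data);

-- Querying the parent also scans every child table
EXPLAIN SELECT * FROM t_data;
```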
Actually, the process is quite simple. PostgreSQL will simply unify all tables and show us all the content from all the tables inside and below the partition we are looking at. Note that all tables are independent and are just connected logically through the system catalog.
Applying table constraints
What happens if filters are applied?
PostgreSQL will apply the filter to all the partitions in the structure. It does not know that the table name is somehow related to the content of the tables. To the database, names are just names and have nothing to do with what you are looking for. This makes sense, of course, as there is no mathematical justification for doing anything else.
The point now is: how can we teach the database that the 2016 table only contains 2016 data, the 2015 table only contains 2015 data, and so on? Table constraints are here to do exactly that. They teach PostgreSQL about the content of those tables and therefore allow the planner to make smarter decisions than before. The feature is called constraint exclusion and helps dramatically to speed up queries in many cases.
The following listing shows how table constraints can be created:
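The listing is not reproduced in this text. As a sketch, assuming a hypothetical parent table t_data with a date column t and two yearly child tables, CHECK constraints could be added like this:

```sql
-- Hypothetical setup (skipped if the tables already exist)
CREATE TABLE IF NOT EXISTS t_data (id serial, t date, payload text);
CREATE TABLE IF NOT EXISTS t_data_2016 () INHERITS (t_data);
CREATE TABLE IF NOT EXISTS t_data_2015 () INHERITS (t_data);

-- Tell the planner which dates each child table can contain
ALTER TABLE t_data_2016
    ADD CHECK (t >= '2016-01-01' AND t < '2017-01-01');
ALTER TABLE t_data_2015
    ADD CHECK (t >= '2015-01-01' AND t < '2016-01-01');

-- Children whose constraint contradicts the filter can be skipped;
-- children without a constraint still have to be scanned
EXPLAIN SELECT * FROM t_data WHERE t = '2016-01-04';
```

The half-open ranges (>= and <) are a deliberate choice here so that adjacent years do not overlap.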
For each table a CHECK constraint can be added.
Note that PostgreSQL will only create the constraint if all the data in those tables is perfectly correct and if every single row satisfies the constraint. In contrast to MySQL, constraints in PostgreSQL are taken seriously and honored under any circumstances.
In PostgreSQL, those constraints can overlap. This is not forbidden and can make sense in some cases. However, it is usually better to have non-overlapping constraints because PostgreSQL then has the option to prune more tables.
Here is what happens after adding those table constraints:
The planner will be able to remove many of the tables from the query and only keep those that potentially contain the data. The query can greatly benefit from a shorter and more efficient plan. Especially if those tables are really large, removing them can boost speed considerably.
 
Modifying inherited structures
Once in a while, data structures have to be modified. The ALTER TABLE clause is here to do exactly that. The question is: how can partitioned tables be modified?
Basically, all you have to do is tackle the parent table and add or remove columns. PostgreSQL will automatically propagate those changes through to the child tables and ensure that changes are made to all the relations as follows:
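As a sketch, assuming a hypothetical parent table t_data with one child table and an arbitrary new column x, the propagation can be observed through the information schema:

```sql
-- Hypothetical setup (skipped if the tables already exist)
CREATE TABLE IF NOT EXISTS t_data (id serial, t date, payload text);
CREATE TABLE IF NOT EXISTS t_data_2016 () INHERITS (t_data);

-- Add a column to the parent only
ALTER TABLE t_data ADD COLUMN x int;

-- The child table lists the new column as well
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_name IN ('t_data', 't_data_2016')
ORDER BY table_name, ordinal_position;
```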
As you can see, the column is added to the parent and automatically added to the child table here.
Note that this works for columns, and so on. Indexes are a totally different story. In an inherited structure, every table has to be indexed separately. If you add an index to the parent table, it will only be present on the parent; it won't be deployed on those child tables. Indexing all those columns in all those tables is your task and PostgreSQL is not going to make those decisions for you. Of course, this can be seen as a feature or as a limitation. On the upside, you could say that PostgreSQL gives you all the flexibility to index things separately and therefore potentially more efficiently. However, people might also argue that deploying all those indexes one by one is a lot more work.
Moving tables in and out of partitioned structures 
Suppose you have an inherited structure. Data is partitioned by date and you want to provide the most recent years to the end user. At some point, you might want to remove some data from the scope of the user without actually touching it. You might want to put data into some sort of archive.
PostgreSQL provides a simple means to achieve exactly that. First, a new parent can be created:
test=# CREATE TABLE t_history (LIKE t_data);
CREATE TABLE
The LIKE keyword allows you to create a table that has exactly the same layout as the t_data table. In case you have forgotten which columns the t_data table actually has, this might come in handy as it saves you a lot of work. It is also possible to include indexes, constraints, and defaults.
Then the table can be moved away from the old parent table and put below the new one. Here is how it works:
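The listing is missing from this text. A minimal sketch of the move, assuming a hypothetical child table t_data_2013 that is to be archived below t_history:

```sql
-- Hypothetical setup (skipped if the tables already exist)
CREATE TABLE IF NOT EXISTS t_data (id serial, t date, payload text);
CREATE TABLE IF NOT EXISTS t_data_2013 () INHERITS (t_data);
CREATE TABLE IF NOT EXISTS t_history (LIKE t_data);

BEGIN;
-- Detach the table from the old parent ...
ALTER TABLE t_data_2013 NO INHERIT t_data;
-- ... and attach it to the new one
ALTER TABLE t_data_2013 INHERIT t_history;
COMMIT;
```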
The entire process can of course be done in a single transaction to ensure that the operation stays atomic.
Cleaning up data
One advantage of partitioned tables is the ability to clean up data quickly. Suppose that we want to delete an entire year. If data is partitioned accordingly, a simple DROP TABLE clause can do the job:
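As a sketch, assuming a hypothetical child table named t_data_2014 holds the year in question:

```sql
-- Removes the partition and all of its rows in one quick operation
-- (t_data_2014 is a hypothetical table name)
DROP TABLE IF EXISTS t_data_2014;
```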
Understanding PostgreSQL 10.0 partitioning

For many years, the PostgreSQL community has been working on built-in partitioning. Finally, PostgreSQL 10.0 offers the first implementation of in-core partitioning, which will be covered in this chapter. For now, the partitioning functionality is still pretty basic. However, a lot of infrastructure for future improvements is already in place.
To show you how partitioning works, I have compiled a simple example featuring range partitioning:
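The original listing is not reproduced in this text. The following sketch shows range partitioning with one partition for negative and one for non-negative values. The table names (data, negatives, and positives) are assumptions made for illustration:

```sql
-- Parent table: only declares the layout and the partitioning method
CREATE TABLE IF NOT EXISTS data (payload integer)
    PARTITION BY RANGE (payload);

-- Lower bounds are inclusive, upper bounds are exclusive
CREATE TABLE IF NOT EXISTS negatives PARTITION OF data
    FOR VALUES FROM (MINVALUE) TO (0);
CREATE TABLE IF NOT EXISTS positives PARTITION OF data
    FOR VALUES FROM (0) TO (MAXVALUE);

-- Rows are routed to the matching partition automatically
INSERT INTO data VALUES (-5), (5);

SELECT tableoid::regclass AS partition, payload
FROM data
ORDER BY payload;
```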
In this example, one partition will hold all negative values while the other one will take care of positive values. While creating the parent table, you can simply specify which way you want to partition data.
NOTE: In PostgreSQL 10.0, there are range partitioning and list partitioning. Support for hash partitioning and the like might be available as soon as PostgreSQL 11.0.
Once the parent table has been created, it is already time to create the partitions. To do that, the PARTITION OF clause has been added. At this point, there are still some limitations. The most important one is that a tuple (= a row) cannot move from one partition to the other.
For example:
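The example statement is missing from this text. The following sketch, assuming a hypothetical range-partitioned table named data with a negatives and a positives partition, shows the kind of UPDATE in question. On PostgreSQL 10, it is expected to fail because the row would have to change partitions; later major versions may move the row instead:

```sql
-- Hypothetical setup (skipped if the tables already exist)
CREATE TABLE IF NOT EXISTS data (payload integer)
    PARTITION BY RANGE (payload);
CREATE TABLE IF NOT EXISTS negatives PARTITION OF data
    FOR VALUES FROM (MINVALUE) TO (0);
CREATE TABLE IF NOT EXISTS positives PARTITION OF data
    FOR VALUES FROM (0) TO (MAXVALUE);

INSERT INTO data VALUES (5);

-- Tries to change the partitioning key so that the row no longer
-- fits into the partition it currently lives in
UPDATE data SET payload = -10 WHERE payload = 5;
```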
If there were rows satisfying this condition, PostgreSQL would simply error out and refuse to change the value. However, in a good design, it is a bad idea to change the partitioning key anyway. Also keep in mind that you have to think about indexing each partition.
Adjusting parameters for good query performance
Writing good queries is the first step to reaching good performance. Without a good query, you will most likely suffer from bad performance. Writing good and intelligent code will therefore give you the greatest edge possible. Once your queries have been optimized from a logical and semantic point of view, good memory settings can provide you with a final nice speedup. In this section, you will learn what more memory can do for you and how PostgreSQL can use it for your benefit. Again, this section assumes that we are using single-core queries to make the plans more readable. To ensure that there is always just one core at work, use the following command:
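The command and the test data are not reproduced in this text. The following sketch disables parallel query for the session and builds a small test case. The t_group_demo table and its content are assumptions made for illustration:

```sql
-- Do not use parallel workers, so plans stay single-core and readable
SET max_parallel_workers_per_gather TO 0;

-- Hypothetical test table: many rows, but only two distinct names
CREATE TEMP TABLE t_group_demo (id serial, name text);
INSERT INTO t_group_demo (name)
    SELECT 'hans' FROM generate_series(1, 100000);
INSERT INTO t_group_demo (name)
    SELECT 'paul' FROM generate_series(1, 100000);
ANALYZE t_group_demo;

-- Grouping by a column with very few distinct values
EXPLAIN ANALYZE
SELECT name, count(*)
FROM t_group_demo
GROUP BY 1;
```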
PostgreSQL figured out that the number of groups is actually very small. Therefore, it creates a hash, adds one hash entry per group, and starts to count. Due to the low number of groups, the hash is really small and PostgreSQL can quickly do the count by incrementing the numbers for each group.
What happens if we group by id and not by name? The number of groups will skyrocket:
The work_mem variable governs the size of the hash used by the GROUP BY clause. As there are too many entries, PostgreSQL has to find a strategy that does not require holding the entire dataset in memory. The solution is to sort the data by ID and group it. Once the data is sorted, PostgreSQL can move down the list and form one group after the other. Once the first value has been counted, the partial result is ready and can be emitted. Then the next group can be processed. Once the value in the sorted list changes when moving down, it will never show up again, thus the system knows that a partial result is ready.
To speed up the query, a higher value for the work_mem variable can be set on the fly (and, of course, globally):
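As a sketch, assuming a hypothetical table t_workmem_demo in which every id is unique, the effect can be explored by running the same query with a small and a large work_mem. The concrete values are arbitrary, and the plans you see depend on the PostgreSQL version:

```sql
-- Hypothetical test table with one group per row
CREATE TEMP TABLE t_workmem_demo AS
    SELECT id FROM generate_series(1, 200000) AS id;
ANALYZE t_workmem_demo;

-- Little memory for the grouping step
SET work_mem TO '1MB';
EXPLAIN ANALYZE
SELECT id, count(*) FROM t_workmem_demo GROUP BY 1;

-- Plenty of memory for this session only
SET work_mem TO '1GB';
EXPLAIN ANALYZE
SELECT id, count(*) FROM t_workmem_demo GROUP BY 1;

RESET work_mem;
```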
PostgreSQL knows (or at least assumes) that data will fit into memory and switches to the faster plan. As you can see, the execution time is lower. The query won't be as fast as in the GROUP BY name case because many more hash values have to be calculated, but you will be able to see a nice and reliable benefit in the vast majority of all cases.
Speeding up sorting
The work_mem variable does not only speed up grouping. It can also have a very nice impact on simple things such as sorting, which is an essential mechanism mastered by every database system in the world.
The following query shows a simple operation using the default setting of 4 MB:
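The query is missing from this text. A comparable experiment can be sketched as follows, assuming a hypothetical table t_sort_demo filled with generated values. The Sort Method line in the output shows which algorithm was used:

```sql
-- Hypothetical test table (skipped if it already exists)
CREATE TEMP TABLE IF NOT EXISTS t_sort_demo AS
    SELECT id, md5(id::text) AS name
    FROM generate_series(1, 200000) AS id;

SET max_parallel_workers_per_gather TO 0;
SET work_mem TO '4MB';   -- the documented default

EXPLAIN ANALYZE
SELECT * FROM t_sort_demo ORDER BY name, id;
```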
PostgreSQL needs 13.8 milliseconds to read the data and over 200 milliseconds to sort the data. Due to the low amount of memory available, sorting has to be performed using temporary files. The external sort Disk method needs only small amounts of RAM but has to send intermediate data to a comparatively slow storage device, which of course leads to poor throughput.
Increasing the work_mem variable setting will make PostgreSQL use more memory for sorting:
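As a sketch, again assuming a hypothetical table t_sort_demo and an arbitrary value of 1 GB:

```sql
-- Hypothetical test table (skipped if it already exists)
CREATE TEMP TABLE IF NOT EXISTS t_sort_demo AS
    SELECT id, md5(id::text) AS name
    FROM generate_series(1, 200000) AS id;

-- Allow each sort operation in this session to use up to 1 GB
SET work_mem TO '1GB';

EXPLAIN ANALYZE
SELECT * FROM t_sort_demo ORDER BY name, id;

RESET work_mem;
```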
As there is enough memory now, the database will do all the sorting in memory and therefore speed up the process dramatically. The sort takes just 33 milliseconds now, which is a sevenfold improvement compared to the query we had previously. More memory will lead to faster sorting and speed up the system.
Up to now, you have already seen two mechanisms to sort data: external sort Disk and quicksort Memory. In addition to those two mechanisms, there is also a third algorithm, which is top-N heapsort Memory. It can be used to only provide you with the top N rows:
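A sketch of a query that only needs the first few rows of the sorted result, assuming a hypothetical table t_sort_demo:

```sql
-- Hypothetical test table (skipped if it already exists)
CREATE TEMP TABLE IF NOT EXISTS t_sort_demo AS
    SELECT id, md5(id::text) AS name
    FROM generate_series(1, 200000) AS id;

-- Only the first 10 rows are needed, so the full result
-- does not have to be sorted
EXPLAIN ANALYZE
SELECT * FROM t_sort_demo ORDER BY name, id LIMIT 10;
```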
The algorithm is lightning fast and the entire query will be done in just over 30 milliseconds. The sorting part is now only 18 milliseconds and is therefore almost as fast as reading the data in the first place.
Note that the work_mem variable is allocated per operation. It can theoretically happen that a query needs the work_mem variable more than once. It is not a global setting; it is really per operation. Therefore, you have to set it carefully.
There is one thing you should keep in mind: many books claim that setting the work_mem variable too high on an OLTP system might cause your server to run out of memory. Yes, if 1,000 people sort 100 MB at the same time, this can result in memory failures. However, do you expect the disk to be able to handle that? I doubt it. The solution can only be: stop doing stupid things. Sorting 100 MB 1,000 times concurrently is not what should happen in an OLTP system anyway. Consider deploying proper indexes, write better queries, or simply rethink your requirements. Under any circumstances, sorting so much data so often concurrently is a bad idea. Stop it before those things stop your application.
Speed up administrative tasks
There are more operations that actually have to do some sorting or memory allocation of any kind. The administrative ones, such as the CREATE INDEX clause, do not rely on the work_mem variable but use the maintenance_work_mem variable instead. Here is how it works:
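The listing is not included in this text. A sketch of the idea, assuming a hypothetical table t_index_demo and an arbitrary value of 1 GB (in psql, \timing can be used to compare the duration with and without the setting):

```sql
-- Hypothetical test table (skipped if it already exists)
CREATE TEMP TABLE IF NOT EXISTS t_index_demo AS
    SELECT id, md5(id::text) AS name
    FROM generate_series(1, 200000) AS id;

-- Give index creation more memory for sorting
SET maintenance_work_mem TO '1GB';
CREATE INDEX idx_index_demo_name ON t_index_demo (name);
RESET maintenance_work_mem;
```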
The speed has now doubled just because sorting has been improved so much.
There are more administrative jobs that can benefit from more memory. The most prominent ones are the VACUUM clause (to clean out indexes) and the ALTER TABLE clause. The rules for the maintenance_work_mem variable are the same as for the work_mem variable. The setting is per operation and only the required memory is allocated on the fly.
Summary
In this chapter, a number of query optimizations have been discussed. You have learned about the optimizer and about various internal optimizations such as constant folding, view inlining, joins, and a lot more. All those optimizations contribute to good performance and help to speed things up considerably.
After this introduction to optimizations, the next chapter will be about stored procedures. You will see the options PostgreSQL has to handle user-defined code.
In the previous chapter, you learned a lot about the optimizer as well as optimizations going on in the system. This chapter is going to be about stored procedures and how to use them efficiently and easily. You will learn what a stored procedure is made of, which languages are available, and how you can speed up things nicely. On top of that, you will be introduced to some of the more advanced features of PL/pgSQL.
The following things will be covered:
Deciding on the right language
How stored procedures are executed 
Advanced features of PL/pgSQL
Packaging up extensions
Optimizing for good performance
Configuring function parameters
At the end of the chapter, you will be able to write good and efficient procedures.
Understanding stored procedure languages
When it comes to stored procedures, PostgreSQL differs quite significantly from other database systems. Most database engines force you to use a certain programming language to write server-side code. Microsoft SQL Server offers Transact-SQL while Oracle encourages you to use PL/SQL. PostgreSQL does not force you to use a certain language but allows you to decide on what you know best and what you like best.
The reason PostgreSQL is so flexible is actually quite interesting in a historical sense too. Many years ago, one of the most well-known PostgreSQL developers (Jan Wieck), who had written countless patches back in its early days, came up with the idea of using Tcl as the server-side programming language. The trouble was simple: nobody wanted to use Tcl and nobody wanted to have this stuff in the database engine. The solution to the problem was to make the language interface so flexible that basically any language can be integrated with PostgreSQL easily. Then, the CREATE LANGUAGE clause was born:
Nowadays, many different languages can be used to write stored procedures. The flexibility added to PostgreSQL back in the early days has really paid off, and so you can choose from a rich set of programming languages.
How exactly does PostgreSQL handle languages? If you take a look at the syntax of the CREATE LANGUAGE clause, you will see a couple of keywords:

HANDLER: This function is actually the glue between PostgreSQL and any external language you want to use. It is in charge of mapping PostgreSQL data structures to whatever is needed by the language and helps to pass the code around.
VALIDATOR: This is the policeman of the infrastructure. If it is available, it will be in charge of delivering tasty syntax errors to the end user. Many languages are able to parse the code before actually executing it. PostgreSQL can use that and tell you whether a function is correct or not when you create it. Unfortunately, not all languages can do this, so in some cases, you will still be left with problems showing up at runtime.
INLINE: If it is present, PostgreSQL will be able to run anonymous code blocks utilizing this handler function.
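The three functions described above can be looked up for every installed language in the pg_language system catalog. The following query is a small sketch; the column aliases are chosen for readability and are not part of the catalog:

```sql
-- One row per language known to this database
SELECT lanname,
       lanpltrusted            AS trusted,
       lanplcallfoid::regproc  AS handler,
       laninline::regproc      AS inline_handler,
       lanvalidator::regproc   AS validator
FROM pg_language
ORDER BY lanname;
```

A dash in one of the last three columns means that the language has no such function.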
The anatomy of a stored procedure
Before actually digging into a specific language, I want to talk a bit about the anatomy of a typical stored procedure. For demo purposes, I have written a function that just adds up two numbers:
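The listing is missing from this text. A function matching the description could look like the following sketch. The name mysum and the int parameters are taken from the discussion in this section; the body is an assumption:

```sql
-- The function body is passed as an ordinary single-quoted string
CREATE OR REPLACE FUNCTION mysum(int, int)
RETURNS int AS
'
    SELECT $1 + $2;
' LANGUAGE 'sql';
```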
The first thing you can see is that the procedure is written in SQL. PostgreSQL has to know which language we are using, so we have to specify that in the definition. Note that the code of the function is passed to PostgreSQL as a string ('). That is somewhat noteworthy because it allows a function to become a black box to the execution machinery. In other database engines, the code of the function is not a string but is directly attached to the statement. This simple abstraction layer is what gives the PostgreSQL function manager all its power.
Inside the string, you can basically use all that the programming language of your choice has to offer. In my example, I am simply adding up two numbers passed to the function. For this example, two integer variables are in use. The important part here is that PostgreSQL provides you with function overloading. In other words, the mysum(int, int) function is not the same as the mysum(int8, int8) function. PostgreSQL sees these things as two distinct functions. Function overloading is a nice feature; however, you have to be very careful not to accidentally deploy too many functions if your parameter list happens to change from time to time. Always make sure that functions that are not needed anymore are really deleted.
The CREATE OR REPLACE FUNCTION clause will not change the parameter list of an existing function. You can, therefore, use it only if the signature does not change. Otherwise, it will either error out (for example, if only the return type differs) or simply deploy a new, overloaded function.
Let's run the function:
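A call could look like the following sketch, which repeats a hypothetical definition of mysum so that it can be run on its own:

```sql
-- Hypothetical definition, repeated to keep the example standalone
CREATE OR REPLACE FUNCTION mysum(int, int)
RETURNS int AS ' SELECT $1 + $2; ' LANGUAGE 'sql';

-- Returns 30
SELECT mysum(10, 20);
```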
Introducing dollar quoting
Passing code to PostgreSQL as a string is very flexible. However, using single quotes can be an issue. In many programming languages, single quotes show up frequently. To be able to use quotes, people have to escape them when passing the string to PostgreSQL. For many years, this has been the standard procedure. Fortunately, those old times have passed and new means to pass the code to PostgreSQL are available:
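As a sketch, here is a hypothetical mysum function again, this time with the body enclosed in $$ instead of single quotes:

```sql
-- No escaping needed: the body sits between two $$ markers
CREATE OR REPLACE FUNCTION mysum(int, int)
RETURNS int AS
$$
    SELECT $1 + $2;
$$ LANGUAGE 'sql';
```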
The solution to the problem of quoting strings is called dollar quoting. Instead of using quotes to start and end strings, you can simply use $$. Currently, I am only aware of two languages that have assigned a meaning to $$. In Perl as well as in bash scripts, $$ represents the process ID. To overcome even this little obstacle, you can put almost any tag between the two dollar signs (for example, $body$) to start and end the string. The following example shows how that works:
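A sketch with a named tag, using a hypothetical mysum function; the tag name body is arbitrary:

```sql
-- The opening and the closing tag have to be identical
CREATE OR REPLACE FUNCTION mysum(int, int)
RETURNS int AS
$body$
    SELECT $1 + $2;
$body$ LANGUAGE 'sql';
```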
All this flexibility allows you to really overcome the problem of quoting once and for all. As long as the start string and the end string match, there won't be any problems left.
Making use of anonymous code blocks
So far, you have learned to write the most simplistic stored procedures possible, and you have learned to execute code. However, there is more to code execution than just full-blown stored procedures. In addition to full-blown procedures, PostgreSQL allows the use of anonymous code blocks. The idea is to run code that is needed only once. This kind of code execution is especially useful to deal with administrative tasks. Anonymous code blocks don't take parameters and are not permanently stored in the database as they don't have a name anyway.
Here is a simple example:
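The example is not reproduced in this text. A minimal anonymous code block that only issues a message could look like this sketch (the message text is an assumption):

```sql
DO
$$
BEGIN
    -- Print a message and quit
    RAISE NOTICE 'current time: %', now();
END;
$$ LANGUAGE 'plpgsql';
```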
In this example, the code only issues a message and quits. Again, the code block has to know which language it uses. The string is again passed to PostgreSQL using simple dollar quoting.
Using functions and transactions
As you know, everything that PostgreSQL exposes in user land is a transaction. The same, of course, applies if you are writing stored procedures. The procedure is always part of the transaction you are in. It is not autonomous; it is just like an operator or any other operation.
Here is an example:
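The original example is missing from this text. The following sketch uses built-in functions instead of a user-defined one to make the same point. Each of the three rows triggers its own function calls, yet all rows report the same transaction start time and the same transaction ID:

```sql
-- now() returns the start time of the current transaction,
-- txid_current() returns its transaction ID
SELECT id,
       now()          AS tx_start,
       txid_current() AS txid
FROM generate_series(1, 3) AS id;
```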
All three function calls happen in the same transaction. This is important to understand because it implies that you cannot do too much transactional flow control inside a function. Suppose the second function call commits. What happens in such a case anyway? It cannot work.
However, Oracle has a mechanism that allows for autonomous transactions. The idea is that even if a transaction rolls back, some parts might still be needed and should be kept. The classical example is as follows:
1.	Start a function to look up secret data
2.	Add a log line to the document that somebody has modified this important secret data
3.	Commit the log line but roll back the change
4.	You still want to know that somebody attempted to change data
To solve problems like this one, autonomous transactions can be used. The idea is to be able to commit a transaction inside the main transaction independently. In this case, the entry in the log table will prevail while the change will be rolled back.
As of PostgreSQL 10.0, autonomous transactions are not available. However, I have already seen patches floating around that implement this feature. We will see when these features make it to the core.
To give you an impression of how things will most likely work, here is a code snippet based on the first patches:
The point in this example is that we can decide on the fly whether to commit or to roll back the autonomous transaction.
Understanding various stored procedure languages
As already stated previously in this chapter, PostgreSQL gives you the power to write stored procedures in various languages. The following options are available and shipped along with the PostgreSQL core:
SQL
PL/pgSQL
PL/Perl and PL/PerlU
PL/Python
PL/Tcl and PL/TclU
SQL is the obvious choice to write stored procedures, and it should be used whenever possible as it gives the most freedom to the optimizer. However, if you want to write slightly more complex code, PL/pgSQL might be the language of your choice. It offers flow control and a lot more. In this chapter, some of the more advanced and less known features of PL/pgSQL will be shown (this chapter is not meant to be a complete tutorial on PL/pgSQL).
Then the core contains code to run stored procedures in Perl. Basically, the logic is the same here. Code will be passed as a string and executed by Perl. Remember that PostgreSQL does not speak Perl; it merely has the code to pass things on to the external programming language.
Maybe you have noticed that Perl and Tcl are available in two flavors: trusted (PL/Perl and PL/Tcl) and untrusted (PL/PerlU and PL/TclU). The difference between a trusted and an untrusted language is actually an important one. In PostgreSQL, a language is loaded directly into the database connection. Therefore, the language is able to do quite a lot of nasty stuff. To get rid of security problems, the concept of trusted languages has been invented. The idea is that a trusted language is restricted to the very core of the language. It is not possible to:
Include libraries
Open network sockets
Perform system calls of any kind (opening files and so on)
Perl offers something called taint mode, which is used to implement this feature in PostgreSQL. Perl will automatically restrict itself to trusted mode and error out if a security violation is about to happen. In untrusted mode, everything is possible, and therefore, only the superuser is allowed to run untrusted code.
If you want to run trusted as well as untrusted code, you have to activate both languages: plperl and plperlu (or pltcl and pltclu, respectively).
Python is currently only available as an untrusted language; therefore, administrators have to be very careful when it comes to security in general, as a stored procedure running in untrusted mode can bypass all security mechanisms enforced by PostgreSQL. Just keep in mind that Python is running as part of your database connection and is in no way responsible for security.
Introducing PL/pgSQL
Let's get started with the most awaited topic, and I am sure you will love to know more about it.
In this section, you will be introduced to some of the more advanced features of PL/pgSQL, which are important for writing proper and highly efficient code. Note that this is not a beginner's introduction to programming or PL/pgSQL in general.
Handling quoting
One of the most important things in database programming is quoting. If you are not using proper quoting, you will surely get into trouble with SQL injection and open unacceptable security holes.
What is SQL injection? Consider the following example:
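The original example is missing from this text. The following sketch shows the problem with a hypothetical function named broken_lookup that glues user input into a query string. The function only prints the resulting SQL and does not execute it. The final query shows quote_literal, the first of the quoting functions, which escapes the input properly:

```sql
-- Hypothetical function: builds SQL by plain string concatenation
CREATE OR REPLACE FUNCTION broken_lookup(text) RETURNS void AS
$$
DECLARE
    v_sql text;
BEGIN
    v_sql := 'SELECT schemaname FROM pg_tables WHERE tablename = '''
             || $1 || '''';
    -- Only print the statement; it is never executed here
    RAISE NOTICE 'v_sql: %', v_sql;
    RETURN;
END;
$$ LANGUAGE 'plpgsql';

-- Harmless input: the generated SQL looks as intended
SELECT broken_lookup('t_test');

-- Malicious input: the generated SQL now contains a second statement
SELECT broken_lookup('''; DROP TABLE t_test; ');

-- quote_literal escapes embedded quotes and adds the outer quotes
SELECT quote_literal('o''reilly');
```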
The second function shown here is quote_ident. It can be used to quote object names properly. Note that double quotes are used, which is exactly what is needed to handle table names and the like:
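A short sketch with two hypothetical object names. Names that need quoting are wrapped in double quotes, while plain lowercase names are returned unchanged:

```sql
SELECT quote_ident('Mixed Case Table') AS needs_quotes,
       quote_ident('plain_name')       AS no_quotes_needed;
```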
Normally, all table names in PostgreSQL are lowercase. However, if double quotes are used, object names can contain capitalized letters. In general, it is not a good idea to do this kind of trickery as you have to use double quotes all the time in this case, which can be a bit inconvenient.
After a basic introduction to quoting, it is important to take a look at how NULL values are handled:
If you call the quote_literal function on a NULL value, it will simply return NULL. There is no need to take care of quoting in this case.
PostgreSQL provides even more functions to explicitly take care of a NULL value:
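The following sketch contrasts quote_literal with quote_nullable. The latter returns the string NULL (without quotes) for a NULL input, so the result can be pasted into a generated SQL statement:

```sql
SELECT quote_literal(NULL::text) IS NULL AS literal_is_null,
       quote_nullable(NULL::text)        AS nullable_of_null,
       quote_nullable('abc')             AS nullable_of_text;
```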
It is not only possible to quote strings and object names. It is also possible to use PL/pgSQL onboard means to format and prepare entire queries. The beauty here is that you can use the format function to add parameters to a statement. Here is how it works:
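The listing is missing from this text. The following sketch shows the technique with a hypothetical function named simple_format. Field names are injected with %I (quoted as identifiers), while the values are passed separately through the USING clause:

```sql
CREATE OR REPLACE FUNCTION simple_format() RETURNS text AS
$$
DECLARE
    v_string text;
    v_result text;
BEGIN
    -- %I is replaced by a properly quoted identifier;
    -- $1 and $2 stay in the string as parameter placeholders
    v_string := format('SELECT schemaname || ''.'' || tablename
                        FROM pg_tables
                        WHERE %I = $1 AND %I = $2',
                       'schemaname', 'tablename');

    -- The values are bound at execution time, not concatenated
    EXECUTE v_string INTO v_result USING 'pg_catalog', 'pg_class';

    RAISE NOTICE 'result: %', v_result;
    RETURN v_string;
END;
$$ LANGUAGE 'plpgsql';

SELECT simple_format();
```

The lookup of pg_catalog.pg_class is only chosen because that table exists in every database.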
The names of the fields are passed to the format function. Finally, the USING clause of the EXECUTE statement is here to add the parameters to the query, which is then executed. Again, the beauty here is that no SQL injection can happen.
Here is what happens:
Managing scopes
After dealing with quoting and basic security (SQL injection) in general, I want to shift your focus to another important topic: scopes.
Just like most popular programming languages I am aware of, PL/pgSQL uses variables depending on their context. Variables are defined in the DECLARE statement of a function. However, PL/pgSQL allows you to nest a DECLARE statement:
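The listing is not reproduced in this text. A function matching the description that follows could look like this sketch; the name scope_test is an assumption:

```sql
CREATE OR REPLACE FUNCTION scope_test() RETURNS int AS
$$
DECLARE
    i int := 0;              -- outer variable
BEGIN
    RAISE NOTICE 'i1: %', i;

    DECLARE
        i int;               -- inner variable, hides the outer one
    BEGIN
        RAISE NOTICE 'i2: %', i;
    END;

    RETURN i;                -- the outer variable is visible again
END;
$$ LANGUAGE 'plpgsql';

SELECT scope_test();
```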
In the DECLARE statement, a variable i is defined and a value is assigned to it. Then, i is displayed. The output will of course be 0. Then a second DECLARE statement starts. It contains an additional incarnation of i, which is not assigned a value. Therefore, the value will be NULL. Note that PostgreSQL will now display the inner i. Here is what happens:
PostgreSQL allows you to do all kinds of trickery. However, it is strongly recommended to keep your code simple and easy to read.
Understanding advanced error handling
In every programming language, in every program, and in every module, error handling is important. Everything is expected to go wrong once in a while, and therefore it is vital to handle errors properly and professionally. In PL/pgSQL, you can use EXCEPTION blocks to handle errors. The idea is that if the BEGIN block does something wrong, the EXCEPTION block will take care of the problem and handle it correctly. Just like in many other languages such as Java, you can react to different types of errors and catch them separately.
In the following example, the code might run into a division by zero problem. The goal is to catch this error and react accordingly:
The BEGIN block can clearly throw an error. However, the EXCEPTION block catches the error we are looking at and also takes care of all other potential problems that can unexpectedly pop up.
 
Technically, this is more or less the same as a savepoint, and therefore the error does not cause the entire transaction to fail completely. Only the block causing the error will be subject to a mini rollback.
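The savepoint analogy can be sketched as follows. This is only an illustration using SQLite from Python's standard library, not PL/pgSQL; the table t_log, the savepoint name, and the function name are hypothetical. It shows that work done before the failing block survives, while work done inside it is undone.

```python
import sqlite3


def divide_with_mini_rollback(divisor: int):
    # isolation_level=None lets us issue BEGIN and COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t_log (note text)")
    conn.execute("BEGIN")
    conn.execute("INSERT INTO t_log VALUES ('before block')")

    conn.execute("SAVEPOINT risky_block")       # start of the protected block
    message = None
    try:
        conn.execute("INSERT INTO t_log VALUES ('inside block')")
        _ = 10 // divisor                        # may raise ZeroDivisionError
    except ZeroDivisionError as err:
        # Undo only what happened since the savepoint.
        conn.execute("ROLLBACK TO SAVEPOINT risky_block")
        message = str(err)
    conn.execute("RELEASE SAVEPOINT risky_block")
    conn.execute("COMMIT")

    notes = [r[0] for r in conn.execute("SELECT note FROM t_log ORDER BY rowid")]
    conn.close()
    return notes, message


print(divide_with_mini_rollback(0))
print(divide_with_mini_rollback(2))
```

With a divisor of 0, only the row written before the block remains and the error message is available to the caller. With a valid divisor, both rows are committed.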
By inspecting the sqlerrm variable, you can also have direct access to the error message itself. Let us run the code:
PostgreSQL catches the exception and shows the message in the EXCEPTION block. PostgreSQL is already kind enough to tell us in which line the error has happened, which makes it a lot easier to debug and fix the code in case it is broken.
In some cases, it can also make sense to raise your own exception. As you might expect, this is easy to do:
Making use of GET DIAGNOSTICS
Many of you who have used Oracle in the past might be familiar with the GET DIAGNOSTICS clause. The idea behind the GET DIAGNOSTICS clause is to allow users to see what is going on in the system. While the syntax might appear a bit strange to people who are used to modern code, it is still a valuable tool you can use to make your applications better.
From my point of view, there are two main tasks that the GET DIAGNOSTICS clause can be used for:
Inspecting the row count
Fetching context information and getting a backtrace

Inspecting the row count is definitely something you will need during everyday programming. Extracting context information will be useful if you want to debug applications.
The following example shows how the GET DIAGNOSTICS clause can be used inside your code:
As you can see, the GET DIAGNOSTICS clause gives us quite detailed information about what is going on in the system.


Using cursors to fetch data in chunks
If you execute SQL, the database will calculate the result and send it to your application. Once the entire result set has been sent to the client, the application can continue to do its job. The problem is just: what happens if the result set is so large that it does not fit into memory anymore? What if the database returns 10 billion rows? The client application usually cannot handle so much data at once, and actually it should not. The solution to the problem is a cursor. The idea behind a cursor is that data is generated only when it is needed (when FETCH is called). Therefore, the application can already start to consume data while it is actually being generated by the database. On top of that, the memory required to perform an operation is a lot lower.
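The fetch-on-demand idea can be sketched from the client side. The example below is a rough illustration that uses SQLite from Python's standard library as a stand-in for PostgreSQL; the table t_data and the helper process_in_chunks are hypothetical. The call to fetchmany plays the role of FETCH.

```python
import sqlite3


def process_in_chunks(conn, query: str, chunk_size: int):
    """Yield lists of at most chunk_size rows instead of the full result."""
    cur = conn.execute(query)
    while True:
        rows = cur.fetchmany(chunk_size)   # analogous to FETCH chunk_size
        if not rows:
            break
        yield rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_data (id integer)")
conn.executemany("INSERT INTO t_data VALUES (?)", [(i,) for i in range(10)])

sizes = [len(chunk) for chunk in process_in_chunks(conn, "SELECT id FROM t_data", 4)]
print(sizes)   # [4, 4, 2]
```

At no point does the application hold more than chunk_size rows in memory, regardless of how large the table is.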
When it comes to PL/pgSQL, cursors also play a major role. Whenever you loop over a result set, PostgreSQL will internally use a cursor automatically. The advantage is that the memory consumption of your applications will be reduced dramatically and there is hardly a chance of ever running out of memory due to processing large amounts of data. There are various ways to use cursors. Here is the most simplistic one:

This code is interesting for two reasons. First of all, it is a set returning function (SRF). It produces an entire column and not just a single row. The way to achieve this is to declare the return type as setof followed by the datatype instead of just the datatype. The RETURN NEXT clause will build up the result set until we have reached the end. The RETURN clause will tell PostgreSQL that we want to leave the function and that the result is done.
The second important issue is that looping over the result set will automatically create an internal cursor. In other words, there is no need to be afraid that you could potentially run out of memory. PostgreSQL will optimize the query in a way that it tries to produce the first 10% of the data (defined by the cursor_tuple_fraction variable) as fast as possible. Here is what the query will return:
What you have just seen is, in my opinion, the most frequent and most common way to use implicit cursors in PL/pgSQL. The following example shows an older mechanism that many people from Oracle might know:
In this example, the cursor is explicitly declared and opened. Inside the loop, data is then explicitly fetched and returned to the caller. Basically, the query does exactly the same thing; it is merely a matter of taste which syntax developers actually prefer.
Do you still have the feeling that you don't know enough about cursors yet? There is more; here is a third option to do exactly the same thing:

In this case, the cursor is fed with an integer parameter, which comes directly from the function call ($1).
Sometimes, a cursor is not used up by the stored procedure itself but returned for later use. In this case, you can simply use refcursor as the return value:
Actually, in this section, you learned that cursors will only produce data as it is consumed. This holds true for most queries. However, I have added a little catch to this example: whenever an SRF is used, the entire result has to be materialized. It is not created on the fly, but at once. The reason is that SQL must be able to rescan a relation, which is easily possible in the case of a normal table. However, for functions the situation is different. Therefore, an SRF is always calculated and materialized, making the cursor in this example totally useless. In other words, be careful when writing functions. In some cases, danger is hidden in nifty details.
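The difference between producing rows on demand and materializing them up front can be sketched with a Python generator. This is only an analogy, not a model of PostgreSQL internals; the rows function and the produced list are hypothetical helpers that count how many rows were actually generated.

```python
from itertools import islice

produced = []


def rows(n):
    """Generate n rows lazily and record each row as it is produced."""
    for i in range(n):
        produced.append(i)
        yield i


# Lazy consumption: only the rows actually fetched are ever produced.
first_three = list(islice(rows(1000), 3))
lazy_count = len(produced)

# Materialization: the whole result is built before the first row is read.
produced.clear()
materialized = list(rows(1000))
first_three_again = materialized[:3]
materialized_count = len(produced)

print(lazy_count, materialized_count)   # 3 1000
```

The consumer receives the same three rows in both cases, but in the materialized variant all 1,000 rows had to be computed first, which is the situation described above for an SRF.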
Utilizing composite types
In most other database systems, stored procedures are only used with primitive datatypes such as integer, numeric, varchar, and so on. However, PostgreSQL is very different. You can basically use all datatypes available to you. This includes primitive as well as composite and custom types. There are simply no restrictions as far as datatypes are concerned. To unleash the full power of PostgreSQL, composite types are highly important and are often used by extensions that can be found on the Internet.
The following example shows how a composite type can be passed to a function and how it can be used internally. Finally, the composite type will be returned again:
The main issue here is that you can simply use $1.field_name to access the composite type. Returning the type is not hard either. You just have to assemble the composite type variable on the fly and return it just like any other datatype. You can even use arrays or even more complex structures easily.
Writing triggers in PL/pgSQL
Server-side code is especially popular if you want to react to certain events happening in the database. A trigger allows you to call a function if an INSERT, UPDATE, DELETE, or TRUNCATE statement happens on a table. The function called by the trigger can then modify the data changed in your table or simply perform some operation needed.
In PostgreSQL, triggers have become ever more powerful over the years and provide a rich set of features:
The first thing to observe is that a trigger is always fired for a table or a view and calls a function. A trigger has a name and can happen before or after an event. The beauty of PostgreSQL is that you can have as many triggers on a single table as you want. While this does not come as a surprise to hardcore PostgreSQL users, I want to point out that this is not possible in many expensive commercial database engines still in use around the world.
If there is more than one trigger on the same table, the following rule, introduced many years ago in PostgreSQL 7.3, applies: triggers are fired in alphabetical order. First, all the before triggers happen in alphabetical order. Then PostgreSQL performs the row operation the trigger has been fired for and continues executing the after triggers in alphabetical order. In other words, the execution order of triggers is absolutely deterministic and the number of triggers is basically unlimited.
Triggers can modify data before or after the actual modification has happened. In general, this is a good way to verify data and to error out in case some custom restrictions are violated. The following example shows a trigger that is fired on INSERT and changes the data added to the table:
As stated previously, the trigger will always call a function, which allows you to nicely abstract code. The important thing here is that the trigger function has to return trigger. To access the row you are about to insert, you can access the NEW variable.
Note: INSERT and UPDATE triggers always provide a NEW variable. UPDATE and DELETE will offer a variable called OLD. Those variables contain the row you are about to modify.
In my example, the code checks whether the temperature is too low. If it is, the value is not okay and is dynamically adjusted. To ensure that the modified row can be used, NEW is simply returned. If there is a second trigger called after this one, the next function call will already see the modified row.
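The firing order and the way a modified NEW row is handed from one trigger to the next can be sketched in plain Python. This is a toy model and not how PostgreSQL is implemented: the trigger names, the second trigger, and the threshold of -273 are assumptions made for illustration only.

```python
def fix_temperature(new_row):
    """Hypothetical trigger function: replace impossible values with 0."""
    if new_row["temperature"] < -273:      # assumed threshold
        new_row["temperature"] = 0
    return new_row                          # returning NEW keeps the changes


def stamp_checked(new_row):
    """Second hypothetical trigger: sees the row as left by earlier triggers."""
    new_row["checked"] = new_row["temperature"]
    return new_row


def fire_before_triggers(triggers, new_row):
    """Run BEFORE row triggers in alphabetical order of their names."""
    for name in sorted(triggers):
        new_row = triggers[name](new_row)
    return new_row


# Registration order does not matter; only the names decide the order.
triggers = {"b_stamp": stamp_checked, "a_fix": fix_temperature}
row = fire_before_triggers(triggers, {"temperature": -300})
print(row)   # {'temperature': 0, 'checked': 0}
```

Because a_fix sorts before b_stamp, the second function already sees the adjusted temperature of 0.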
In the next step, the trigger can be created:
As you can see, the value has been adjusted correctly. The content of the table shows 0 for the temperature.
If you are using triggers, you should be aware of the fact that a trigger knows a lot about itself. It can access a couple of variables that allow you to write more sophisticated code and to achieve better abstraction.
Let us drop the trigger first:
What you see here is that the trigger knows its name, the table it has been fired for, and a lot more. If you want to apply similar actions on various tables, those variables help you to avoid duplicate code by just writing a single function, which can then be used for all tables you are interested in.
So far you have seen simple row-level triggers, which are fired once per row. However, PostgreSQL 10.0 introduced a couple of new features. Statement-level triggers have been around for a while already. However, it was not possible to access the data changed by the statement that fired the trigger. This has been fixed in PostgreSQL 10.0. It is now possible to make use of "transition tables", which contain all the changes made.
The following listing contains a complete example showing how a transition table can be used:
In this case we need two trigger definitions because we cannot squeeze everything into just one definition. Inside the trigger function, the transition table is easy to use: it can be accessed just like a normal table.
Let us test the code of the trigger:
Keep in mind that it is not necessarily a good idea to use transition tables for billions of rows. PostgreSQL really is scalable but at some point it is necessary to see that there are performance implications as well.
Introducing PL/Perl
There is a lot more to say about PL/pgSQL. However, as I've only got 40 pages to cover this topic, it is time to move on to the next procedural language. PL/Perl has been adopted by many people as the ideal language to do string crunching. As you might know, Perl is famous for its string manipulation capabilities and is therefore still fairly popular after all these years.
To enable PL/Perl, you have two choices:
You can deploy trusted or untrusted Perl. If you want both, you have to enable both languages.
To show you how PL/Perl works, I have implemented a function that simply parses an e-mail address and returns true or false. Here is how it works:

A text parameter is passed to the function. Inside the function, all input parameters can be accessed using $_. In this example, the regular expression is executed and the function returns whether it matched.
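The logic of such a check is easy to sketch. The following is a plain Python version, not the PL/Perl function from this chapter; the function name verify_email and the deliberately simple pattern are assumptions and do not cover every valid address.

```python
import re

# Assumed, deliberately simple pattern: local part, "@", dotted domain.
EMAIL_PATTERN = re.compile(
    r"^[a-z0-9._-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$", re.IGNORECASE
)


def verify_email(address: str) -> bool:
    """Return True if the address matches the simple pattern above."""
    return EMAIL_PATTERN.match(address) is not None


print(verify_email("hs@example.com"))   # True
print(verify_email("broken.address"))   # False
```

As in the Perl version, the whole function boils down to running one regular expression against the input and returning a boolean.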
The function can be called just like any other procedure written in any other language:
Keep in mind that you cannot load packages and so on if you are inside a trusted function. For example, if you want to use \w to find words, Perl will internally load utf8.pm, which is of course not allowed.
Using PL/Perl for datatype abstraction
As stated in this chapter, functions in PostgreSQL are pretty universal and can be used in many different contexts. If you want to use functions to improve data quality, you can use the CREATE DOMAIN clause:
Making use of the SPI interface
Once in a while, your Perl procedure has to do database work. Remember, the function is part of the database connection. Therefore, it is pointless to actually create a database connection. To talk to the database, the PostgreSQL server infrastructure provides the SPI interface, which is a C interface to talk to database internals. All procedural languages that help you to run server-side code use this interface to expose functionality to you. PL/Perl does the same, and in this section, you will learn how to use the Perl wrapper around the SPI interface.
The most important thing you might want to do is simply run SQL and retrieve the number of rows fetched. The spi_exec_query function is here to do exactly that. The first parameter passed to the function is the query. The second parameter has the number of rows you actually want to retrieve. For simplicity reasons, I decided to fetch all of them:
Using SPI for set returning functions
In many cases, you don't just want to execute some SQL and forget about it. In most cases, a procedure will loop over the result and do something with it. The following example will show how you can loop over the output of a query. On top of that, I decided to beef up the example a bit and make the function return a composite datatype. Working with composite types in Perl is very easy because you can simply stuff the data into a hash and return it.
The return_next function will gradually build up the result set until the function is terminated with a simple return statement.
The example in this listing generates a table consisting of random values:
SPI will nicely execute the query and display the number of rows. The important thing here is that all stored procedure languages provide a means to send log messages. In the case of PL/Perl, this function is called elog and takes two parameters. The first one defines the importance of the message (INFO, NOTICE, WARNING, ERROR, and so on) and the second parameter contains the actual message.
The following message shows what the query returns:
Escaping in PL/Perl and support functions
So far, we only used integers, so SQL injection or special table names were not an issue. Basically, the following functions are available:
quote_literal: It returns a string quoted as a string literal
quote_nullable: It quotes a string and returns the unquoted string NULL if the argument is undefined
quote_ident: It quotes SQL identifiers (object names, and so on)
decode_bytea: It decodes a PostgreSQL byte array field
encode_bytea: It encodes data and turns it into a byte array
encode_array_literal: It encodes an array as a string in array literal format
encode_typed_literal: It converts a Perl variable to the value of the datatype passed as a second argument and returns a string representation of this value
encode_array_constructor: It returns the contents of the referenced array as a string in array constructor format
looks_like_number: It returns true if a string looks like a number
is_array_ref: It returns true if something is an array reference
These functions are always available and can be called directly without having to include any library.
Sharing data across function calls
Sometimes it is necessary to share data across calls. The infrastructure has means to actually do that. In Perl, a hash can be used to store whatever data is needed:
In the case of a more complex statement, the developer usually does not know in which order the functions will be called. It is important to keep that in mind. In most cases, you cannot rely on an execution order.
Writing triggers in Perl
Every stored procedure language shipped with the core of PostgreSQL allows you to write triggers in that language. The same of course applies to Perl. As the possible length of this chapter is limited, I decided not to include an example of a trigger written in Perl but instead to point you to the official PostgreSQL documentation: https://www.postgresql.org/docs/10/static/plperl-triggers.html.
Basically, writing a trigger in Perl does not differ from writing one in PL/pgSQL. All predefined variables are in place, and as far as return values are concerned, the same rules apply in every stored procedure language.
Introducing PL/Python
If you don't happen to be a Perl expert, PL/Python might be the right thing for you. Python has been part of the PostgreSQL infrastructure for a long time and is therefore a solid, well-tested implementation.
When it comes to PL/Python, there is one thing you have to keep in mind: PL/Python is only available as an untrusted language. From a security point of view, it is important to keep that in mind at all times.
To enable PL/Python, you can run the following line from your command line. test is the name of the database you want to use with PL/Python:
Once the language is enabled, it is already possible to write code.
Alternatively, you can of course use the CREATE LANGUAGE clause. Also keep in mind that in order to use server-side languages, PostgreSQL packages containing those languages are needed (postgresql-plpython-$(VERSIONNUMBER) and so on).
Writing simple PL/Python code
In this section, you will learn to write simple Python procedures. The example discussed here is quite simple: if you are visiting a client by car in Austria, you can deduct 42 euro cents per kilometer as expenses in order to reduce your income tax. So what the function does is take the number of kilometers and return the amount of money we can deduct from our tax bill. Here is how it works:
The function ensures that only positive values are accepted. Finally, the result is calculated and returned. As you can see, the way a Python function is passed to PostgreSQL does not really differ from Perl or PL/pgSQL.
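The calculation itself can be sketched in plain Python, outside the database. This is not the PL/Python listing from this chapter: the function name is an assumption, and a regular ValueError stands in for whatever error reporting the server-side function uses.

```python
from decimal import Decimal

RATE_PER_KM = Decimal("0.42")   # euros per kilometer, as stated in the text


def calculate_deduction(km) -> Decimal:
    """Return the deductible amount in euros for a trip of km kilometers."""
    km = Decimal(str(km))
    if km <= 0:
        raise ValueError("invalid number of kilometers")
    return km * RATE_PER_KM


print(calculate_deduction(100))   # 42.00
```

Decimal is used here instead of float to avoid rounding surprises when dealing with money.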
Using the SPI interface
As with all procedural languages, PL/Python gives you access to the SPI interface. The following example shows how numbers can be added up:
When you try this example out, make sure that the call to cursor is actually a single line. Python is all about indentation, so it does make a difference if your code consists of one or of two lines.
Once the cursor has been created, we can loop over it and add up those numbers. The columns inside those rows can easily be referenced using column names.
Calling the function will return the desired result:
If you want to inspect the result set of an SQL statement, PL/Python offers various functions to retrieve more information from the result. Again, those functions are wrappers around what SPI offers on the C level.
The following function inspects a result more closely:
The nrows() function will display the number of rows. The status() function tells us whether everything worked out fine. The colnames() function returns a list of columns. The coltypes() function returns the object IDs of the datatypes in the result set. 23 is the internal number of integer:
Then comes typmod. Consider something like varchar(20): the configuration part of the type is what typmod is all about.
Finally, there is a function to return the entire thing as a string for debugging purposes. Calling the function will return the following result:
There are many more functions in the SPI interface that help you to execute SQL.
Handling errors
Once in a while, you might have to catch an error. Of course, this is also possible in Python. The following example shows how this works:
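The general pattern is an ordinary try/except block around the database call. The sketch below is a stand-in and not the PL/Python listing from this chapter: it uses SQLite from Python's standard library so that it can run anywhere, and the helper try_query is a hypothetical name. Inside PL/Python, the exception to catch would be plpy.SPIError instead of sqlite3.Error.

```python
import sqlite3


def try_query(conn, sql: str):
    """Run a query and return its rows, or None if the database reports an error."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as err:
        # React to the problem instead of letting the caller crash.
        print("query failed:", err)
        return None


conn = sqlite3.connect(":memory:")
print(try_query(conn, "SELECT 1"))                      # [(1,)]
print(try_query(conn, "SELECT * FROM missing_table"))   # None
```

The second call fails because the table does not exist, yet the calling code continues to run.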
Improving stored procedure performance
So far, you have seen how to write basic stored procedures as well as triggers in various languages. Of course, there are many more languages supported. Some of the most prominent ones are PL/R (R is a powerful statistics package) and PL/v8 (which is based on the Google JavaScript engine). However, those languages are beyond the scope of this chapter (regardless of their usefulness).
In this section, we will focus on improving the performance of a stored procedure. There are a couple of areas in which we can speed up processing:
Reduce the number of calls
Use cached plans
Give hints to the optimizer
In this chapter, all three main areas will be discussed.
Reducing the number of function calls
In many cases, performance is bad because functions are called way too often. I cannot stress this point too much: calling things too often is the main reason for bad performance. When you create a function, you can choose from three types of functions: volatile, stable, and immutable. Here is an example:
A volatile function means that the function cannot be optimized away. It has to be executed over and over again. A volatile function can also be the reason why a certain index is not used. By default, every function is considered to be volatile. A stable function will always return the same data within the same transaction. It can be optimized and calls can be removed. The now() function is a good example of a stable function; within the same transaction it returns the same data.
Immutable functions are the gold standard because they allow for the most optimizations, which is because they always return the same result given the same input. As a first step to optimizing functions, always make sure that they are marked correctly by adding volatile, stable, or immutable to the end of the definition.
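Why the marking matters can be sketched with a toy scan in plain Python. This is a simplified model under my own assumptions and not the PostgreSQL planner: the filter_rows helper pretends to evaluate a condition of the form "value = func(constant)" for every row and counts how often the function is really called.

```python
calls = {"count": 0}


def expensive(x):
    """Stand-in for a costly function; counts its own invocations."""
    calls["count"] += 1
    return x * 2


def filter_rows(rows, func, arg, volatility="volatile"):
    """Toy scan evaluating `row = func(arg)` for every row."""
    if volatility in ("stable", "immutable"):
        value = func(arg)                         # evaluated once, before the scan
        return [r for r in rows if r == value]
    return [r for r in rows if r == func(arg)]    # re-evaluated for every row


rows = list(range(1000))

calls["count"] = 0
filter_rows(rows, expensive, 21, "volatile")
volatile_calls = calls["count"]

calls["count"] = 0
filter_rows(rows, expensive, 21, "immutable")
immutable_calls = calls["count"]

print(volatile_calls, immutable_calls)   # 1000 1
```

Both scans return the same rows, but the volatile variant pays for 1,000 calls while the other one pays for a single call.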
Using cached plans
In PostgreSQL, a query is executed using four stages:
The parser: It checks the syntax
Rewrite system: It takes care of rules, and so on
Optimizer/planner: It optimizes the query
Executor: It executes the plan provided by the planner
If the query is short, the first three steps are relatively time consuming compared to the real execution time. Therefore, it can make sense to cache execution plans. PL/pgSQL basically does all the plan caching for you automatically behind the scenes. You don't have to care. PL/Perl and PL/Python will give you the choice. The SPI interface provides functions to handle and run prepared queries, so the programmer has the choice whether a query should be prepared or not. In the case of long queries, it can actually make sense to use unprepared queries; short queries should usually be prepared to reduce the internal overhead.
Assigning costs to functions
From the optimizer's point of view, a function is basically just like an operator. PostgreSQL will also treat the costs the same way as if it was a standard operator. The problem is just this: adding two numbers is usually cheaper than intersecting coastlines using some PostGIS-provided function. The thing is that the optimizer does not know whether a function is cheap or expensive. Fortunately, we can tell the optimizer to make functions cheaper or more expensive:
The COST parameter indicates how much more expensive than a standard operator your operator really is. It is a multiplier for cpu_operator_cost and not a static value. In general, the default value is 100 unless the function has been written in C.
The second parameter is the ROWS parameter. By default, PostgreSQL assumes that a set returning function will return 1,000 rows because the system has no way to figure out precisely how many rows there will be. The ROWS parameter allows developers to tell PostgreSQL about the expected number of rows.
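To see what the multiplier means in numbers, here is a small arithmetic sketch in Python. It assumes the default cpu_operator_cost of 0.0025 and a COST of 1 for C functions; the helper names are hypothetical, and the real planner of course combines many more cost components than this.

```python
from decimal import Decimal

CPU_OPERATOR_COST = Decimal("0.0025")   # PostgreSQL's default setting


def per_call_cost(cost_multiplier, cpu_operator_cost=CPU_OPERATOR_COST):
    """Estimated planner cost of a single function call."""
    return Decimal(str(cost_multiplier)) * cpu_operator_cost


def scan_cost(rows, cost_multiplier):
    """Estimated cost of calling the function once for each of `rows` rows."""
    return rows * per_call_cost(cost_multiplier)


print(per_call_cost(100))      # default for non-C functions: 0.2500
print(per_call_cost(1))        # C function: 0.0025
print(scan_cost(1000, 100))    # 1,000 rows, default COST: 250.0000
```

Raising COST from 100 to 1,000 for a genuinely expensive function would therefore make every call look ten times as costly to the optimizer, which encourages plans that call it less often.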
Using stored procedures
In PostgreSQL, stored procedures can be used for pretty much everything. In this chapter, you have already learned about the CREATE DOMAIN clause and so on, but it is also possible to create your own operators, type casts, and even collations.
In this section, you will see how a simple type cast can be created and how it can be used to your advantage. To define the type cast, consider taking a look at the CREATE CAST clause:
Summary
In this chapter, you learned how to write stored procedures. After a theoretical introduction, our attention was focused on some selected features of PL/pgSQL. In addition to that, you learned how to use PL/Perl and PL/Python, which are two important languages provided by PostgreSQL. Of course, there are many more languages available. However, due to the limitations of the scope (and length) of this book, those could not be covered in detail. If you want to know more, check out the following website: https://wiki.postgresql.org/wiki/PL_Matrix.
In the next chapter, you will learn about PostgreSQL security. You will learn how to manage users and permissions in general. On top of that, you will also learn about network security.
Managing PostgreSQL Security
The previous chapter was all about stored procedures and writing server-side code. After introducing you to many important topics, it is now time to shift our focus to PostgreSQL security. You will learn how to secure a server and configure permissions.
The following topics will be covered:
Configuring network access
Managing authentication
Handling users and roles
Configuring database security
Managing schemas, tables, and columns
Row-level security
At the end of the chapter, you will be able to secure a PostgreSQL server and configure permissions properly.
Managing network security
Before moving on to real-world, practical examples, I want to briefly shift your attention to the various layers of security we will be dealing with. When dealing with security, it makes sense to keep those levels in mind in order to approach security-related issues in an organized way.
Here is my mental model:
Bind addresses: listen_addresses in the postgresql.conf file
Host-based access control: the pg_hba.conf file
Instance-level permissions: Users, roles, database creation, login, and replication
Database-level permissions: Connecting, creating schemas, and so on
Schema-level permissions: Using schemas and creating objects inside a schema
Table-level permissions: Selecting, inserting, updating, and so on
Column-level permissions: Allowing or restricting access to columns
Row-level security: Restricting access to rows

In order to read a value, PostgreSQL has to ensure that you have sufficient permissions on every level. The entire chain of permissions has to be correct.
Understanding bind addresses and connections
When you configure a PostgreSQL server, one of the first things you have to do is define remote access. By default, PostgreSQL does not accept remote connections. The important thing here is that PostgreSQL does not even reject the connection because it simply does not listen on the port. If you try to connect, the error message will actually come from the operating system because PostgreSQL does not care at all.
Assuming that there is a database server using the default configuration on 192.168.0.123, the following will happen:
Telnet tries to create a connection on port 5432 and will instantly be rejected by the remote box. From the outside, it looks as if PostgreSQL was not running at all.
The key to success can be found in the postgresql.conf file:
The listen_addresses setting will tell PostgreSQL which addresses to listen on. Technically speaking, those addresses are bind addresses. What does that actually mean? Suppose you have four network cards in your machine. You can listen on, say, three of those IP addresses. PostgreSQL takes requests to those three cards into account and does not listen on the fourth one. The port is simply closed.
You have to put your server's IP address into listen_addresses and not the IPs of the clients.
If you put a '*' in, PostgreSQL will listen to every IP assigned to your machine.
Keep in mind that changing listen_addresses requires a PostgreSQL service restart. It cannot be changed on the fly without a restart.
However, there are more settings related to connection management that are highly important to understand:
First of all, PostgreSQL listens to a single TCP port (the default value is 5432). Keep in mind that PostgreSQL will listen on a single port only. Whenever a request comes in, the postmaster will fork and create a new process to handle the connection. By default, up to 100 normal connections are allowed. On top of that, three additional connections are reserved for superusers. This means that you can either have 97 connections plus 3 superusers or 100 superuser connections. Note that those connection-related settings will also need a restart. The reason for this is that a static amount of memory is allocated to shared memory, which cannot be changed on the fly.
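The arithmetic behind the 97 plus 3 split can be written down as a small Python sketch. The parameter names mirror the PostgreSQL settings max_connections and superuser_reserved_connections; the function itself is a hypothetical helper and not part of any PostgreSQL tooling.

```python
def available_slots(max_connections=100, superuser_reserved_connections=3):
    """Split the connection limit into ordinary and superuser-only slots."""
    if superuser_reserved_connections >= max_connections:
        raise ValueError("reserved connections must be fewer than max_connections")
    return {
        "normal": max_connections - superuser_reserved_connections,
        "reserved": superuser_reserved_connections,
    }


print(available_slots())   # {'normal': 97, 'reserved': 3}
```

Once the 97 ordinary slots are taken, only superusers can still get in, which is what keeps an administrator from being locked out of a busy system.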
Inspecting connections and performance
When I am consulting, many people ask whether raising the connection limit will have an impact on performance in general. The answer is: not much (there is always some overhead due to context switches and all that). It basically makes little difference how many connections there are. However, what does make a difference is the number of open snapshots. The higher the number of open snapshots (not connections), the higher the overhead on the database side. In other words, you can increase max_connections cheaply.
If you are interested in some real-world data, consider taking a look at one of my older blog posts: https://www.cybertec-postgresql.com/max_connections-performance-impacts/.
Living in a world without TCP
In some cases, you might not want to use a network. It often happens that a database will only talk to a local application anyway. Maybe your PostgreSQL database has been shipped along with your application, or maybe you just don't want the risk of using a network. In this case, Unix sockets are what you need. Unix sockets are a network-free means of communication. Your application can connect through a Unix socket locally without exposing anything to the outside world.
What you need, however, is a directory. By default, PostgreSQL will use the /tmp directory. However, if more than one database server is running per machine, each one will need a separate data directory to live in.
Apart from security, there are various reasons why not using a network might be a good idea. One of these reasons is performance. Using Unix sockets is a lot faster than going through the loopback device (127.0.0.1). If that sounds surprising to you, don't worry; it does to many people. However, the overhead of a real network connection should not be underestimated if you are only running very small queries.
To show you what it really means, I have included a simple benchmark.
I have created a small file called script.sql, which will be used by the benchmark. It is a simple script that just selects a number, so it is the most simplistic statement possible. There is nothing simpler than just fetching a number. Let's run this benchmark on a normal laptop:
[hs@linuxpc ~]$ cat /tmp/script.sql
SELECT 1

Then you can simply run pgbench to execute the SQL over and over again. The -f option allows passing the name of the SQL script. -c 10 means that we want 10 concurrent connections to be active for 5 seconds (-T 5). The benchmark is running as the postgres user and is supposed to use the postgres database, which should be there by default. Note that the following examples will work on RHEL derivatives. Debian-based systems will use different paths:
As you can see, no hostname is passed to pgbench, so the tool connects locally to the Unix socket and runs the script as fast as possible. On this four-core Intel box, the system was able to achieve around 174,000 transactions per second.
What happens if -h localhost is added?
The throughput will drop like a stone to 107,000 transactions per second. The difference is clearly related to networking overhead.

Note that by using the -j option (the number of threads assigned to pgbench), you can squeeze some more transactions out of your systems. However, it does not change the overall picture of the benchmark in my situation. In other tests, it does because pgbench can be a real bottleneck if you don't provide enough CPU power.

As you can see, networking can be not only a security issue but also a performance issue.
Managing pg_hba.conf
After configuring bind addresses, we can move on to the next level. The pg_hba.conf file will tell PostgreSQL how to authenticate people coming over the network. In general, pg_hba.conf file entries have the following layout:
There are four types of rules that can be put into the pg_hba.conf file:
local: This can be used to configure local Unix socket connections.
host: This can be used for TCP/IP connections, both SSL and non-SSL.
hostssl: These are only valid for SSL connections. To make use of this option, SSL must be compiled into the server, which is the case if you are using prepackaged versions of PostgreSQL. In addition to that, ssl = on has to be set in the postgresql.conf file when the server is started.
hostnossl: This works for non-SSL connections.
A list of rules can be put into the pg_hba.conf file. Here is an example:

You can see three simple rules. The local record says that all users from local Unix sockets for all databases are to be trusted. The trust method means that no password has to be sent to the server and people can log in directly. The other two rules say that the same applies to connections from localhost, that is, 127.0.0.1 and ::1/128, which is an IPv6 address.
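Since PostgreSQL 10, the rules in the file can also be inspected from SQL through the pg_hba_file_rules view. The following is a small sketch; by default, the view is only readable by superusers, and the rows you get back depend entirely on your own pg_hba.conf:

```sql
-- One row per rule, in the order in which PostgreSQL evaluates them
SELECT line_number, type, database, user_name,
       address, netmask, auth_method
FROM   pg_hba_file_rules
ORDER  BY line_number;
```

The view reflects the current content of the file on disk, which makes it handy for checking an edit before the configuration is reloaded.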
As connecting without a password is certainly not the best of all choices for remote access, PostgreSQL provides various authentication methods that can be used to configure the pg_hba.conf file flexibly. Here is the list of possible authentication methods:

trust: This allows authentication without providing a password. The desired user has to be available on the PostgreSQL side.
reject: The connection will be rejected.
md5 and password: The connections can be created using a password. md5 means that the password is sent over the wire as a salted MD5 hash. In the case of password, the credentials are sent in plain text, which should not be done on a modern system anymore. md5 is not considered safe anymore either. You should use scram-sha-256 instead in PostgreSQL 10 and beyond.
scram-sha-256: This setting is the successor of md5 and uses a far more secure hash than the previous version.
gss and sspi: This uses GSSAPI or SSPI authentication. This is only possible for TCP/IP connections. The idea here is to allow for single sign-on.
ident: This obtains the operating system username of the client by contacting the Ident server on the client and checking whether it matches the requested database username.
peer: Suppose you are logged in as abc on Unix. If peer is enabled, you can only log in to PostgreSQL as abc. If you try to change the username, you will be rejected. The beauty is that abc won't need a password in order to authenticate. The idea here is that only the database administrator can log in to the database on a Unix system and not somebody else who just has the password or a Unix account on the same machine. This only works for local connections.
pam: It uses Pluggable Authentication Modules (PAM). This is especially important if you want to use a means of authentication that is not provided by PostgreSQL out of the box. To use PAM, create a file called /etc/pam.d/postgresql on your Linux system and put the desired PAM modules you are planning to use into the config file. Using PAM, you can even authenticate against less common components. However, it can also be used to connect to Active Directory and so on.
ldap: This configuration allows you to authenticate using the Lightweight Directory Access Protocol (LDAP). Note that PostgreSQL will only ask LDAP for authentication; if a user is present only on the LDAP side but not on the PostgreSQL side, you cannot log in. Also note that PostgreSQL has to know where your LDAP server is. All of this information has to be stored in the pg_hba.conf file as outlined in the official documentation: https://www.postgresql.org/docs/10/static/auth-methods.html#AUTH-LDAP.
radius: The Remote Authentication Dial-In User Service (RADIUS) is a means to do single sign-on. Again, parameters are passed using configuration options.
cert: This authentication method uses SSL client certificates to perform authentication, and therefore, it is possible only if SSL is used. The advantage here is that no password has to be sent. The CN attribute of the certificate will be compared to the requested database username, and if they match, the login will be allowed. A map can be used to allow for user mapping.
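To move a role from md5 to scram-sha-256, the password has to be set again while password_encryption points to the new method. The following is a minimal sketch; app_user and the password are hypothetical, and the last query needs superuser rights because it reads pg_authid:

```sql
-- Hypothetical role used only for this illustration
CREATE ROLE app_user LOGIN;

-- Hash passwords set from now on (in this session) with SCRAM instead of MD5
SET password_encryption = 'scram-sha-256';

-- Setting the password again stores it in the new format
ALTER ROLE app_user PASSWORD 'a-long-random-secret';

-- Check which format is stored (superuser only)
SELECT rolname,
       rolpassword LIKE 'SCRAM-SHA-256$%' AS uses_scram
FROM   pg_authid
WHERE  rolname = 'app_user';
```

Keep in mind that the matching pg_hba.conf rule also has to use scram-sha-256, and that the client library has to support it.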

Rules can simply be listed one after the other. The important thing here is that the order does make a difference, as shown in the following example:

When PostgreSQL walks through the pg_hba.conf file, it will use the first rule that matches. So, if our request is coming from 192.168.1.54, the first rule will always match before we make it to the second one. This means that 192.168.1.54 will be able to log in if the password and user are correct; therefore, the second rule is pointless.
If you want to exclude the IP, make sure that those two rules are swapped.
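After changing the order of rules, it can be useful to verify the result before making it active. The sketch below assumes PostgreSQL 10 or later and superuser rights; it lists the network rules in evaluation order, checks for lines that failed to parse, and only then asks the server to reload its configuration:

```sql
-- Network rules in the order in which they will be matched
SELECT line_number, type, address, netmask, auth_method
FROM   pg_hba_file_rules
WHERE  type LIKE 'host%'
ORDER  BY line_number;

-- Lines that could not be parsed show up with a non-NULL error column
SELECT line_number, error
FROM   pg_hba_file_rules
WHERE  error IS NOT NULL;

-- If everything looks fine, make the new rules active
SELECT pg_reload_conf();
```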
Handling SSL
PostgreSQL allows you to encrypt the transfer between the server and the client. Encryption is highly beneficial, especially if you are communicating over long distances. SSL offers a simple and secure way to ensure that nobody is able to listen to your communication. In this section, you will learn to set up SSL.
The first thing to do is to set the ssl parameter to on in the postgresql.conf file on server start. In the next step, you can put SSL certificates into the $PGDATA directory. If you want the certificates to be in some other directory, change the following parameters:
If you want to use self-signed certificates, perform the following steps:
Answer the questions asked by OpenSSL. Make sure you enter the local hostname as common name. You can leave the challenge password empty. This call will generate a key that is passphrase protected; it will not accept a passphrase that is less than four characters long.
To remove the passphrase (as you must if you want automatic startup of the server), run the commands:
Enter the old passphrase to unlock the existing key. Now do this to turn the certificate into a self-signed certificate and to copy the key and certificate to where the server will look for them:
Once the proper rules have been put into the pg_hba.conf file, you can use SSL to connect to your server. To verify that you are indeed using SSL, consider checking out the pg_stat_ssl view. It will tell you for every connection whether it uses SSL or not, and it will provide some important information about encryption:
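A possible way to look at this information is to join pg_stat_ssl with pg_stat_activity, so that the encryption details show up next to the user and client address. This is only a sketch; the rows returned depend on who is connected at the moment:

```sql
-- One row per backend: who is connected, from where, and how it is encrypted
SELECT a.pid,
       a.usename,
       a.client_addr,
       s.ssl,       -- true if the connection uses SSL
       s.version,   -- protocol version, NULL without SSL
       s.cipher,    -- cipher in use, NULL without SSL
       s.bits       -- key length of the cipher, NULL without SSL
FROM   pg_stat_activity AS a
       JOIN pg_stat_ssl AS s USING (pid);
```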
Handling instance-level security
So far, we have configured bind addresses and we have told PostgreSQL which means of authentication to use for which IP ranges. Up to now, the configuration was purely network related.
In the next step, we can shift our attention to permissions at the instance level. The most important thing to know is that users in PostgreSQL exist at the instance level. If you create a user, it is not just visible inside one database; it can be seen by all the databases. A user might have permissions to access just a single database, but basically users are created at the instance level.
To those of you who are new to PostgreSQL, there is one more thing you should keep in mind: users and roles are the same thing. The CREATE ROLE and CREATE USER commands have different default values (literally, the only difference is that roles do not get the LOGIN attribute by default), but at the end of the day, users and roles are the same. Therefore, CREATE ROLE and CREATE USER support the very same syntax:

Let's discuss those syntax elements one by one. The first thing you see is that a user can be a superuser or a normal user. If somebody is marked as superuser, there are no longer any restrictions that a normal user has to face. A superuser can drop objects (databases and so on) as he wishes.
The next important thing is that it takes permissions on the instance level to create a new database. Note that when somebody creates a database, this user will automatically be the owner of the database. The rule is this: the creator is always automatically the owner of an object (unless specified otherwise, as can be done with CREATE DATABASE). The beauty is that object owners can also drop an object again.
The CREATEROLE | NOCREATEROLE clause defines whether somebody is allowed to create new users/roles or not.
The next important thing is the INHERIT | NOINHERIT clause. If the INHERIT clause is set (which is the default value), a user can inherit permissions from some other user. Using inherited permissions allows using roles as a good way to abstract permissions. For example, you can create a role of bookkeeper and make many other roles inherit from bookkeeper. The idea is that you only have to tell PostgreSQL once what a bookkeeper is allowed to do even if you have many people working in accounting.
The LOGIN | NOLOGIN clause defines whether a role is allowed to log in to the instance. Note that the LOGIN clause is not enough to actually connect to a database. To do that, more permissions are needed. At this point, we are trying to make it into the instance, which is basically the gate to all the databases inside the instance. Let us get back to our example: bookkeeper might be marked as NOLOGIN because you want people to log in with their real name. All your accountants (say joe and jane) might be marked as LOGIN but can inherit all the permissions from the bookkeeper role. A structure like this makes it easy to assure that all bookkeepers will have the same permissions while ensuring that their individual activity is performed and logged under their separate identities.
If you are planning to run PostgreSQL with streaming replication, you can do all the transaction log streaming as superuser. However, doing that is not recommended from a security point of view. To assure that you don't have to be superuser to stream the transaction log, PostgreSQL allows you to give replication rights to a normal user (the REPLICATION attribute), which can then be used to do streaming. It is common practice to create a special user just for the purpose of managing streaming.
As you will see later in this chapter, PostgreSQL provides a feature called row-level security (RLS). The idea is that you can exclude rows from the scope of a user. If a user is explicitly supposed to bypass RLS, set this value to BYPASSRLS. The default value is NOBYPASSRLS.
Sometimes it makes sense to restrict the number of connections allowed for a user. CONNECTION LIMIT allows you to do exactly that. Note that overall there can never be more connections than defined in the postgresql.conf file (max_connections). However, you can always restrict certain users to a lower value.
By default, PostgreSQL will store passwords in the system table encrypted, which is a good default behavior. However, suppose you are doing a training course. 10 students are attending and everybody is connected to your box. You can be 100% certain that one of those people will forget his or her password once in a while. As your setup is not security critical, you might decide to store the password in plain text so that you can easily look it up and give it to a student. This feature might also come in handy if you are testing software. Note, however, that this is only possible in releases before PostgreSQL 10; from version 10 onward, storing unencrypted passwords is no longer supported.
Often you already know that somebody will leave your organization fairly soon. The VALID UNTIL clause allows you to automatically lock out a specific user once his or her account has expired.
The IN ROLE clause lists one or more existing roles to which the new role will be immediately added as a new member. It helps to avoid additional manual steps. An alternative to IN ROLE is IN GROUP.
The ROLE clause defines roles that are automatically added as members of the new role.
The ADMIN clause is the same as the ROLE clause but adds the WITH ADMIN OPTION.
Finally, there is the SYSID clause, which used to set a specific ID for the user (similar to what some Unix administrators do for usernames on the operating system level). In current versions, it is still accepted for backward compatibility but ignored.
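To make the clauses discussed above a bit more tangible, here is a small sketch that combines several of them. All role names, passwords, and dates are hypothetical and only serve as an illustration:

```sql
-- A group role that nobody logs in as
CREATE ROLE auditor NOLOGIN;

-- A login role with a connection limit and an expiry date,
-- which becomes a member of auditor right away
CREATE ROLE trainee LOGIN
    PASSWORD 'change-me'
    CONNECTION LIMIT 2
    VALID UNTIL '2030-01-01'
    IN ROLE auditor;

-- A dedicated role for streaming replication, without superuser rights
CREATE ROLE repl_user LOGIN REPLICATION PASSWORD 'change-me-too';

-- Inspect the resulting attributes
SELECT rolname, rolcanlogin, rolreplication, rolconnlimit, rolvaliduntil
FROM   pg_roles
WHERE  rolname IN ('auditor', 'trainee', 'repl_user');
```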
Creating and modifying users
After this theoretical introduction, it is time to actually create users and see how things can be used in a practical example:
The first thing done here is that a role called bookkeeper is created. Note that we don't want people to log in as bookkeeper, so the role is marked as NOLOGIN.

Note also that NOLOGIN is the default value if you use CREATE ROLE. If you prefer CREATE USER, the default setting is LOGIN.
Then, the joe role is created and marked as LOGIN. Finally, the bookkeeper role is assigned to the joe role so that he can do everything a bookkeeper is actually allowed to do.
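The steps just described can be sketched as follows. The statements follow the description in the text; the final query is an addition that simply shows who is a member of which role:

```sql
-- Group role: carries permissions, cannot log in
CREATE ROLE bookkeeper NOLOGIN;

-- Real person: can log in
CREATE ROLE joe LOGIN;

-- joe becomes a member of bookkeeper and inherits its permissions
GRANT bookkeeper TO joe;

-- Show role memberships
SELECT r.rolname AS role_name,
       m.rolname AS member_name
FROM   pg_auth_members AS am
       JOIN pg_roles AS r ON r.oid = am.roleid
       JOIN pg_roles AS m ON m.oid = am.member;
```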
Once the users are in place, we can test what we have so far:
This will actually work as expected. However, note that the prompt has changed. This is just a way for PostgreSQL to show you that you are not logged in as a superuser.
Once a user has been created, it might be necessary to modify it. One thing you might want to change is the password. In PostgreSQL, users are allowed to change their own passwords. Here is how it works:
ALTER ROLE (or ALTER USER) will allow you to change most settings that can be set during user creation. However, there is even more to managing users. In many cases, you want to assign special parameters to a user. ALTER USER gives you the means to do that:
The syntax is fairly simple and pretty straightforward. To show you why this is really useful, I have added a real-world example. Let us suppose that Joe happens to live on the island of Mauritius. When he logs in, he wants to be in his time zone even if his database server is located in Europe:
ALTER ROLE will modify the user. As soon as Joe reconnects, the time zone will already be set for him.

Note that the time zone is not changed immediately. You should either reconnect or use SET ... TO DEFAULT.
The important thing here is that this is also possible for some memory parameters such as work_mem and so on, which have already been covered earlier in this book.
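A minimal sketch of such per-role settings might look like this. It assumes the joe role from above; the work_mem value is an arbitrary example, not a recommendation:

```sql
-- Every new session of joe starts in the Mauritius time zone
ALTER ROLE joe SET TimeZone = 'Indian/Mauritius';

-- The same mechanism works for memory parameters
ALTER ROLE joe SET work_mem = '64MB';

-- Per-role settings are stored in the rolconfig column
SELECT rolname, rolconfig
FROM   pg_roles
WHERE  rolname = 'joe';

-- Remove a per-role setting again
ALTER ROLE joe RESET work_mem;
```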

Defining database-level security
After configuring users at the instance level, it is possible to dig deeper and see what can be done at the database level. The first major question that arises is: we explicitly allowed Joe to log in to the database instance. But who or what allowed Joe to actually connect to one of the databases? Maybe you don't want Joe to access all the databases in your system. Restricting access to certain databases is exactly what you can achieve on this level.
For databases, the following permissions can be set using GRANT:
There are two major permissions on the database level that deserve really close attention:
CREATE: It allows somebody to create a schema inside the database. Note that the CREATE privilege does not allow for the creation of tables; it is about schemas. In PostgreSQL, a table resides inside a schema, so you have to get to the schema level first to be able to create a table.
CONNECT: It allows somebody to connect to a database.
The question now is this: nobody has explicitly assigned CONNECT permissions to the joe role. So where do those permissions actually come from? The answer is this: there is a thing called public, which is similar to world on Unix. If the world is allowed to do something, so is Joe, who is part of the general public.
The main thing is that public is not a role in the sense that it can be dropped and renamed. You can simply see it as the equivalent of everybody on the system.
So, to ensure that not everybody can connect to any database at any time, CONNECT may have to be revoked from the general public. To do so, you can connect as superuser and fix the problem:
As you can see, the joe role is not allowed to connect anymore. At this point only superusers have access to test.
In general, it is a good idea to already revoke permissions from the postgres database even before other databases are created. The idea behind this concept is that those permissions won't be in all those newly created databases anymore. If somebody needs access to a certain database, rights have to be explicitly granted. Rights are not automatically there anymore.
If you want to allow the joe role to connect to the test database, try the following line as superuser:
Basically there are two choices here:

You can allow the joe role directly so that only the joe role will be able to connect.
Alternatively, you can grant permissions to the bookkeeper role. Remember, the joe role will inherit all the permissions from the bookkeeper role, so if you want all accountants to be able to connect to the database, assigning permissions to the bookkeeper role seems like an attractive idea.
If you grant permissions to the bookkeeper role, it is not risky because the role is not allowed to log in to the instance in the first place; so it purely serves as a source of permissions.
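The database-level steps discussed in this section can be sketched like this. The statements assume a database called test and the roles joe and bookkeeper from earlier, and they have to be run as superuser or as the owner of the database:

```sql
-- Take away the default rights everybody has on the database
REVOKE ALL ON DATABASE test FROM public;

-- Grant CONNECT to the group role rather than to individual people
GRANT CONNECT ON DATABASE test TO bookkeeper;

-- Check the effective permission of joe; membership in bookkeeper
-- is taken into account as long as joe has the INHERIT attribute
SELECT has_database_privilege('joe', 'test', 'CONNECT') AS joe_can_connect;
```

Granting to the group role keeps the number of individual grants small, which pays off later when people join or leave.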
Adjusting schema-level permissions
Once you are done configuring the database level, it makes sense to take a look at the schema level.
As you can see, Joe is having a bad day and basically nothing but connecting to the database is allowed.
However, there is a small exception, and it comes as a surprise to many people:
By default, public is allowed to work with the public schema, which is always around. If you are seriously interested in securing your database, make sure that this problem is taken care of. Otherwise, normal users will potentially spam your public schema with all kinds of tables and your entire setup might suffer. Also keep in mind that if somebody is allowed to create an object, this person is also its owner. Ownership means that all permissions are automatically available to the creator (this includes the destruction of the object).
To take away those permissions from public, run the following line as superuser:
From now on, nobody can put things into your public schema without permissions anymore:
As you can see, the command will fail. The important thing here is the error message you will get; PostgreSQL does not know where to put these tables. By default, it will try to put the table into one of the following schemas:
In this case, you will get the error message you expect. PostgreSQL denies access to the public schema.
The next logical question now is: which permissions can be set at the schema level to give some more power to the joe role?
CREATE means that somebody can put objects into a schema. USAGE means that somebody is allowed to enter the schema. Note that entering the schema does not mean that something inside the schema can actually be used; those permissions have not been defined yet. Basically, this just means the user can look up the objects in this schema.
To allow the joe role to access the table he has created previously, the following line will be necessary (executed as superuser):
The joe role is also able to add and modify rows because he happens to be the owner of the table. However, although he can do quite a lot of things already, the joe role is not yet almighty. Consider the following statement:
Let us take a closer look at the actual error message. As you can see, the message complains about permissions on the schema, not about permissions on the table itself (remember, the joe role owns the table). To fix the problem, it has to be tackled on the schema level and not on the table level. Run the following line as superuser:
Keep in mind that this is necessary if DDLs are used. In my daily work as a PostgreSQL support service provider, I have seen a couple of issues where this turned out to be a problem.
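The schema-level statements from this section can be summarized in a short sketch. It assumes the roles joe and bookkeeper from earlier and should be run as superuser:

```sql
-- Nobody may use or create objects in the public schema by default anymore
REVOKE ALL ON SCHEMA public FROM public;

-- Schemas PostgreSQL tries, in order, when no schema is given
SHOW search_path;

-- Allow bookkeepers to enter the schema ...
GRANT USAGE ON SCHEMA public TO bookkeeper;

-- ... and to create or rename objects in it (needed for DDL)
GRANT CREATE ON SCHEMA public TO bookkeeper;

-- Check the effective permissions of joe
SELECT has_schema_privilege('joe', 'public', 'USAGE')  AS can_use,
       has_schema_privilege('joe', 'public', 'CREATE') AS can_create;
```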
Working with tables
After taking care of bind addresses, network authentication, users, databases, and schemas, you have finally made it to the table level. The following snippet shows which permissions can be set for a table:
Let me explain those permissions one by one:

SELECT: allows you to read a table.
INSERT: allows you to add rows to the table (this also includes COPY and so on; it is not only about the INSERT statement). Note that if you are allowed to insert, you are not automatically allowed to read. Both SELECT and INSERT are needed to be able to read the data you have inserted.
UPDATE: allows you to modify the content of a table.
DELETE: is used to remove rows from a table.
TRUNCATE: allows you to use the TRUNCATE command. Note that DELETE and TRUNCATE are two separate permissions because TRUNCATE will lock the table, which is not done by DELETE (not even if there is no WHERE condition).
REFERENCES: allows the creation of foreign keys. It is necessary to have this privilege on both the referencing and referenced columns; otherwise the creation of the key won't work.
TRIGGER: allows for the creation of triggers.
The nice thing about GRANT is that you can set permissions on all tables in a schema at the same time.
It greatly simplifies the process of adjusting permissions. It is also possible to use the WITH GRANT OPTION clause. The idea is to ensure that normal users can pass on permissions to others, which has the advantage of being able to reduce the workload of administrators quite a bit. Just imagine a system that provides access to hundreds of users; it can start to be a lot of work to manage all those people, and therefore administrators can appoint people to manage a subset of the data themselves.
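Both ideas, granting on all tables of a schema and passing on permissions, can be sketched as follows. The table t_demo is hypothetical; the roles joe and bookkeeper are assumed to exist as created earlier:

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE IF NOT EXISTS t_demo (id int, payload text);

-- One statement for all tables that currently exist in the schema
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA public TO bookkeeper;

-- joe may read the table and pass this right on to other roles
GRANT SELECT ON t_demo TO joe WITH GRANT OPTION;

-- Who has which privilege on the table, and who may pass it on?
SELECT grantee, privilege_type, is_grantable
FROM   information_schema.role_table_grants
WHERE  table_name = 't_demo';
```

Note that GRANT ... ON ALL TABLES IN SCHEMA only affects tables that exist at the time the statement is run; tables created later are not covered by it.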
Handling column-level security
In some cases, not everybody is allowed to see all the data. Just imagine a bank. Some people might see the entire information about a bank account, while others might be limited to only a subset of the data. In a real-world situation, somebody might not be allowed to read the balance column or somebody might not see the interest rates of people's loans.
Another example would be that people are allowed to see people's profiles but not their pictures or some other private information. The question now is: how can column-level security be used?
To demonstrate that, I will add a column to the existing table belonging to the joe role:
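Independent of the table used in the text, the general mechanism can be sketched with a small, hypothetical bank table. The table and column names are assumptions made for this illustration; the joe role is the one created earlier:

```sql
-- Hypothetical table: joe should see the account but not the balance
CREATE TABLE IF NOT EXISTS t_account (
    id         int,
    owner_name text,
    balance    numeric
);

-- Grant SELECT on selected columns only
GRANT SELECT (id, owner_name) ON t_account TO joe;

-- Check the result per column
SELECT has_column_privilege('joe', 't_account', 'owner_name', 'SELECT') AS can_read_name,
       has_column_privilege('joe', 't_account', 'balance', 'SELECT')    AS can_read_balance;
```

With such a grant in place, joe can select id and owner_name, while a query touching balance (including SELECT *) is rejected. Keep in mind that a table-level SELECT privilege, whether granted directly or inherited, covers all columns and therefore overrides this restriction.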
Configuring default privileges
So far, a lot of stuff has already been configured. The trouble naturally arising now is: what happens if new tables are added to the system? It can be quite painful and risky to process these tables one by one and to set proper permissions. Wouldn't it be nice if those things would just happen automatically? This is exactly what ALTER DEFAULT PRIVILEGES does. The idea is to give users an option to make PostgreSQL automatically set the desired permissions as soon as an object comes into existence. It cannot happen anymore that somebody simply forgets to set those rights.
The following listing shows the first part of the syntax specification:
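As a usage sketch of the command, the following statements make every table that joe creates in the public schema from now on readable by another role. The role paul is hypothetical; the statements should be run as superuser or as joe himself:

```sql
-- Hypothetical role that should be able to read joe's future tables
CREATE ROLE paul LOGIN;

-- Applies to tables joe creates in schema public from now on;
-- existing tables are not touched
ALTER DEFAULT PRIVILEGES FOR ROLE joe IN SCHEMA public
    GRANT SELECT ON TABLES TO paul;

-- Default privileges are stored in the pg_default_acl catalog
SELECT defaclrole::regrole           AS creator,
       defaclnamespace::regnamespace AS schema_name,
       defaclobjtype                 AS object_type,
       defaclacl                     AS default_acl
FROM   pg_default_acl;
```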
Digging into row-level security (RLS)
Up to this point, a table has always been shown as a whole. When the table contained 1 million rows, it was possible to retrieve 1 million rows from it. If somebody had the rights to read a table, it was all about the entire table. In many cases, this is not enough. Often it is desirable that a user is not allowed to see all the rows.
Consider the following real-world example: an accountant is doing accounting work for many people. The table containing tax rates should really be visible to everybody as everybody has to pay the same rates. However, when it comes to the actual transactions, you might want to ensure that everybody is only allowed to see his or her own transactions. Person A should not be allowed to see person B's data. In addition to that, it might also make sense that the boss of a division is allowed to see all the data in his part of the company.
Row-level security has been designed to do exactly this and enables you to build multi-tenant systems in a fast and simple way. The way to configure those permissions is to come up with policies. The CREATE POLICY command is here to provide you with a means to write those rules:
Let us inspect the policy I have just created in a more detailed way. The first thing you see is that a policy actually has a name. It is also connected to a table and allows for certain operations (in this case, SELECT). Then comes the USING clause. It basically defines what the joe role will be allowed to see. The USING clause is therefore a mandatory filter attached to every query to only select the rows our user is supposed to see.
There is also one important side note: if there is more than just a single policy, PostgreSQL will use an OR condition. In short: more policies will make you see more data by default. In PostgreSQL 9.6 this was always the case. However, with the introduction of PostgreSQL 10.0 the user can choose whether conditions should be connected with OR or with AND:
By default, a policy is PERMISSIVE, so OR connections are at work. If you decide to use RESTRICTIVE, then those clauses will be connected with AND.
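The interplay of permissive and restrictive policies can be sketched as follows. The table t_person, its columns, and the policy names are assumptions made for this illustration; the joe role is the one created earlier, and the AS RESTRICTIVE syntax requires PostgreSQL 10 or later:

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE IF NOT EXISTS t_person (gender text, name text);
GRANT SELECT ON t_person TO joe;

-- Policies are only enforced once row-level security is enabled
ALTER TABLE t_person ENABLE ROW LEVEL SECURITY;

-- Two permissive policies (the default): connected with OR
CREATE POLICY joe_pol_male ON t_person
    FOR SELECT TO joe
    USING (gender = 'male');

CREATE POLICY joe_pol_robot ON t_person
    FOR SELECT TO joe
    USING (gender = 'robot');

-- One restrictive policy: connected with AND to the result of the above
CREATE POLICY joe_pol_not_hidden ON t_person
    AS RESTRICTIVE
    FOR SELECT TO joe
    USING (name != 'hidden');
```

With these three policies, joe effectively sees rows where (gender = 'male' OR gender = 'robot') AND name != 'hidden'. Superusers and the owner of the table are not subject to the policies by default.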
Now suppose that, for some reason, it has been decided that the joe role is also allowed to see robots. There are two choices to achieve our goal. The first option is to simply use ALTER POLICY to change the existing policy:
As you can see, both the USING clauses have been added as mandatory filters to the query. You might have noticed in the syntax definition that there are two types of clauses:
USING: This clause filters rows that already exist. This is relevant to SELECT, UPDATE, and so on.
WITH CHECK: This clause filters new rows that are about to be created, so it is relevant to INSERT, UPDATE, and so on.
Here is what happens if we try to insert a row:
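The effect of a WITH CHECK clause can be sketched like this. The table, the policy name, and the inserted values are assumptions made for this illustration; the statements are meant to be run as superuser, with SET ROLE used to act as joe:

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE IF NOT EXISTS t_person (gender text, name text);
ALTER TABLE t_person ENABLE ROW LEVEL SECURITY;
GRANT INSERT ON t_person TO joe;

-- joe may only create rows that satisfy the check
CREATE POLICY joe_pol_insert ON t_person
    FOR INSERT TO joe
    WITH CHECK (gender IN ('male', 'female'));

-- Act as joe for the following statements
SET ROLE joe;

-- Satisfies the check, so the row is accepted
INSERT INTO t_person VALUES ('male', 'adam');

-- Does not satisfy the check; PostgreSQL is expected to reject
-- this row with a row-level security error
INSERT INTO t_person VALUES ('robot', 'r2d2');

-- Back to the original role
RESET ROLE;
```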
Inspecting permissions
When all permissions are set, it is sometimes necessary to know who has which permissions. It is vital for administrators to find out who is allowed to do what. Unfortunately, this process is not so easy and requires a bit of knowledge. Usually I am a big fan of command-line usage. However, in the case of the permission system, it can really make sense to use a graphical user interface to do things.
Before I show you how to read PostgreSQL permissions, I will assign rights to the joe role so that we can inspect them in the next step:
It will return all those policies along with information about access privileges. Unfortunately, those shortcuts are hard to read and I have the feeling that they are not widely understood by administrators. In our example, the joe role has gotten arwdDxt from postgres. What do those codes actually mean?

a: appends, for INSERT
r: reads, for SELECT
w: writes, for UPDATE
d: deletes, for DELETE
D: is used for TRUNCATE (when this was introduced, t was already taken)
x: is used for REFERENCES
t: is used for TRIGGER

If you don't know those codes, there is also a second way to make things more readable. Consider the following function call:
As you can see, the set of permissions is returned as a simple table, which makes life really easy.
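One possible way to get such a readable table is sketched below. It is not necessarily the call used in the text; it relies on the standard aclexplode function and on the information schema, and t_person is a hypothetical table name:

```sql
-- Expand the ACL of a table into one row per granted privilege
SELECT c.relname,
       a.grantor::regrole AS grantor,
       a.grantee::regrole AS grantee,   -- shown as - for public
       a.privilege_type,
       a.is_grantable
FROM   pg_class AS c,
       aclexplode(c.relacl) AS a
WHERE  c.relname = 't_person';

-- The same kind of information through the information schema
SELECT grantee, privilege_type, is_grantable
FROM   information_schema.role_table_grants
WHERE  table_name = 't_person';

-- Or ask directly whether a role holds a specific privilege
SELECT has_table_privilege('joe', 't_person', 'SELECT') AS joe_can_select;
```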
Reassigning objects and dropping users
After assigning permissions and restricting access, it can happen that users will be dropped from the system. Unsurprisingly, the commands to do that are the DROP ROLE and DROP USER commands:
PostgreSQL will issue error messages because a user can only be removed if everything has been taken away from him. This makes sense for this reason: just suppose somebody owns a table. What should PostgreSQL do with that table? Somebody has to own it.
To reassign tables from one user to the next, consider taking a look at the REASSIGN OWNED command:
As you can see, the list of problems has been reduced significantly. What we can do now is resolve all of those problems one after the other and drop the role. There is no shortcut I am aware of. The only way to make that more efficient is to make sure that as few permissions as possible are assigned to real people. Try to abstract as much as you can into roles, which in turn can be used by many people. If individual permissions are not assigned to real people, things tend to be easier in general.
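The overall sequence can be sketched as follows. It assumes that joe should be removed and that the postgres role takes over his objects. The DROP OWNED statement is an optional addition that should be used with care, because it also drops any objects joe still owns in the current database:

```sql
-- Hand over all objects joe owns in the current database;
-- this has to be repeated in every database where joe owns something
REASSIGN OWNED BY joe TO postgres;

-- Optional and to be used with care: revokes the privileges that were
-- granted to joe in the current database (and drops anything he still owns)
DROP OWNED BY joe;

-- Succeeds once nothing depends on the role anymore
DROP ROLE joe;
```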
Summary
Database security is a wide field and a 30-page chapter can hardly cover all the aspects of PostgreSQL security. Many things such as SELinux, SECURITY DEFINER/INVOKER, and so on were left untouched. However, in this chapter, you learned the most common things you will face as a PostgreSQL developer and DBA. You learned how to avoid the basic pitfalls and how to make your systems more secure.
In the next chapter, you will learn about PostgreSQL streaming replication and incremental backups. The chapter will also cover failover scenarios.

