{"id":605,"date":"2015-10-17T22:47:42","date_gmt":"2015-10-17T21:47:42","guid":{"rendered":"http:\/\/www.muratyaman.co.uk\/wp\/?p=605"},"modified":"2020-04-01T13:05:01","modified_gmt":"2020-04-01T12:05:01","slug":"full-text-search-using-postgresql-simple-fast-cached","status":"publish","type":"post","link":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/2015\/10\/full-text-search-using-postgresql-simple-fast-cached\/","title":{"rendered":"Full-text search using PostgreSQL: simple, fast, cached"},"content":{"rendered":"<p>Let&#8217;s find out what can we achieve; using what we already have, without adding other layers, servers or applications: PostgreSQL. It will help us keep the architecture simple, please be patient \ud83d\ude42<\/p>\n<p>We need to prepare records for full-text search using 1-to-1 relationship for relevant records in other tables with search term vector from one or more text fields (the assumption is that each of those tables has a unique integer field):<\/p>\n<p><code lang=\"sql\"><br \/>\nCREATE TABLE public.search_index (<br \/>\n  table_name VARCHAR(50) NOT NULL,<br \/>\n  id INTEGER NOT NULL,<br \/>\n  search_vector TSVECTOR,<br \/>\n  CONSTRAINT search_index_pkey PRIMARY KEY(table_name, id)<br \/>\n)<br \/>\nWITH (oids = false);<br \/>\n<\/code><\/p>\n<p>Here is the infamous full-text search index on the search term vector:<\/p>\n<p><code lang=\"sql\"><br \/>\nCREATE INDEX search_index_idx_search_vector ON public.search_index<br \/>\n  USING gin (search_vector);<br \/>\n<\/code><\/p>\n<p>We need to insert and update records in our search index table; for example, when we add a user record in the users table using the text fields such as address details and technical skills.<\/p>\n<p><code lang=\"sql\"><br \/>\nINSERT INTO search_index (table_name, id, search_vector)<br \/>\nSELECT<br \/>\n  'users' AS table_name,<br \/>\n  p.user_id AS id,<br \/>\n  to_tsvector(<br \/>\n    COALESCE(u.name, '') || ' ' ||<br \/>\n    COALESCE(u.surname, '') || ' ' ||<br \/>\n    COALESCE(u.address_1, '') || ' ' ||<br \/>\n    COALESCE(u.address_2, '') || ' ' ||<br \/>\n    COALESCE(u.city, '') || ' ' ||<br \/>\n    COALESCE(u.country, '') || ' ' ||<br \/>\n    COALESCE(u.skills, '')<br \/>\n ) AS search_vector<br \/>\nFROM users u;<br \/>\n<\/code><\/p>\n<p>NOTE: It is a good idea to write trigger function(s) (after insert, after update and after delete) on source table(s) to take care of synchronizing the records in search_index table.<\/p>\n<p>How can we use the search index table?<br \/>\nVery simple: using to_tsquery() function and powerful PostgreSQL operators:<\/p>\n<p><code lang=\"sql\"><br \/>\nSELECT<br \/>\n  i.id,<br \/>\n  ts_rank(i.search_vector, to_tsquery('london')) AS rank<br \/>\nFROM search_index i<br \/>\nWHERE i.table_name = 'users'<br \/>\n  AND i.search_vector @@ to_tsquery('london')<br \/>\nLIMIT 10<br \/>\nOFFSET 0<br \/>\n;<br \/>\n<\/code><\/p>\n<p>We can easily join this query with the source table and get the results straight away:<\/p>\n<p><code lang=\"sql\"><br \/>\nSELECT q.rank, u.*<br \/>\nFROM (<br \/>\n  SELECT<br \/>\n    i.id,<br \/>\n    ts_rank(i.search_vector, to_tsquery('london')) AS rank<br \/>\n  FROM search_index i<br \/>\n  WHERE i.table_name = 'users'<br \/>\n    AND i.search_vector @@ to_tsquery('london')<br \/>\n  LIMIT 10<br \/>\n  OFFSET 0<br \/>\n) q<br \/>\nINNER JOIN users u ON q.id = u.id<br \/>\n;<br \/>\n<\/code><\/p>\n<p>However, we can go further and add a table to cache the search terms for each table that we want to enable full-text search:<\/p>\n<p><code lang=\"sql\"><br \/>\nCREATE TABLE public.search_cache (<br \/>\n  search_cache_id SERIAL NOT NULL,<br \/>\n  table_name VARCHAR(50) NOT NULL, -- also used in search_index.table_name<br \/>\n  term VARCHAR(50) NOT NULL,<br \/>\n  created_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL,<br \/>\n  updated_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL,<br \/>\n  usage_count INTEGER DEFAULT 1 NOT NULL,<br \/>\n  PRIMARY KEY(search_cache_id)<br \/>\n)<br \/>\nWITH (oids = false);<br \/>\n<\/code><\/p>\n<p>We don&#8217;t want to duplicate values for table name and term; let&#8217;s create a unique index:<\/p>\n<p><code lang=\"sql\"><br \/>\nCREATE UNIQUE INDEX search_cache_unq_table_name_and_term ON public.search_cache<br \/>\n  USING btree (table_name COLLATE pg_catalog.\"default\", term COLLATE pg_catalog.\"default\");<br \/>\n<\/code><\/p>\n<p>Here is an example to populate it:<\/p>\n<p><code lang=\"sql\"><br \/>\nINSERT INTO search_cache (table_name, term)<br \/>\nVALUES ('users', 'london')<br \/>\nRETURNING search_cache_id; -- ==> e.g. 1<br \/>\n<\/code><\/p>\n<p>Then, we can store row IDs from the result of a search cache:<\/p>\n<p><code lang=\"sql\"><br \/>\nCREATE TABLE public.search_cache_row (<br \/>\n  search_cache_id INTEGER NOT NULL, -- refers to search_cache.search_cache_id<br \/>\n  id INTEGER NOT NULL, -- also used in search_index.id<br \/>\n  rank REAL NOT NULL,<br \/>\n  CONSTRAINT search_cache_row_pkey PRIMARY KEY(search_cache_id, id)<br \/>\n)<br \/>\nWITH (oids = false);<br \/>\n<\/code><\/p>\n<p>This is how we can populate that table:<\/p>\n<p><code lang=\"sql\"><br \/>\nINSERT INTO search_cache_row (search_cache_id, id, rank)<br \/>\nSELECT<br \/>\n  1 AS search_cache_id,<br \/>\n  i.id,<br \/>\n  ts_rank(i.search_vector, to_tsquery('london')) AS rank<br \/>\nFROM search_index i<br \/>\nWHERE i.table_name = 'users'<br \/>\n  AND i.search_vector @@ to_tsquery('london')<br \/>\nLIMIT 100<br \/>\nOFFSET 0<br \/>\n;<br \/>\n<\/code><\/p>\n<p>As long as the source records and search term do not change, the final result will not change, so we can retrieve the results faster. We can also store recent search terms in memory by using Memcached.<\/p>\n<p>Finally, the following code shows how to use the cache rows:<\/p>\n<p><code lang=\"sql\"><br \/>\nSELECT q.rank, u.*<br \/>\nFROM (<br \/>\n  SELECT<br \/>\n    c.id,<br \/>\n    c.rank<br \/>\n  FROM search_cache_row c<br \/>\n  WHERE c.search_cache_id = 1<br \/>\n  LIMIT 10<br \/>\n  OFFSET 0<br \/>\n) q<br \/>\nINNER JOIN users u ON q.id = u.id<br \/>\n;<br \/>\n<\/code><\/p>\n<p>Reference: <a href=\"http:\/\/www.postgresql.org\/docs\/9.4\/static\/textsearch-tables.html\">http:\/\/www.postgresql.org\/docs\/9.4\/static\/textsearch-tables.html<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Let&#8217;s find out what can we achieve; using what we already have, without adding other layers, servers or applications: PostgreSQL. It will help us keep the architecture simple, please be patient \ud83d\ude42<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[194,122,142,33],"class_list":["post-605","post","type-post","status-publish","format-standard","hentry","category-technology","tag-performance","tag-postgresql","tag-simplicity","tag-sql"],"_links":{"self":[{"href":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/wp-json\/wp\/v2\/posts\/605","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=605"}],"version-history":[{"count":10,"href":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/wp-json\/wp\/v2\/posts\/605\/revisions"}],"predecessor-version":[{"id":1001,"href":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/wp-json\/wp\/v2\/posts\/605\/revisions\/1001"}],"wp:attachment":[{"href":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=605"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=605"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.muratyaman.co.uk\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=605"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}