My Postgres cookbook

in #postgresql7 years ago

It’s not a secret that Postgres is a powerful database. After playing (some people call it using) I’ve found out that Postgres is not just amazing database but it can solve 99% of the problems you’ll encounter.


I want to share with you some of the capabilities that postgresql offer. I hope that you would like it and please feel free to share your feedback.

I’ve break down some of the features that help me during my work for the past couple of years, from search by tags, location, time series and etc… I think of this article as a quick cookbook for using postgres.

Tagging

Let’s say you build an app that helps you find the best restaurants in town and you would like to add a filter to the restaurant tags.
The schema should look something like this:
restaurant_id (PK) | Location {lat, lan} | tags {STRING} | …

Create:

insert into “restaurants” (“location”, tags”) values (‘{ “lat”: 12.12, “lng”: 12.12 }’, ‘{“sushi”}’);

Create will return the restaurant_id

*ignore the location column, I’ll cover it in a different post since location search is a big subject by itself.
Update: Now we would like to add a tag but keep the old tags, so we can use array_cat or ||

UPDATE “restaurants” SET tags = array_cat(topics, ‘{sea food, japanese}’);
UPDATE “restaurants” SET tags = topics || ‘{sea food, japanese, rice}’;

Remove: We realize that “rice” as tags doesn’t make sense let’s remove it.

UPDATE “restaurants” SET tags = array_remove(tags, ‘rice’);

Replace: We also realize we want better performance and would like to change ‘sea food’ to ‘seafood’

UPDATE “restaurants” SET tags = array_remove(tags, ‘sea food’) || ‘{seafood}’ WHERE topics @> ARRAY[‘sea food’];

Search: now you would like to filter the results by tags we can just use this query.

SELECT * FROM “restaurants” WHERE ‘sushi’ = ANY (tags)

Another way for searching is to search by specific tag and filter out values by another tag.

SELECT * FROM “restaurants” WHERE ‘romantic’ = ANY (tags) and ‘sushi’ != ALL (tags) 

In the last query we will get restaurants that tag as romantic but not tag as sushi

TimeSeries

Today IOT is defiantly a HOT term! storing your data in a time series format is crucial for success. We can find time series data everywhere from smart house devices up to self driving cars that send their data such as battery consumption, location, logs, etc…

In most cases before designing my data schema I think about the usage, meaning how what’s the trade-off between read/write is it 50/50 or 80/20 mostly write to the database.

This will effect how we design the schema, the indexes that we should use and how we’re going to shared our data across the databases.
I design this schema to be very generic and to be able to solve most of the issues with timeseries, so it might not fit to your problem 100% so feel free to customize the schema to be more fitted to your needs but the basic idea is here:

A schema example is:

Timestamp I would use ISO 8601 with milliseconds support, it’s human readable and support up to milliseconds so I’ll be able to support more than just seconds data. The timestamp column will help us to shared the data and search between the time range.

For the timestamp column, the data type should be timestamp without time zone this mean you’ll have to make sure that every timestamp you inserting to your database is a UTC timestamp and NOT GMT.

SOURCE this should be serial data type (4 bytes) bigserial (8 bytes) depends on how many sources you’ll have. both data types are an auto-increment integer

DATA column is your binary json (jsonb), this will help you to be schema-free and treat your PostgreSQL schema like a no-sql database. you’ll be able to move fast and never fail on an insert to your database. you’ll be able to insert any payload to this column. There are two type of json and jsonb the different between binary json and plain json is that json in it’s binary format supports indexing and searching by specific field the disadvantage of using JSONB is that it’s slower to insert (That’s one of the reason why you want to shared your data).

TYPE column can be STRING (varchar)or Number(integer) the type will help you to filter the data after sorting it by source and dates range.

Our database should be looking like this:

We’re able to search by date and filter by source and type.
For Example:

SELECT * FROM timeseries WHERE SOURCE=1 AND timestamp between20170101T00:00:00.000Z’ and20170201T00:00:00.000Z’ AND TYPE=’battery’

That’s a pretty basic and simple query and you might want to break your query to specific time interval for example break the data to 10 min interval (meaning equal time gap), We can use number of postgres native function to achieve it:
to_timestamp: convert string to time stamp
extract: extract(field from timestamp)
epoch: For date and timestamp values, the number of seconds since 1970–01–01 00:00:00–00
An example from Postgres site

SELECT EXTRACT(EPOCH FROM timestamptz ‘20130701 12:00:00’) -
 EXTRACT(EPOCH FROM timestamptz ‘20130301 12:00:00’);
Result: 10537200

time series query example, let’s say for example we want to print a battery report for a 10 min interval for a self driving car and let’s say that we have a fleet of self driving car, We would like to view the logs of car id 1 and view the battery data in a 10 min interval between a specific date range.

SELECT DISTINCT ON (time_series) time_series, data, type FROM (SELECT TO_TIMESTAMP((EXTRACT(EPOCH FROM timestamp) / (600))::int * 600) AS time_series, data, timestamp, type
FROM timeseries WHERE source=1 AND type = ‘battery’
AND timestamp BETWEEN20170101T00:00:00.000Z’ and20170125T00:10:00.000Z’) timeseries ORDER BY time_series, ABS(EXTRACT(EPOCH FROM timestamp) — EXTRACT(EPOCH FROM time_series)) limit 1000

Wait what?!? don’t worry let’s explain what’s happen in this query?!
I’ll start from the inner query

SELECT TO_TIMESTAMP((EXTRACT(EPOCH FROM timestamp) / (600))::int * 600)

This will return a list of dates break to 600 seconds interval (our 10 min interval)
AS time_series, data, timestamp, type

FROM timeseries WHERE source=1 AND type = ‘battery’

Here we define the columns and filter the data by the source to be 1 and the type.

AND timestamp BETWEEN20170101T00:00:00.000Z’ and20170125T00:10:00.000Z’)

Set the date range

timeseries ORDER BY time_series, ABS(EXTRACT(EPOCH FROM timestamp) — EXTRACT(EPOCH FROM time_series)

this will order our result by time_series. This part will make sure we get the "closest" time to our time interval, meaning it will choose timestamp=9:59:59 instead of choosing timestamp=10:02:00 for the 10:00 time interval.
Than the outer query will distinct the result by our time_series column.

If you like it please comment, share or just press the like button.

Also, if you find a bug or have question feel free to comment, or ping me.

cheers,
Doron.

Sort:  

Congratulations @doron2402! You received a personal award!

Happy Birthday! - You are on the Steem blockchain for 2 years!

You can view your badges on your Steem Board and compare to others on the Steem Ranking

Do not miss the last post from @steemitboard:

The new SteemFest⁴ badge is ready
Vote for @Steemitboard as a witness to get one more award and increased upvotes!