### Problem description

Assuming you have this database (using PostgreSQL syntax):```
create table transactions (
id bigint not null primary key,
lines bigint not null,
price numeric(18, 2) not null,
profit numeric(18, 2) not null
);
create table lines (
id bigint not null primary key,
transaction_id bigint not null references transactions,
total bigint not null,
quantity bigint not null,
profit numeric(18, 2) not null
);
```

`transactions.lines`

column. This will turn out to be quite useful for this calculation, but it isn’t strictly necessary.
Now, in the previously linked Stack Overflow question, a report was desired that would calculate:
- An aggregation of sums as provided by the line items
- An aggregation of averages as provided by the transactions

**Sums provided by the line items**

```
SELECT
sum(profit) AS total_profit,
sum(total) AS total_sales_amount,
sum(quantity) AS total_items_sold
FROM lines
```

**Averages provided by the transactions**

```
SELECT
avg(lines) AS avg_items_p_trx,
avg(price) AS avg_price_p_trx,
avg(profit) AS avg_profit_p_trx
FROM transactions
```

### Doing it in one query

Now, these queries are simplified from the original, which needed to join the two tables in order to add additional predicates. Also, let’s assume that these tables are quite large, so running two queries might lead to the report being too slow. A single query would be much better. We might be attempted to simply combined the two:```
-- Wrong query
SELECT
sum(l.profit) AS total_profit,
sum(l.total) AS total_sales_amount,
sum(l.quantity) AS total_items_sold,
avg(t.lines) AS avg_items_p_trx,
avg(t.price) AS avg_price_p_trx,
avg(t.profit) AS avg_profit_p_trx
FROM lines AS l
JOIN transactions AS t ON t.id = l.transaction_id
```

```
SELECT
l.id AS line_id,
t.id AS transaction_id,
t.lines,
t.price
FROM lines AS l
JOIN transactions AS t ON t.id = l.transaction_id
```

LINE_ID TRANSACTION_ID LINES PRICE ------------------------------------------- 1 1 3 20.00 2 1 3 20.00 3 1 3 20.00 4 2 5 100.00 4 2 5 100.00 4 2 5 100.00 4 2 5 100.00 4 2 5 100.00

- The average number of lines “avg_items_p_trx” should be 4 = (3 lines + 5 lines) / 2 transactions. But if we calculate
`avg(t.lines)`

over the entire data set, we get 4.25 (3×3 lines + 5×5 lines) / 8 items. - The average price “avg_price_p_trx” should be 60.00 = (20.00 + 100.00) / 2 transactions. But if we calculate
`avg(t.price)`

over the entire data set, we get 80.00 (3×20.00 + 5×100.00) / 8 items.

### How can this be fixed?

Given that each transaction is duplicated because of the join with lines, we have to calculate a weighted average, not an ordinary average. The idea is that instead of using the`AVG()`

aggregate function, we now have to divide the value we want to get an average of by the number of items (i.e. the number of times the value is repeated because of the join), and then divide the sum of that division by the number of transactions.
Prose never describes logic well, so let’s use code. The correct query is:
```
SELECT
sum(l.profit) AS total_profit,
sum(l.total) AS total_sales_amount,
sum(l.quantity) AS total_items_sold,
sum(t.lines / t.lines) / count(DISTINCT t.id) avg_items_p_trx,
sum(t.price / t.lines) / count(DISTINCT t.id) avg_price_p_trx,
sum(t.profit / t.lines) / count(DISTINCT t.id) avg_profit_p_trx
FROM lines AS l
JOIN transactions AS t ON t.id = l.transaction_id
```

LINE_ID TRANSACTION_ID LINES LINES/LINES PRICE PRICE/LINES ---------------------------------------------------------------- 1 1 3 1 20.00 6.66 2 1 3 1 20.00 6.66 3 1 3 1 20.00 6.66 4 2 5 1 100.00 20.00 4 2 5 1 100.00 20.00 4 2 5 1 100.00 20.00 4 2 5 1 100.00 20.00 4 2 5 1 100.00 20.00We now get the correct weighted averages:

- The average number of lines “avg_items_p_trx” is now 4 =

(3/3 + 3/3 + 3/3 + 5/5 + 5/5 + 5/5 + 5/5 + 5/5) / distinct transactions - The average price “avg_price_p_trx” is now 60.00 =

(20.00/3 + 20.00/3 + 20.00/3 + 100.00/5 + 100.00/5 + 100.00/5 + 100.00/5 + 100.00/5) / 2 distinct transactions

```
SELECT
sum(l.profit) AS total_profit,
sum(l.total) AS total_sales_amount,
sum(l.quantity) AS total_items_sold,
count(*) / count(DISTINCT t.id) avg_items_p_trx,
sum(t.price / t.lines) / count(DISTINCT t.id) avg_price_p_trx,
sum(t.profit / t.lines) / count(DISTINCT t.id) avg_profit_p_trx
FROM lines AS l
JOIN transactions AS t ON t.id = l.transaction_id
```

### Normalised version

Notice that this solution profited from the fact that the number of lines per transaction was pre-calculated. We can of course also calculate it on the fly, e.g. using window functions. If it weren’t available, we could do it like this:```
SELECT
sum(l.profit) AS total_profit,
sum(l.total) AS total_sales_amount,
sum(l.quantity) AS total_items_sold,
count(*) / count(DISTINCT t.id) avg_items_p_trx,
sum(t.price / l.lines) / count(DISTINCT t.id) avg_price_p_trx,
sum(t.profit / l.lines) / count(DISTINCT t.id) avg_profit_p_trx
FROM (
SELECT
l.*,
count(*) OVER (PARTITION BY l.transaction_id) lines
FROM lines AS l
) AS l
JOIN transactions AS t ON t.id = l.transaction_id
```

```
SELECT
sum(l.profit_per_transaction) AS total_profit,
sum(l.total_per_transaction) AS total_sales_amount,
sum(l.quantity_per_transaction) AS total_items_sold,
avg(l.lines_per_transaction) AS avg_items_p_trx,
avg(t.price) AS avg_price_p_trx,
avg(t.profit) AS avg_profit_p_trx
FROM (
SELECT
l.transaction_id
sum(l.profit) AS profit_per_transaction,
sum(l.total) AS total_per_transaction,
sum(l.quantity) AS quantity_per_transaction,
count(*) AS lines_per_transaction
FROM lines AS l
GROUP BY l.transaction_id
) AS l
JOIN transactions AS t ON t.id = l.transaction_id
```