in Technology

Help! – MySQL Aggregate functions and table joins

I am stuck onto a problem and would appreciate some help on this! Here’s the problem:

I am using aggregate functions (eg. SUM) of MySQL in a query where two tables are joined. I also need to filter the results based on some WHERE clause. Now the problem is, SUM sums up the values multiple times since one row from one table repeats due to the join.

Let me explain with an example. Suppose the two tables I am working with are transactions (T) and transaction_products (TP). T stores the details of the transaction (date, total price, discount, customerId etc) and TP stores line item details (productId, item price, item discount etc). And I want to get the total sales for an order where a particular product was sold.

Here’s what I do:
[sql]SELECT SUM(t.totalPrice), tp.productId FROM transactions t, transaction_products tp WHERE t.id = tp.transId GROUP BY tp.productId[/sql]

Now that does not work correctly. And even when I simply want to get the total sales for all orders, the query becomes
[sql]SELECT SUM(t.totalPrice) FROM transactions t, transaction_products tp WHERE t.id = tp.transId[/sql]
And this is wrong! It joins the two tables, and repeats the totalPrice in the result, and sums up those values. Resulting in a much larger sales than the actual 🙂

How to solve this?

You can say that we can avoid the join where we want to SUM on a field from T, but that would not help in a condition like this:
[sql]SELECT SUM(t.totalPrice), tp.catId FROM transactions t, transaction_products tp WHERE t.id = tp.transId AND t.transDate between “2007-01-01” and “2007-10-31” GROUP BY tp.catId[/sql]

If I sum up all the result rows of the above query, the result is about 3 times higher than this:
[sql]SELECT SUM(t.totalPrice) FROM transactions t WHERE t.transDate between “2007-01-01” and “2007-10-31″[/sql]

And the problem is that I need the category Id (or some other field from TP table)!

Any solutions to this?

I thought of derived tables (as mentioned here) but that does not solve the problem.

I thought of splitting the query in two parts. Storing the results of SUM in a temp table and then doing the WHERE clause – but that would fail outright since the results do not include the WHERE condition.

What else can I do? What do you think?

Write a Comment

Comment

  1. Let me talk about the reason why this problem occurs. (and this is not a MySQL bug, it’s standard SQL feature!)

    When I select fields from two different tables, the tables will be joined. If one table had rows X and Y, and another had 1, 2 and 3, a join will show results like:

    X, 1
    X, 2
    X, 3
    Y, 1
    Y, 2
    Y, 3

    Aggregate functions work on the final result. Which means SUM will happen on these six rows, not the three rows in second table. So the sum here will be 12, and not 6.

    Now this is how it should behave, but I want it to behave differently! So the question is, how can I do it? 🙂

  2. I am glad I have folks at office who are more intelligent than I am!

    Here’s a solution to the problem, I am currently verifying if it works in all conditions:

    SELECT SUM(T.totalPrice) AS total, P.catId
    FROM transactions AS T
    INNER JOIN (
    SELECT catId, transId
    FROM transaction_products tp
    GROUP BY tp.transId
    ) AS P ON P.transId = T.id
    AND transDate BETWEEN “2007-01-01” AND “2007-10-31”
    group by P.catId

  3. I’ve run into this issue myself. I’m hoping you’ve found the solution on your own by this point, but this is the solution, which works, that was provided for me.

    You will still need to do a table join, but the value you generate from the select column will need to be from a subquery, not an aggregate function.

    Instead of:
    SELECT count( transID ) as transactioncount, sum( partsOrdered ) as totalpartsordered, …

    Do this:
    SELECT count( transID ) as transactioncount, ( SELECT sum( partsOrdered ) FROM partstable ) as totalpartsordered, …

    This will get ugly very quickly since you’ll need to put conditions within your subquery based on the conditions for the global query.

    I hope this provides more answers than questions. If you are only looking for one line of data, this is great. If you are looking for multiple rows of results, email me and I’ll give you some more insight…

  • Related Content by Tag