Skip to content

Predict expenses for the next three months

Keyword

nested subquery, machine learning algorithm analysis, cost analysis, predictive analysis

correlation function

date_format,sum,from_unixtime,sls_inner_ts_regression,cast,max,to_unixtime,date_trunc,date_add,now

Aggregate and calculate the total expense by day and sort the data by time. Find the current date and time, and use the sls_inner_ts_regression function to predict the next 90 points. Aggregate the predicted points and time by month, and then calculate the predicted monthly expense after the current date and time. The results are displayed in a column chart, in which the x-axis represents the time and the y-axis represents the predicted expense.

SQL
source :bill |
select
  date_format(from_unixtime(time), '%Y-%m') as t,
  sum(predict) as "Predicted expense"
from(
    select
      *
    from
      (
        select
          res.stamp as time,
          res.real as src,
          res.pred as predict
        from
          (
            select
              sls_inner_ts_regression(
                cast(day as bigint),
                total,
                array ['total'],
                cast(86400 as bigint),
                cast(90 as bigint)
              ) as res
            from
              (
                select
                  *
                from
                  (
                    select
                      *,
                      max(day) over() as lastday
                    from
                      (
                        select
                          to_unixtime(date_trunc('day', __time__)) as day,
                          sum(PretaxAmount) as total
                        from
                          log
                        group by
                          day
                        order by
                          day
                      )
                  )
                where
                  day < lastday
              )
          )
      )
    where
      time >= to_unixtime(date_trunc('month', now()))
      and time < to_unixtime(date_add('month', 3, date_trunc('month', now())))
  )
group by
  t
order by
  t
limit
  100000