
ETA estimate not accurate on long-running processes when the rate changes over time #247

Open
Jylpah opened this issue Jul 5, 2023 · 12 comments
Labels
feature request No promises...

Comments

@Jylpah

Jylpah commented Jul 5, 2023

Hello,

First of all, a big thanks for a great package, I love ❤️ it. I have noticed that alive-progress is very slow to react to rate changes in long-running processes. I have a batch run that starts at ~60/s, drops to ~40/s after maybe 5 hours, then to ~20/s a few hours later, and continues at that rate for ~6 more hours. While the real progress is ~20/s, alive-progress reports ~40/s and lowers it very slowly, leading to a false ETA.

Is there an option to control the responsiveness of the rate estimator? Right now the (moving average) rate estimator lags far too much.

If I am not mistaken, the culprit is this:

run.rate = gen_rate.send((processed(), run.elapsed))

This line calculates the rate from the cumulative total and the total elapsed time, instead of computing it from the delta since the previous update.
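
To illustrate the difference with made-up numbers (a standalone sketch, not alive-progress code):

# Hypothetical run: 5 hours at 60/s, then 1 more hour at 40/s.
pos_prev, t_prev = 60 * 3600 * 5, 3600 * 5.0
pos_now, t_now = pos_prev + 40 * 3600, t_prev + 3600.0

print(pos_now / t_now)                          # cumulative: ~56.7/s, barely moved
print((pos_now - pos_prev) / (t_now - t_prev))  # delta since last update: 40.0/s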

@rsalmei
Copy link
Owner

rsalmei commented Jul 5, 2023

Hi @Jylpah, thanks, man!

Hmm, no, there isn't any option to control its responsiveness at the moment.
Actually, it is not a moving average estimator; it is a Simple Exponential Smoothing generator, and the ETA comes from two layers of it. What I display is a fairly involved calculation using both a smoothed rate and a smoothed ETA.

I made a study about it 3 years ago when I was implementing it. It is published here as an Apple Numbers document. Here is a screenshot of it:
[screenshot of the study]

As you can see, the goal was to minimize abrupt changes, so the ETA seems (and actually gets) more precise.
I have an internal alfa coefficient, which controls how much "influence" comes from the history and how much from the current value. For the ETA it is 0.5:

gen_eta = gen_simple_exponential_smoothing(.5, fn_simple_eta(logic_total))
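
For reference, a minimal sketch of such a smoothing generator (illustrative, not the library's exact code):

def gen_ses(alfa, fn):
    # Prime with next(), then feed (pos, elapsed) tuples via send().
    p = yield
    y_hat = fn(*p)        # the first observation seeds the smoothed value
    while True:
        p = yield y_hat
        y = fn(*p)
        y_hat += alfa * (y - y_hat)  # blend history with the new observation

gen = gen_ses(0.5, lambda pos, elapsed: pos / elapsed)
next(gen)
print(gen.send((100, 2.0)))  # 50.0
print(gen.send((300, 4.0)))  # 62.5 = 50 + 0.5 * (75 - 50)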

Perhaps I could expose this internal configuration? For example, if you set it to 0.8 to weigh the current value much more heavily, you'd get a curve like this, which bumps around a lot more but does track the actual values better:

[screenshot: curve with alpha = 0.8]

What do you think? Thoughts?

@Jylpah
Author

Jylpah commented Jul 5, 2023

Thank you for your detailed answer,

Yes, a configurable alpha would be great. In my case the difference between the ETA estimate and reality is ~8 hours (out of 18 h total).

@Jylpah
Author

Jylpah commented Aug 15, 2023

Hi @rsalmei, did you conclude whether the configurable alpha would be worth the effort?

@rsalmei
Owner

rsalmei commented Nov 8, 2023

Hi @Jylpah,
Sorry man, I completely forgot about this one.
I'll think it through for the next version.

@AlberLC

AlberLC commented Jan 15, 2024

Hello, the tqdm package has a parameter to configure that. From the documentation:

smoothing: float, optional
Exponential moving average smoothing factor for speed estimates (ignored in GUI mode). Ranges from 0 (average speed) to 1 (current/instantaneous speed) [default: 0.3].

I switched to alive-progress just for the animated gif, since the other package remained too static while processing very large files that took a long time to finish, but I missed the option to adjust the average. In my program the ETA can stay at ~20s for half an hour...

@rsalmei
Owner

rsalmei commented Jan 15, 2024

Yikes, really? Half an hour showing ~20s?
This means your processing must start really fast and then come to a near halt, right?
Well, I do use Exponential Smoothing, I just don't externalize its factor. The default is 0.5, I think. What number would be good for you?

@AlberLC

AlberLC commented Jan 15, 2024

Half an hour showing ~20s?

Yep, maybe after 10 min it changes to ~21s, but it stays around the same number for far too long, which is not a good time estimate.

This means your processing must start really fast and then come to a near halt, right?

Correct. I process a lot of files, and generally there are more files of a few KB or MB than of GB. Then, when it's the turn of a long, heavy video file...

The default is 0.5, I think. What number would be good for you?

I don't know; with the average that tqdm computed, I used to like something between 0.1 and 0.3, but it's a matter of trying until you find an estimate you like; there is never a perfect one.

That's why I missed being able to adjust it to my particular case here 😢, but it's a very good library ❤️

@rsalmei
Owner

rsalmei commented Jan 16, 2024

Thank you, man.
I see... Well, I can include this config for you, don't worry.
Actually, you're the second person to ask for this, so I'll try to prioritize it.

@rsalmei
Owner

rsalmei commented Jan 18, 2024

Hey @AlberLC, I'm working on this today, but unfortunately, it is not as easy as it seemed.

The ETA is based on the Rate value, and both use Exponential Smoothing to get their final values.
I tried configuring both alphas but couldn't fix the case you presented, which made me think...

The rate is actually calculated with:

gen_rate = gen_simple_exponential_smoothing(config.rate_alpha, lambda pos, elapsed: pos / elapsed)

Can you see it? The input value for the smoothing is "the current position divided by the total elapsed time", i.e., I do not capture instantaneous values at all, so I can't really detect that the current rate is much different from the whole-run average... 😕
I'm not sure how I can fix this.

A practical example:

import time

from alive_progress import alive_bar

with alive_bar(1000) as bar:
    for i in range(500):
        time.sleep(0.005)
        bar()

Running this, we can see that the Rate is ~160/s.

But if I run this:

import time

from alive_progress import alive_bar

with alive_bar(1000, rate_alpha=1) as bar:
    for i in range(500):
        bar()
    for i in range(500):
        time.sleep(0.005)
        bar()

We can never receive an instantaneous rate value of ~160/s, because the whole current position and the whole elapsed time are always taken into account. This means that, even with an alpha of 1.0 (which discards the history completely), at the end of the processing we will see ~320/s (which makes total sense, since half of the items were processed in a snap).
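
A back-of-the-envelope check of those numbers (assuming the first 500 iterations take essentially no time):

items = 1000
sleep_total = 500 * 0.005      # 2.5 s of sleep in the slow half
print(items / sleep_total)     # 400/s upper bound; with real per-iteration
                               # overhead the elapsed is ~3.1 s, hence ~320/s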

Any idea how I could make this happen?

@AlberLC

AlberLC commented Jan 19, 2024

I have spent more time than I would dare admit understanding the problem and the code, because my mathematical brain is a little rusty and I am not used to such functional code (in fact, you taught me that yield can receive values with send 😅), but now I understand it.

At first I thought that the alpha you were telling me about meant:

  • 0 -> average (iterations / elapsed)
  • 1 -> instantaneous speed based on how long the last iteration took

tqdm works like this and I thought it would be the same case, but after testing and debugging internally I understood that the line

y_hat += alfa * (y - y_hat)

in alive-progress makes it behave like:

  • 0 -> the ratio would never be updated so it would take the first value calculated forever
  • 1 -> average (iterations / elapsed)

That is, the 1 of alive-progress is the 0 of tqdm.
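
A quick self-contained check of those two limits (with made-up rate samples):

# alfa=1 reproduces the input (the running average pos/elapsed) exactly,
# while alfa=0 freezes the first value forever.
def ses(alfa, values):
    y_hat = values[0]
    out = [y_hat]
    for y in values[1:]:
        y_hat += alfa * (y - y_hat)
        out.append(y_hat)
    return out

rates = [100.0, 80.0, 60.0]       # hypothetical pos/elapsed samples
print(ses(1.0, rates))            # [100.0, 80.0, 60.0] -> tracks the averages
print(ses(0.0, rates))            # [100.0, 100.0, 100.0] -> stuck at the first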

alive-progress:

import time

from alive_progress import alive_bar

with alive_bar(1000, rate_alpha=1) as bar:
    for i in range(500):
        bar()
    for i in range(500):
        time.sleep(0.005)
        bar()

tqdm:

import time

from tqdm import tqdm

with tqdm(total=1000, smoothing=0, mininterval=0) as bar:
    for i in range(500):
        bar.update()
    for i in range(500):
        time.sleep(0.005)
        bar.update()

In the example above, our alpha is tqdm's smoothing. mininterval=0 is there to make the animation look as smooth as alive_progress's 😌.

Therefore, what I missed from tqdm is not possible here, since alpha is used to smooth out the changes (as you showed in your charts) and the whole elapsed time is always taken into account.

For my case it would be best to always use 1, because I have so many files that an average computed continuously over the current number of iterations would never produce changes that are too abrupt; in fact, I need it to be even more abrupt to provide an ETA that makes sense.

Perhaps the way forward would be to merge both methods: first apply something similar to what tqdm does, i.e., a configurable average-vs-instantaneous blend, and then, if you want the resulting value to have less impact on each iteration, apply your smoothing on top. It would be like having "2 alphas". A rough sketch of the idea follows below.
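
Entirely hypothetical (names, signature and all), just to make the "2 alphas" idea concrete:

def gen_two_stage_rate(inst_alpha, smooth_alpha):
    # inst_alpha blends whole-run average vs. instantaneous rate (tqdm-like);
    # smooth_alpha then damps the blended value (alive-progress-like).
    pos_prev, elapsed_prev, y_hat = 0, 0.0, None
    pos, elapsed = yield 0.0           # prime with next(), then send() pairs
    while True:
        dp, dt = pos - pos_prev, elapsed - elapsed_prev
        inst = dp / dt if dt else 0.0
        avg = pos / elapsed if elapsed else 0.0
        y = inst_alpha * inst + (1 - inst_alpha) * avg
        y_hat = y if y_hat is None else y_hat + smooth_alpha * (y - y_hat)
        pos_prev, elapsed_prev = pos, elapsed
        pos, elapsed = yield y_hat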

Now, the problem is that alive-progress does not capture how long each iteration takes in order to get the instantaneous speed (I don't know exactly what tqdm does internally, but I assume it does something like that). However, seeing the complexity of the project and what you can do, I have no doubt that you will find a solution 😉

@rsalmei rsalmei added the feature request No promises... label Jan 28, 2024
@rsalmei rsalmei added this to the 3.2 milestone Jan 28, 2024
@rsalmei
Owner

rsalmei commented Jan 28, 2024

Hey @AlberLC,

I've been thinking about this and ran some tests.
I've got the instantaneous values working! But I'm apprehensive...
Since an element will always be ongoing when a refresh occurs, just counting how many items have finished since the last refresh gets very inaccurate... I did that for now, but the numbers vary wildly, even with the smoothing. It is weird.
But the bright side is that it converges very fast to the current rate, even with the default alphas! Look at this:

[video: Screen.Recording.2024-01-28.at.04.47.06.mov]

So, I'm not sure what to think. It now correctly and quickly gets to ~160/s, and correctly shows ~320/s on the final receipt, so it seems right, but it jiggles a lot...

Regarding the interpretation of the alphas, I'm confused.
You're right that alive-progress will keep the first number forever when alpha is 0, and show the average when it is 1. But this is exactly what Simple Exponential Smoothing is. Look it up on Wikipedia: you can see in the very first equation that, if the alpha is 0, x_t (the input) is not considered anymore, so the output will always be the previous value. That's the equation I've implemented.

If I do keep it working with current/instantaneous values, I'd have 1 as the current value like tqdm, but 0 would still be a static number, never an average. I could calculate a moving average and feed it into the smoothing generator, but then 1 would be that moving average, since 1 always takes the exact input without considering any history.
So, I'm not sure how they do it: how would I range from a "moving average" at 0 to the "current" at 1?
Would you know if there is another method I'm not aware of?
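
For concreteness, the moving-average variant I mentioned could look like this (an illustrative sketch with a hypothetical window size, not tqdm's actual internals): keep a small window of (position, elapsed) samples and compute the rate over that window only.

from collections import deque

def make_windowed_rate(window=10):
    samples = deque(maxlen=window)     # recent (pos, elapsed) pairs

    def rate(pos, elapsed):
        samples.append((pos, elapsed))
        (p0, e0), (p1, e1) = samples[0], samples[-1]
        return (p1 - p0) / (e1 - e0) if e1 > e0 else 0.0

    return rate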

@Jylpah
Author

Jylpah commented May 23, 2024

Hi @rsalmei ,

I finally got my head around the code and attempted to implement the rate_alpha option myself. See the dev branch in my fork.

I noticed the "jitter" effect in the rate, as you mentioned, and ended up reducing it by requiring a minimum change between rate updates (currently a fixed value of 3). Based on a quick test it is quite promising. However, I think the current version works only with manual=False; I have not tested manual=True yet.

Proposal

Use delta values and require a minimum change before updating the rate. Summary of the changes (not a full diff):

utils/timing.py

https://github.com/Jylpah/alive-progress/blob/5a1037e813f49b844991234884bca87dfe2579ef/alive_progress/utils/timing.py#L27C1-L46C1

elapsed_prev: float = 0.0
total_prev: int = 0
rate_prev: float = 0.0
min_change: int = 3

def calc_rate(total: int, elapsed: float) -> float:
    """
    Calculate a periodic rate out of cumulative totals
    """
    global elapsed_prev, total_prev, rate_prev
    dt: float = elapsed - elapsed_prev
    change: int = total - total_prev
    if change > min_change:  # only update after enough new items, to avoid jitter
        elapsed_prev = elapsed
        total_prev = total
        rate_prev = change / dt
    return rate_prev
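
A quick sanity check of calc_rate with made-up numbers:

print(calc_rate(10, 1.0))   # +10 items in 1.0 s -> 10.0/s
print(calc_rate(12, 1.2))   # only +2 items (<= min_change): keeps 10.0/s
print(calc_rate(26, 2.0))   # +16 items over the last 1.0 s -> 16.0/s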

core/progress.py

https://github.com/Jylpah/alive-progress/blob/5a1037e813f49b844991234884bca87dfe2579ef/alive_progress/core/progress.py#L239C1-L242C1

    gen_rate = gen_simple_exponential_smoothing(config.rate_alpha, calc_rate)

Any comments? If you think the solution has potential, could you please advise me on which parts I am still missing? I am more than happy to create a PR.

About alphas

Alphas of 0 and 1 are a bit theoretical with Simple Exponential Smoothing; I would focus on making the smoothing work well with values in the open interval (0, 1).
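
To illustrate with made-up numbers: starting from a smoothed 60/s while the real rate is stuck at 20/s (the scenario from my first comment), the speed of convergence inside that interval depends entirely on the alpha.

def converge(alfa, y_hat, target, steps):
    # Apply the smoothing update repeatedly against a constant input.
    for _ in range(steps):
        y_hat += alfa * (target - y_hat)
    return y_hat

print(converge(0.1, 60.0, 20.0, 10))   # ~33.9/s: still far off after 10 updates
print(converge(0.9, 60.0, 20.0, 10))   # ~20.0/s: essentially converged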

@rsalmei rsalmei removed this from the 3.2 milestone Oct 24, 2024