Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Update click-elements.ts - fix uncaught exception issue #2794

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Gilc83
Copy link

@Gilc83 Gilc83 commented Jan 6, 2025

If the history is empty because of a navigation (for example: clicking a link - < a href=...>), then this prevents an empty state to cause an exception on line 549 in the same function.

If the history is empty because of a navigation (for example: clicking a link - <a href=...>), then this prevents an empty state to cause an exception on line 549 in the same function.
@Gilc83 Gilc83 changed the title Update click-elements.ts - fix uncaught exception issue fix: Update click-elements.ts - fix uncaught exception issue Jan 6, 2025
@janbuchar janbuchar self-requested a review January 8, 2025 08:55
@janbuchar
Copy link
Contributor

Hi @Gilc83 and thank you for your contribution! Could you please run yarn format to make sure that the code passes the lint check?

@B4nan
Copy link
Member

B4nan commented Jan 8, 2025

few comments from my end:

@Gilc83
Copy link
Author

Gilc83 commented Jan 8, 2025

Happy to help @janbuchar :)
I uses npm so if you could take care of the linting that would be great.
It is also important to note that this issue is a sign of a bigger problem @B4nan:
When the enqueueLinksByClickingElements if called, your code is calling the internal function clickElements which go through all elements and tries to click it in the loop - for (const handle of elementHandles)...

If a navigation occurs due to one of the clicks - a link, form submission etc., the loop will fail for all remaining element handlers, saying that the context has been changed.
From what I see there are 2 possible options:

  1. Identify a navigation event and stop the loop and the whole process of enqueueLinksByClickingElements in general
  2. Disable the navigation and continue clicking on other elements. ChatGPT consulted doing something like this:
    await page.evaluate(() => {
    document.querySelectorAll('a[href]').forEach((el) => {
    el.addEventListener('click', (event) => {
    event.preventDefault(); // Stop the normal navigation
    // Possibly open in a new tab, or handle the link differently
    // window.open(el.href, '_blank');
    });
    });
    });

If you ask me for my opionion, I'm using the enqueuLinks function and then immediately the enqueueLinksByClickingElements, in order to get the maximum amounts of links in the website, so I believe the second option is better. However, the temporary fix I was adding, that prevents the whole function from crashing and the current Crawlee request to fail, is essentially the first option - the clicking is ceased. So I think my fix can be applied for the short term, but navigation disabling of any kind should be considered in the future.

PS
If I got your attention I'll be happy if you could also have a look at my feature request when you have the time.
#2786
I'm writing a web spider based on your awsome library and it would really help me (I did an ugly workaround for now).
Thank you very much and love your work :) 🥇 💯

@janbuchar
Copy link
Contributor

Happy to help @janbuchar :) I uses npm so if you could take care of the linting that would be great.

Don't worry, npm run format will work just as fine. Also, is there a chance you could write a test that fails before the fix and succeeds after it? Plus we need a better description of the issue that you fixed for the title (and changelog) 🙂

It is also important to note that this issue is a sign of a bigger problem @B4nan: When the enqueueLinksByClickingElements if called, your code is calling the internal function clickElements which go through all elements and tries to click it in the loop - for (const handle of elementHandles)...

If you ask me for my opionion, I'm using the enqueuLinks function and then immediately the enqueueLinksByClickingElements, in order to get the maximum amounts of links in the website, so I believe the second option is better. However, the temporary fix I was adding, that prevents the whole function from crashing and the current Crawlee request to fail, is essentially the first option - the clicking is ceased. So I think my fix can be applied for the short term, but navigation disabling of any kind should be considered in the future.

Right, would you mind opening an issue that describes the problem and the possible solution so that we can come back to it later?

PS If I got your attention I'll be happy if you could also have a look at my feature request when you have the time. #2786 I'm writing a web spider based on your awsome library and it would really help me (I did an ugly workaround for now). Thank you very much and love your work :) 🥇 💯

Don't worry, we go through all new issues and we will eventually resolve this one, too 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants