Sentry stopped accepting transaction data #2876

Open
ingria opened this issue Mar 10, 2024 · 65 comments

@ingria

ingria commented Mar 10, 2024

Self-Hosted Version

24.3.0.dev0

CPU Architecture

x86_x64

Docker Version

24.0.4

Docker Compose Version

24.0.4

Steps to Reproduce

Update to the latest master

Expected Result

Everything works fine

Actual Result

Performance page shows zeros for the time period since the update and until now:

image

Project page shows the correct info about transactions and errors:

image

Stats page shows 49k transactions of which 49k are dropped:

image

Same for errors:

image

Event ID

No response

UPD

there are a lot of errors in clickhouse container:

2024.03.10 23:40:34.789282 [ 46 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):

0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x13c4ee8e in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::peerAddress() @ 0x13c510d6 in /usr/bin/clickhouse
2. DB::ReadBufferFromPocoSocket::ReadBufferFromPocoSocket(Poco::Net::Socket&, unsigned long) @ 0x101540cd in /usr/bin/clickhouse
3. DB::HTTPServerRequest::HTTPServerRequest(std::__1::shared_ptr<DB::Context const>, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0x110e6fd5 in /usr/bin/clickhouse
4. DB::HTTPServerConnection::run() @ 0x110e5d6e in /usr/bin/clickhouse
5. Poco::Net::TCPServerConnection::start() @ 0x13c5614f in /usr/bin/clickhouse
6. Poco::Net::TCPServerDispatcher::run() @ 0x13c57bda in /usr/bin/clickhouse
7. Poco::PooledThread::run() @ 0x13d89e59 in /usr/bin/clickhouse
8. Poco::ThreadImpl::runnableEntry(void*) @ 0x13d860ea in /usr/bin/clickhouse
9. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
10. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.8.13.1.altinitystable (altinity build))
@ingria
Author

ingria commented Mar 10, 2024

Also, for some reason Sentry started dropping incoming errors some time ago (as if I were using SaaS Sentry):

image

@barisyild

Did you change the port?
I had the same situation when I changed the port.

@ingria
Author

ingria commented Mar 10, 2024

Yes, I have the relay port exposed to the host network. How did you manage to fix the problem?

@barisyild

> Yes, I have the relay port exposed to the host network. How did you manage to fix the problem?

When I reverted the port change the problem was resolved.

@ingria
Author

ingria commented Mar 10, 2024

Nope, didn't help. Doesn't work even with default config. Thanks for the tip though

@hubertdeng123
Member

Are there any logs in your web container that can help? Are you sure you are receiving the event envelopes? You should be able to see that activity in your nginx container.
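
For anyone checking this on their own install, a rough way to confirm that envelopes are arriving is to count envelope POSTs per HTTP status code in the nginx access log. This is only a sketch: it assumes nginx's default "combined" log format, and the sample lines are made up for illustration, not taken from this issue.

```python
import re
from collections import Counter

# Assumes "combined"-style access log lines; adjust the pattern if your
# nginx log_format differs.
ENVELOPE_RE = re.compile(r'"POST /api/(\d+)/envelope/?[^"]*" (\d{3})')

def count_envelope_statuses(lines):
    """Count envelope POST requests per HTTP status code."""
    counts = Counter()
    for line in lines:
        match = ENVELOPE_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts

# Illustrative sample lines (hypothetical, not real log output).
sample = [
    '172.18.0.1 - - [14/Mar/2024:10:00:00 +0000] "POST /api/2/envelope/ HTTP/1.1" 200 41 "-" "sdk"',
    '172.18.0.1 - - [14/Mar/2024:10:00:01 +0000] "POST /api/2/envelope/?sentry_key=abc HTTP/1.1" 200 41 "-" "sdk"',
    '172.18.0.1 - - [14/Mar/2024:10:00:02 +0000] "POST /api/2/envelope/ HTTP/1.1" 429 0 "-" "sdk"',
    '172.18.0.1 - - [14/Mar/2024:10:00:03 +0000] "GET /organizations/ HTTP/1.1" 200 512 "-" "browser"',
]

print(dict(count_envelope_statuses(sample)))
```

The same function can be fed lines from `docker compose logs nginx`, since the pattern searches anywhere in the line. A healthy count of 200 responses only shows that events reach the ingestion endpoint, not that they are stored.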

@linxiaowang

Same here, on the browser side, there is a request sent with an event type of "transaction", but there is no data displayed under "performance", and the number of transactions in the project is also 0.

@linxiaowang

> Same here, on the browser side, there is a request sent with an event type of "transaction", but there is no data displayed under "performance", and the number of transactions in the project is also 0.

Problem solved: the server time did not match the SDK time.
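
A quick way to spot this kind of clock mismatch is to compare the `timestamp` of a captured event payload with the server's current UTC time. The sketch below assumes the timestamp is an ISO 8601 string; the five-minute tolerance is an arbitrary value chosen for illustration, not a documented Sentry limit.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance for illustration; not a documented Sentry threshold.
TOLERANCE = timedelta(minutes=5)

def clock_skew(event_timestamp, server_now):
    """Return event time minus server time (positive: client clock is ahead)."""
    event_time = datetime.fromisoformat(event_timestamp.replace("Z", "+00:00"))
    return event_time - server_now

def is_suspicious(skew, tolerance=TOLERANCE):
    """True when the absolute skew exceeds the chosen tolerance."""
    return abs(skew) > tolerance

# Example timestamp is made up; in practice take it from a captured event.
server_now = datetime.now(timezone.utc)
skew = clock_skew("2024-03-13T14:00:00Z", server_now)
print(f"skew: {skew}, suspicious: {is_suspicious(skew)}")
```

If the skew is large, syncing the host clock (for example via NTP) is the usual fix.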

@ingria
Author

ingria commented Mar 14, 2024

I can see that there are successful requests to /api/2/envelope:

image

Also I can see transaction statistics on the projects page:

The figure of 394k for the last 24 hours is about right.

@hubertdeng123
Member

Are you on a nightly version of self-hosted? What does your sentry.conf.py look like? We've added some feature flags there to support the new performance features

@ingria
Author

ingria commented Mar 15, 2024

I'm using docker with the latest commit from this repository. Bottom of the page says Sentry 24.3.0.dev0 unknown. So I guess that's nightly.

I've updated sentry.conf.py to match the most recent version from this repo - now the only difference is in SENTRY_SINGLE_ORGANIZATION and CSRF_TRUSTED_ORIGINS variables.

After that, errors have also disappeared:

image

@williamdes
Contributor

williamdes commented Mar 16, 2024

I can confirm that the clickhouse errors are due to the Rust workers: reverting the workers part of #2831 and #2861 made the errors disappear.
But I still see far too many dropped transactions since the upgrade.

Worker code: https://github.com/getsentry/snuba/blob/359878fbe030a63945914ef05e705224680b453c/rust_snuba/src/strategies/clickhouse.rs#L61

The worker logs show that the insert is done (is it?): "timestamp":"2024-03-16T11:40:52.491448Z","level":"INFO","fields":{"message":"Inserted 29 rows"},
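
One way to sanity-check the insert question is to total the "Inserted N rows" messages over a time window and compare that with the event volume you expect. This is a minimal sketch that assumes each worker log line is a complete JSON object shaped like the fragment above (the surrounding braces are an assumption); the second and third sample lines are invented for illustration.

```python
import json
import re

INSERTED_RE = re.compile(r"Inserted (\d+) rows")

def total_inserted_rows(log_lines):
    """Sum the row counts from 'Inserted N rows' messages in JSON log lines."""
    total = 0
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as compose prefixes or banners
        if not isinstance(record, dict):
            continue
        fields = record.get("fields")
        message = fields.get("message", "") if isinstance(fields, dict) else ""
        match = INSERTED_RE.search(str(message))
        if match:
            total += int(match.group(1))
    return total

sample = [
    # First line: the fragment from the logs above, wrapped in braces.
    '{"timestamp":"2024-03-16T11:40:52.491448Z","level":"INFO","fields":{"message":"Inserted 29 rows"}}',
    # Remaining lines are hypothetical.
    '{"timestamp":"2024-03-16T11:40:53.000000Z","level":"INFO","fields":{"message":"Inserted 12 rows"}}',
    "not a json line",
]

print(total_inserted_rows(sample))
```

A total that tracks the incoming volume would suggest the drop happens after the ClickHouse insert rather than in the consumer itself.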

natefoo added a commit to natefoo/sentry-self-hosted that referenced this issue Sep 12, 2024
@DarkByteZero

I had issues with stopped ingestion, but my issue was that I didn't have COMPOSE_PROFILES=feature-complete in my custom env

@hubertdeng123
Member

> I had issues with stopped ingestion, but my issue was that I didn't have COMPOSE_PROFILES=feature-complete in my custom env

Ah yeah, that'll do it. Without that you'll only be ingesting errors.
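
For a custom env file, a small check like the following can confirm the profile is actually set. It is a sketch only: the parser handles plain `KEY=value` lines and simple quoting, not every syntax Docker Compose accepts, and the file contents shown are illustrative.

```python
def compose_profiles(env_text):
    """Return the profiles listed in COMPOSE_PROFILES (last assignment wins)."""
    profiles = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "COMPOSE_PROFILES":
            cleaned = value.strip().strip("\"'")
            # COMPOSE_PROFILES may hold a comma-separated list of profiles.
            profiles = [p.strip() for p in cleaned.split(",") if p.strip()]
    return profiles

# Illustrative custom env contents (hypothetical values apart from the profile).
env_text = """
# custom overrides
SENTRY_EVENT_RETENTION_DAYS=30
COMPOSE_PROFILES=feature-complete
"""

if "feature-complete" in compose_profiles(env_text):
    print("feature-complete profile is enabled")
else:
    print("feature-complete profile is missing; only errors would be ingested")
```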

@liukch

liukch commented Sep 23, 2024

This issue has been present for several months and remains unresolved.
Do we have a schedule to fix this issue?
Due to this issue, we are experiencing significant difficulties with the upgrade of our self-hosted Sentry version.
@hubertdeng123

@aldy505
Collaborator

aldy505 commented Sep 23, 2024

---except that reverting to python snuba does not work. No more errors but still does not work.

@hheexx I helped someone on Discord a few days ago; neither the regular snuba consumer nor the snuba rust-consumer worked for him. He tried upgrading the server instance to a higher spec (previously 4 CPU cores + 16 GB RAM [AWS EC2 m6a.xlarge] --> 8 CPU cores + 32 GB RAM [AWS EC2 m6a.2xlarge]). See the Discord thread here: https://discord.com/channels/621778831602221064/1286099840480182272

I know that bumping the server spec is not an option for everyone; my initial hunch was actually the disk I/O (IOPS) limit.

@hheexx

hheexx commented Sep 30, 2024

Thanks @aldy505, you are right. I fixed it by moving msl to separate SSD storage (the VM is on HDD).

@klemen-df

Same issues, nothing helps :(

@Mordreak

Mordreak commented Oct 31, 2024

Same issue here on a fresh install with the latest commit on master: I do get a 200 response with the ID of the transaction, but nothing shows in the Performance tab.
I tried several of the proposed solutions without any luck.

@ethrgeist

On my installation, metrics appear to be collected and displayed fine, but the logs are still flooded with:

clickhouse-1                                    | 2024.11.12 08:16:23.789391 [ 47 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):
clickhouse-1                                    | 
clickhouse-1                                    | 0. Poco::Net::SocketImpl::error(int, String const&) @ 0x0000000015b3dbf2 in /usr/bin/clickhouse
clickhouse-1                                    | 1. Poco::Net::SocketImpl::peerAddress() @ 0x0000000015b40376 in /usr/bin/clickhouse
clickhouse-1                                    | 2. DB::HTTPServerRequest::HTTPServerRequest(std::shared_ptr<DB::IHTTPContext>, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0x0000000013154417 in /usr/bin/clickhouse
clickhouse-1                                    | 3. DB::HTTPServerConnection::run() @ 0x0000000013152ba4 in /usr/bin/clickhouse
clickhouse-1                                    | 4. Poco::Net::TCPServerConnection::start() @ 0x0000000015b42834 in /usr/bin/clickhouse
clickhouse-1                                    | 5. Poco::Net::TCPServerDispatcher::run() @ 0x0000000015b43a31 in /usr/bin/clickhouse
clickhouse-1                                    | 6. Poco::PooledThread::run() @ 0x0000000015c7a667 in /usr/bin/clickhouse
clickhouse-1                                    | 7. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000015c7893c in /usr/bin/clickhouse
clickhouse-1                                    | 8. ? @ 0x00007fa3e4e25609 in ?
clickhouse-1                                    | 9. ? @ 0x00007fa3e4d4a353 in ?
clickhouse-1                                    |  (version 23.8.11.29.altinitystable (altinity build))

It's a 2 week old install, HEAD detached at 24.9.0

@aldy505
Collaborator

aldy505 commented Nov 12, 2024

@ethrgeist see getsentry/snuba#5707

@jamespanic

jamespanic commented Dec 13, 2024

After updating to 24.11.2 from 24.7.1 I'm unable to login and I'm seeing these same errors in the logs. My configuration is practically a copy of the defaults.

Server specs are 8-core, 24GB RAM

@jamespanic

> After updating to 24.11.2 from 24.7.1 I'm unable to login and I'm seeing these same errors in the logs. My configuration is practically a copy of the defaults.
>
> Server specs are 8-core, 24GB RAM

I changed rust-consumer to consumer in my docker-compose.yml as was suggested and I'm no longer seeing the clickhouse errors in the logs but I still can't log in.
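
For reference, the rust-consumer to consumer change can be scripted roughly like this. It is a sketch under assumptions: the compose fragment is a simplified illustration rather than the real docker-compose.yml, and only `command:` lines are rewritten so that service names stay untouched.

```python
import re

# Match "rust-consumer" only when it is the first word of a command: line.
COMMAND_RE = re.compile(r"^([ \t]*command:[ \t]*)rust-consumer\b", re.MULTILINE)

def use_python_consumers(compose_text):
    """Replace the rust-consumer subcommand with consumer on command: lines."""
    return COMMAND_RE.sub(r"\1consumer", compose_text)

# Simplified, hypothetical fragment for illustration only.
compose_text = """\
services:
  snuba-errors-consumer:
    command: rust-consumer --storage errors
  snuba-transactions-consumer:
    command: rust-consumer --storage transactions
"""

print(use_python_consumers(compose_text))
```

Review the result with a diff before restarting the containers; as noted above, this silenced the clickhouse errors but did not fix the login problem.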

@yanghua-ola

Seeing a similar issue in version 24.12.1. In our case, both the stats page and the project overview showed no transactions, but the project detail view and the performance page had them.

❌ stats page: 0 accepted transaction
Image
❌ project overview: 0 transactions
Image
✅ project detail view, number of transactions ~1.5k per hour
Image
✅ performance page, TPM around 40
Image

1K ~ 2K transactions per hour is the expected value based on our event volume and sampling configuration.
