- About The Project
- Getting Started
- Usage
- Features
- System Design
- Some QnA
- Contributing
- License
- Contact
This project was created to browse logs in a more organised manner. We have all been in a position where there are too many logs and no good way to filter them. This project tries to solve that problem. It has both a frontend and a backend, is highly scalable, and can handle millions of requests. More on the system design in a later section.
This project is built with
Frontend
- ReactJS
- TypeScript
- Vite
- TailwindCSS
Backend
- Golang (GIN Framework)
Databases
- MongoDB
- Redis
Getting the project running is our first task, and there are 2 ways to do so: manual installation and dockerised installation (recommended).
The following things are required to get started
- Docker ( link )
- NodeJS ( link )
- Yarn ( If not installed then run npm i -g yarn )
- Golang ( link ) ( Not Required in Dockerised Installation )
- MongoDB ( link ) ( Not Required in Dockerised Installation )
- Redis Stack Server ( link ) ( Not Required in Dockerised Installation )
This is actually very simple:
- Clone the repo
git clone https://github.com/dyte-submissions/november-2023-hiring-Shubhrajyoti-Dey-FrosTiK.git
cd november-2023-hiring-Shubhrajyoti-Dey-FrosTiK
sudo docker-compose up
And that's it!
The first run will take some time as it downloads every dependency, but that's all you need to do. You don't need to worry about anything, not even the env variables.
Note
The dockerised setup does not assume that you have MongoDB or Redis running locally. Even if you do, it will still work, because the containers run on different ports.
Also note that the backend or the job service may not start on the first attempt; just restart it once every other service is running. (This happens because both servers depend on MongoDB and Redis being up before they start.)
These are the PORT mappings I have set up for you:
REDIS     8888
MONGODB   9999
Backend   3000
Job       No Port
First clone the repo
git clone https://github.com/dyte-submissions/november-2023-hiring-Shubhrajyoti-Dey-FrosTiK.git
Now let's understand what is going on.
These are the dependencies:
- MongoDB Server running
- Redis Stack Server running
We also need the port numbers for both, so check which ports they are running on. The default ports are:
MongoDB   27017
Redis     6379
Now there are 2 servers in the backend to run:
- Backend (/backend)
- Job (/job)
Both are independent services and neither requires the other to be running in order to start up.
Running the Backend:
cd backend
- Now set 2 env variables named ATLAS_URI and REDIS_HOST. I generally run these commands to set them up.
export ATLAS_URI=mongodb://localhost:27017
export REDIS_HOST=localhost:6379
- Now let's install the dependencies
go mod download
- Now let's run the server
go run .
Running the Job
Go to the root of the project and then follow these steps:
- Let's get into the directory first
cd job
- Set the env variables, same as for the backend. (They should be exactly the same.)
export ATLAS_URI=mongodb://localhost:27017
export REDIS_HOST=localhost:6379
- Now let's install the dependencies
go mod download
- Start the runner
go run .
- Job should be running now
First go to the root of the project, then follow these steps:
- Go into the directory
cd frontend
- Install the dependencies. Note that you should use yarn
yarn install
- Set up the env variables. Create a .env file inside the /frontend folder (our active directory) and paste exactly this. Note: the backend should be running by now.
VITE_BACKEND=localhost:3000
- Start the server
yarn dev
- The server should start on PORT 3333
Backend 3000
Frontend 3333
The project is actually very simple to use.
To insert logs
curl -XPOST -H "Content-type: application/json" -d '{
"level": "error",
"message": "Failed to Redis",
"resourceId": "server-1234",
"timestamp": "2023-09-15T08:00:00Z",
"traceId": "abc-xyz-123",
"spanId": "span-456",
"commit": "5e5342f",
"metadata": {
"parentResourceId": "server-0987"
}
}' 'http://localhost:3000'
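If you prefer to send logs from code instead of curl, here is a minimal Go sketch that POSTs the same payload. The field names come from the example above; the snippet itself is illustrative and not part of the repo.

// Minimal sketch: posting a log entry to the ingestion endpoint from Go.
// The field names mirror the curl example above.
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
)

func main() {
	entry := map[string]any{
		"level":      "error",
		"message":    "Failed to Redis",
		"resourceId": "server-1234",
		"timestamp":  "2023-09-15T08:00:00Z",
		"traceId":    "abc-xyz-123",
		"spanId":     "span-456",
		"commit":     "5e5342f",
		"metadata": map[string]string{
			"parentResourceId": "server-0987",
		},
	}

	body, err := json.Marshal(entry)
	if err != nil {
		log.Fatal(err)
	}

	// Same endpoint as the curl example: the ingestor listens on port 3000.
	resp, err := http.Post("http://localhost:3000", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("ingestor responded with", resp.Status)
}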
Other Endpoints
GET /        Gets the latest [X] logs
GET /search  Returns the latest [X] logs with search filters
GET Latest 10
curl -XGET -H 'page-size: 10' -H 'page-number: 0' -H 'Cache-Control: no-cache' 'http://localhost:3000'
GET Search Listing
There are several options here; these are the query params that are accepted:
{
"level": "",
"levelRegex": "",
"message": "",
"messageRegex": "",
"resourceId": "",
"resourceIdRegex": "",
"timestamp": "",
"timestampRegex": "",
"traceId": "",
"traceIdRegex": "",
"spanId": "",
"spanIdRegex": "",
"commit": "",
"commitRegex": "",
"parentResourceId": "",
"parentResourceIdRegex": "",
"timeStart": "",
"timeEnd": "",
"fullTextSearch": "",
"pageNumber": 0,
"pageSize": 60
}
Note
All of the above params are optional, and the types shown should not be changed. Also note that any time parameter needs to be in ISOString format, e.g. 2029-11-12T11:45:26.371Z.
Also note that the filters follow an AND relationship, i.e. if you apply 2 filters, only the data which satisfies both filters will be returned.
This is a sample curl request
curl -XGET 'http://localhost:3000?fullTextSearch="db"&level="error"&timeStart="2020-11-21T18:30:00.000Z"'
Frontend
Open http://localhost:3333 in your browser.
There are 2 tabs.
- Real Time Logs: This is a list of real time logs. Any filters applied on it will also be real time.
- Full Text Search: This is not a real time log. You can apply full text search over here but the data will not be updated real time.
Filtering can be done in 2 ways:
- Server side filtering: This filters all the logs available and returns the output. Press the filter button on the right side of the webpage and a modal will open where you can set your filters. In this modal you can also choose to search via regex by clicking on the toggle.
- Client side filtering: This filters only the data which has already been fetched from the server. It is very fast and can be used to filter data quickly from the available logs.
Click on the 3 dots at the side of the column.
Now click on Filter.
The column filters will appear and you can apply filters.
- Log Ingestor
- Mechanism to ingest logs in the provided format.
- Ensure scalability to handle high volumes of logs efficiently
- Mitigate potential bottlenecks such as I/O operations, database write speeds, etc.
- Logs are ingested via an HTTP server, which runs on port 3000 by default.
- Query Interface (WEB UI)
- Include filters based on
- level
- message
- resourceId
- timestamp
- traceId
- spanId
- commit
- metadata.parentResourceId
- Efficient and quick search results.
- Extra Features
- Implement search within specific date ranges.
- Log count filter to reduce DB load and increase filter flexibility
- Utilize regular expressions for search.
- Allow combining multiple filters.
- Provide real-time log ingestion and searching capabilities.
- role-based access to the query interface. [ NOT IMPLEMENTED ]
- Both Client + Server side filtering
- Advanced options for client side sorting + column manipulation
- Advanced caching used for more optimal performance
This section talks about the design decisions taken and why they were taken.
Looking at the problem statement for the first time, a very simple architecture comes to mind: a plain client-server REST architecture like this
These are the flaws of that architecture at a high level:
- If there are millions of log-entry requests, our DB becomes a bottleneck and costs rise massively.
- As our DB gets clogged, response times for both inserts and fetches get worse.
- Suppose our frontend fetches data from the backend. As soon as a new entry arrives, that data becomes stale and needs to be fetched again. A poller could be used here, but that is not optimal.
- Suppose there are millions of clients who want the logs with different queries. That means millions of DB queries, which is again very costly.
We have changed the architecture significantly to solve the above mentioned problems. The current architecture looks like this
Let's dive a bit deeper into this.
- The backend takes every submitted log and appends it to a RedisQueue. As Redis is an in-memory store, these operations are super fast. (A Go sketch of this queue-and-batch flow is shown below.)
- There is a separate service called job whose job is to empty the RedisQueue by pushing the data to MongoDB in batches. This minimizes the DB traffic significantly, as millions of round trips and for loops are saved, and it also lowers the cost of the solution by drastically reducing the number of DB calls.
- Data ingestion is now handled at scale, but we also need to show the data in real time. WebSocket is chosen as the medium of communication between frontend and backend to avoid expensive polling.
- The job service also pushes a message into RedisPubSub for the backend server to consume.
- Keep in mind that we also need to support real-time filtering of data, so we must track which client has which filters applied. The backend server therefore maintains this mapping of filters across connections.
- MongoDB has been indexed for better performance.
- Whenever the backend server starts, it spawns a goRoutine whose task is to watch the RedisPubSub and send updated, filtered query output to each connected client. It accesses the updated connection list through a pointer reference to a global variable in the backend.
- Whenever the backend receives a websocket connection, another goRoutine is spawned whose only task is to interact with that client and update the stored filters according to what the client wants. This allows the backend to send real-time filtered data.
- The backend also uses a custom wrapper on MongoDB named Mongik, written by me ( link ), which reduces DB calls by caching data and managing cache invalidation and aggregate pipelines on its own. Thus the performance is improved and the DB calls are reduced significantly. Mongik works like this.
Mongik also invalidates cache efficiently to ensure that stale data is not served (Not shown in the diagram)
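To make the queue-and-batch flow above concrete, here is a minimal Go sketch, assuming the go-redis and official MongoDB drivers. The queue key, Pub/Sub channel, batch size, flush interval, and database/collection names are illustrative placeholders, not the repo's actual values.

// Sketch of the ingestion path: the backend LPUSHes raw logs onto a Redis list,
// and the job service drains that list in batches into MongoDB with InsertMany,
// then notifies the backend over Pub/Sub. Names here are illustrative.
package main

import (
	"context"
	"encoding/json"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

const queueKey = "log_queue" // hypothetical queue name

// Backend side: accept a log and append it to the Redis queue (fast, in-memory).
func enqueueLog(ctx context.Context, rdb *redis.Client, raw []byte) error {
	return rdb.LPush(ctx, queueKey, raw).Err()
}

// Job side: periodically drain the queue and write one batch to MongoDB,
// turning many tiny writes into a single InsertMany round trip.
func drainQueue(ctx context.Context, rdb *redis.Client, coll *mongo.Collection) {
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		payloads, err := rdb.RPopCount(ctx, queueKey, 500).Result()
		if err == redis.Nil {
			continue // nothing queued right now
		}
		if err != nil {
			log.Println("redis pop failed:", err)
			continue
		}

		docs := make([]interface{}, 0, len(payloads))
		for _, p := range payloads {
			var doc bson.M
			if json.Unmarshal([]byte(p), &doc) == nil {
				docs = append(docs, doc)
			}
		}
		if len(docs) == 0 {
			continue
		}

		if _, err := coll.InsertMany(ctx, docs); err != nil {
			log.Println("mongo insert failed:", err)
			continue
		}
		// Tell the backend that new logs landed so it can refresh subscribers.
		rdb.Publish(ctx, "log_updates", "flushed")
	}
}

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	coll := client.Database("logs").Collection("entries")

	go drainQueue(ctx, rdb, coll)
	_ = enqueueLog(ctx, rdb, []byte(`{"level":"error","message":"demo"}`))
	select {} // keep the sketch alive so the drainer can run
}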
Q. In this architecture, won't there still be millions of DB calls when updating the clients?
The answer is no. When there is an update in the DB and there are millions of clients subscribed, goRoutine 1 (described earlier) combines all the filters of the connected clients, makes a single DB call, and sends the results to the clients, which improves performance and reduces DB calls. A sketch of this is shown below.
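As a rough illustration of that answer, here is a hedged Go sketch of how the update goroutine could merge every connected client's filter into a single $or query and fan the one result set out. The types, registry, and per-client delivery are illustrative assumptions, not the repo's actual structures.

// Sketch: one combined MongoDB query per update instead of one query per client.
// Triggered when the Redis Pub/Sub channel signals that new logs were flushed.
package main

import (
	"context"
	"log"
	"sync"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// Hypothetical registry of active websocket subscribers and their filters.
type subscribers struct {
	mu      sync.RWMutex
	filters map[string]bson.M        // connection ID -> Mongo filter for that client
	send    map[string]chan []bson.M // connection ID -> outgoing channel
}

func (s *subscribers) refresh(ctx context.Context, coll *mongo.Collection) {
	s.mu.RLock()
	defer s.mu.RUnlock()

	if len(s.filters) == 0 {
		return // nobody is listening, skip the DB entirely
	}

	// Merge all client filters into one $or query: a single round trip to MongoDB.
	or := make([]bson.M, 0, len(s.filters))
	for _, f := range s.filters {
		or = append(or, f)
	}
	cur, err := coll.Find(ctx, bson.M{"$or": or})
	if err != nil {
		log.Println("combined query failed:", err)
		return
	}
	var rows []bson.M
	if err := cur.All(ctx, &rows); err != nil {
		log.Println("decode failed:", err)
		return
	}

	// Fan the single result set out; each client keeps only what matches its
	// own filter (that per-client matching is omitted in this sketch).
	for _, ch := range s.send {
		select {
		case ch <- rows:
		default: // drop if the client is slow; real code might buffer instead
		}
	}
}

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	subs := &subscribers{
		filters: map[string]bson.M{"conn-1": {"level": "error"}},
		send:    map[string]chan []bson.M{"conn-1": make(chan []bson.M, 1)},
	}
	subs.refresh(ctx, client.Database("logs").Collection("entries"))
}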
Q. What if the database is down?
No problem. As the job service uses the RedisQueue to push to the DB, it will retry when the DB is back up.
Q. What if the backend is down?
Even if the backend is down, the job service will continue pushing the remaining logs to the DB, as both are independent services.
This type of architecture is also very scalable and fault tolerant, as each component can be scaled independently according to need, which also makes it cost effective and flexible.
In this way all the pain points are addressed, which is what makes the architecture so scalable.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.
Shubhrajyoti Dey - [email protected]
Project Link: https://github.com/dyte-submissions/november-2023-hiring-Shubhrajyoti-Dey-FrosTiK.git