PowerLog Explorer

A complete full-stack log explorer

Table of Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Features
  5. System Design
  6. Some QnA
  7. Contributing
  8. License
  9. Contact

About The Project

(Screenshot: PowerLog Explorer UI)

This project was created to check out logs in a more organised manner. We have all been in a position where there are too many logs and we are not able to filter them out. This project tries to solve this issue. It has both a frontend and a backend, which are highly scalable and can handle millions of requests. More on the system design in a later section.

Built With

This project is built with

Frontend

  1. ReactJS
  2. TypeScript
  3. Vite
  4. TailwindCSS

Backend

  1. Golang (GIN Framework)

Databases

  1. MongoDB
  2. Redis

Getting Started

Getting the project running is our first task, and there are 2 ways to do so: manual installation and dockerised installation (recommended).

Prerequisites

The following things are required to get started

  1. Docker ( link )
  2. NodeJS ( link )
  3. Yarn. ( If not installed then run npm i -g yarn )
  4. Golang ( link ) ( Not Required in Dockerised Installation )
  5. MongoDB ( link ) ( Not Required in Dockerised Installation )
  6. Redis Stack Server ( link ) ( Not Required in Dockerised Installation )

Setting Up Backend [ Dockerised ] [Recommended]

This is actually very simple:

  1. Clone the repo
git clone https://github.com/dyte-submissions/november-2023-hiring-Shubhrajyoti-Dey-FrosTiK.git
  2. cd november-2023-hiring-Shubhrajyoti-Dey-FrosTiK
  3. sudo docker-compose up

And that's it!!!

It will take some time to set up as it downloads every dependency, but that's it. You don't need to worry about anything, not even env variables.

Note

This dockerised setup does not assume that you have MongoDB or Redis running locally. Even if you do, it will still work, as both are mapped to different ports.

Also note that initially the backend or the job service may not start; just restart it once every other service is running. (This happens because both depend on MongoDB and Redis being up before the servers start.)

These are the port mappings I have done for you:

REDIS      8888
MONGODB    9999
Backend    3000
Job        No Port

Setting up Backend [Manual Setup]

First clone the repo

git clone https://github.com/dyte-submissions/november-2023-hiring-Shubhrajyoti-Dey-FrosTiK.git

Now let's understand what is going on.

These are the dependencies:

  1. MongoDB Server running
  2. Redis Stack Server running

We need the port numbers for both, so check which ports they are running on.

The default ports are as follows:

MongoDB    27017
Redis      6379

Now there are 2 servers in the backend to run.

  1. Backend (/backend)
  2. Job (/job)

Both are independent services and neither requires the other to be running to start up.

Running the Backend:

  1. cd backend
  2. Now set 2 env variables with the names ATLAS_URI and REDIS_HOST. I generally run these commands to set them up.
export ATLAS_URI=mongodb://localhost:27017
export REDIS_HOST=localhost:6379
  3. Now let's install the dependencies
go mod download
  4. Now let's run the server
go run .
  5. The backend server should start at PORT 3000

Running the Job

Go to the root of the project and then follow these steps:

  1. Let's get into the directory first
cd job
  2. Set the env variables. They are the same as for the backend. (They should be exactly the same)
export ATLAS_URI=mongodb://localhost:27017
export REDIS_HOST=localhost:6379
  3. Now let's install the dependencies
go mod download
  4. Start the runner
go run .
  5. The job should be running now

Setting up Frontend

First go to the root of the project, then follow these steps:

  1. Go into the directory
cd frontend
  2. Install the dependencies. Do note that you should use yarn
yarn install
  3. Set up the env variables. Make a .env file inside the /frontend folder (our active directory) and paste exactly this. Note: The backend should be running by now.
VITE_BACKEND=localhost:3000
  4. Start the server
yarn dev
  5. The server should start at PORT 3333

Port Mappings

Backend    3000
Frontend   3333

Usage

The project is actually very simple to use.

To insert logs

curl -XPOST -H "Content-type: application/json" -d '{
  "level": "error",
  "message": "Failed to Redis",
  "resourceId": "server-1234",
  "timestamp": "2023-09-15T08:00:00Z",
  "traceId": "abc-xyz-123",
  "spanId": "span-456",
  "commit": "5e5342f",
  "metadata": {
    "parentResourceId": "server-0987"
  }
}' 'http://localhost:3000'
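
For reference, the payload above maps to a backend model roughly like the following. This is only a sketch; the actual struct names and tags in /backend may differ.

package model

// Metadata holds the nested metadata object of a log entry.
type Metadata struct {
    ParentResourceID string `json:"parentResourceId" bson:"parentResourceId"`
}

// Log mirrors the JSON body accepted by the ingest endpoint.
type Log struct {
    Level      string   `json:"level" bson:"level"`
    Message    string   `json:"message" bson:"message"`
    ResourceID string   `json:"resourceId" bson:"resourceId"`
    Timestamp  string   `json:"timestamp" bson:"timestamp"`
    TraceID    string   `json:"traceId" bson:"traceId"`
    SpanID     string   `json:"spanId" bson:"spanId"`
    Commit     string   `json:"commit" bson:"commit"`
    Metadata   Metadata `json:"metadata" bson:"metadata"`
}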

Other Endpoints

GET    /        Gets the latest [X] logs
GET    /search  Returns the latest [X] logs with search filters

GET Latest 10

curl -XGET -H 'page-size: 10' -H 'page-number: 0' -H 'Cache-Control: no-cache' 'http://localhost:3000'
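
Under the hood this maps to a paginated, newest-first query. Below is a rough sketch of what such a handler could look like with GIN and the official Mongo driver; it is illustrative only, not the exact handler from /backend.

package handler

import (
    "strconv"

    "github.com/gin-gonic/gin"
    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

// LatestLogs reads the page-size and page-number headers and returns
// the newest logs first, using skip/limit for pagination.
func LatestLogs(logs *mongo.Collection) gin.HandlerFunc {
    return func(c *gin.Context) {
        pageSize, _ := strconv.ParseInt(c.GetHeader("page-size"), 10, 64)
        pageNumber, _ := strconv.ParseInt(c.GetHeader("page-number"), 10, 64)
        if pageSize <= 0 {
            pageSize = 10 // fall back to a small default in this sketch
        }

        opts := options.Find().
            SetSort(bson.M{"timestamp": -1}). // latest first
            SetSkip(pageNumber * pageSize).
            SetLimit(pageSize)

        cur, err := logs.Find(c.Request.Context(), bson.M{}, opts)
        if err != nil {
            c.JSON(500, gin.H{"error": err.Error()})
            return
        }
        var result []bson.M
        if err := cur.All(c.Request.Context(), &result); err != nil {
            c.JSON(500, gin.H{"error": err.Error()})
            return
        }
        c.JSON(200, result)
    }
}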

GET Search Listing

There are several options here. These are the query params that are accepted:

{
  "level": "",
  "levelRegex": "",
  "message": "",
  "messageRegex": "",
  "resourceId": "",
  "resourceIdRegex": "",
  "timestamp": "",
  "timestampRegex": "",
  "traceId": "",
  "traceIdRegex": "",
  "spanId": "",
  "spanIdRegex": "",
  "commit": "",
  "commitRegex": "",
  "parentResourceId": "",
  "parentResourceIdRegex": "",
  "timeStart": "",
  "timeEnd": "",
  "fullTextSearch": "",
  "pageNumber": 0,
  "pageSize": 60
}

Note

All of the above params are optional, and the types of the params are as depicted and should not be changed. Also note that any time parameter needs to be in ISO string format, e.g. 2029-11-12T11:45:26.371Z.

Also note that the filters follow an AND (&) relationship, i.e. if you apply 2 filters, only the data which satisfies both filters will be returned.
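
To make the AND semantics concrete, here is a rough sketch of how a few of these params could be combined into a single MongoDB filter. It covers only a subset of the params and the function name is hypothetical; the real query builder in the repo may look different.

package query

import "go.mongodb.org/mongo-driver/bson"

// BuildFilter combines the non-empty params into one filter document.
// Every condition lives in the same bson.M, so MongoDB applies them
// with AND semantics: a log must satisfy all of them to be returned.
func BuildFilter(level, messageRegex, timeStart, timeEnd string) bson.M {
    filter := bson.M{}

    if level != "" {
        filter["level"] = level // exact match
    }
    if messageRegex != "" {
        filter["message"] = bson.M{"$regex": messageRegex} // regex variant
    }
    if timeStart != "" || timeEnd != "" {
        timeRange := bson.M{}
        if timeStart != "" {
            timeRange["$gte"] = timeStart // ISO strings compare lexicographically
        }
        if timeEnd != "" {
            timeRange["$lte"] = timeEnd
        }
        filter["timestamp"] = timeRange
    }
    return filter
}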

This is a sample curl request

curl -XGET 'http://localhost:3000?fullTextSearch="db"&level="error"&timeStart="2020-11-21T18:30:00.000Z"'

Frontend

Open http://localhost:3333 in your browser.

(Screenshot: PowerLog Explorer UI)

There are 2 tabs.

  1. Real Time Logs: This is a list of real-time logs. Any filters applied to it will also be real time.
  2. Full Text Search: These logs are not real time. You can apply full text search here, but the data will not be updated in real time.

Filtering can be done in 2 ways:

  1. Server side filtering: This will filter all the logs available and get you the output. Just press the filter button on the right side of the webpage and a modal will open to set your filters. In this modal you can also choose to search via regex by clicking on the toggle.


  2. Client side filtering: This will filter only the data which has already been fetched from the server. This is very fast and can be used to filter data quickly from the available logs.

Click on the 3 dots at the side of the column.


Now click on filter


Now the column filters will appear and you can apply filters.


Features

  • Log Ingestor
    • Mechanism to ingest logs in the provided format.
    • Ensure scalability to handle high volumes of logs efficiently
    • Mitigate potential bottlenecks such as I/O operations, database write speeds, etc.
    • Logs are ingested via an HTTP server, which runs on port 3000 by default.
  • Query Interface (WEB UI)
    • Include filters based on
      • level
      • message
      • resourceId
      • timestamp
      • traceId
      • spanId
      • commit
      • metadata.parentResourceId
    • Efficient and quick search results.
  • Extra Features
    • Implement search within specific date ranges.
    • Log count filter to reduce DB load and increase filter flexibility
    • Utilize regular expressions for search.
    • Allow combining multiple filters.
    • Provide real-time log ingestion and searching capabilities.
    • role-based access to the query interface. [ NOT IMPLEMENTED ]
    • Both Client + Server side filtering
    • Advanced options for client side sorting + column manipulation
    • Advanced caching used for more optimal performance

System Design

This section talks about the decisions taken and why they were taken.

What is NOT done

Looking at the problem statement for the first time, a very simple architecture comes to mind, which is just a client-server REST architecture like this:

(Diagram: naive client-server REST architecture)

These are the flaws of this architecture at a high level:

  1. If there are millions of requests for log entries, our DB will be a bottleneck and we would incur massive costs.
  2. As our DB is clogged, we would also have worse response times for data entry / fetch.
  3. Now suppose our frontend takes the data from the backend. If there is any new entry, our data becomes stale and we again need to fetch it. One thing which could be done here is polling, but that's not optimal.
  4. Now suppose you have millions of clients who want the logs with different queries. You would end up doing millions of queries, which again would be very costly.

What is done

We have changed the architecture significantly to solve the above-mentioned problems. The current architecture looks like this:

(Diagram: current architecture with a Redis queue, job service, Pub/Sub and WebSockets)

Let's dive a bit deeper into this.

  1. The backend takes all the logs submitted and appends them to a RedisQueue. As Redis is an in-memory store, the operations are super fast. (See the sketch after this list.)
  2. There is a different service called job whose job is to empty the RedisQueue by pushing the data to MongoDB in batches. This minimises the DB traffic significantly, as millions of round trips and for loops are saved here. It also decreases the cost of the solution, as it significantly reduces the number of DB calls.
  3. So now the data ingestion is handled at scale, but we also need to show the data in real time. WebSocket is chosen here as the medium of communication between frontend and backend to avoid expensive polling.
  4. The job also pushes a message to RedisPubSub for the backend server to consume.
  5. One thing to keep in mind is that we also need to support real-time filtering of data, so we need to keep track of which client has which filters applied. The backend server maintains this mapping of filters across connections.
  6. MongoDB has been indexed for better performance.
  7. Whenever the backend server starts, it spawns a goroutine whose task is to check the RedisPubSub and send the updated filtered query output to each connected client. It gets the updated connection list via a pointer reference to a global variable in the backend.
  8. Whenever the backend receives a WebSocket connection, another goroutine is spawned whose only task is to interact with the client and update the stored filters according to what the client wants. This allows the backend to send real-time filtered data.
  9. The backend also uses a custom wrapper on MongoDB named Mongik, written by me ( link ), which reduces DB calls by caching data and managing cache invalidation and aggregate pipelines all by itself. Thus the performance is improved and the DB calls are also reduced significantly. Mongik works as described below.
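
Here is a minimal sketch of steps 1, 2 and 4 above: the ingest handler pushing to the Redis queue, and the job draining it in batches before notifying the backend over Pub/Sub. The key names, batch size and error handling are simplified assumptions, not the exact code from /backend and /job.

package sketch

import (
    "context"
    "encoding/json"

    "github.com/gin-gonic/gin"
    "github.com/redis/go-redis/v9"
    "go.mongodb.org/mongo-driver/mongo"
)

const logQueue = "log_queue" // hypothetical Redis list key

// IngestHandler (backend): push the raw log body onto the Redis queue and
// return immediately, so the HTTP path never waits on MongoDB.
func IngestHandler(rdb *redis.Client) gin.HandlerFunc {
    return func(c *gin.Context) {
        body, _ := c.GetRawData()
        if err := rdb.RPush(c.Request.Context(), logQueue, body).Err(); err != nil {
            c.JSON(500, gin.H{"error": err.Error()})
            return
        }
        c.JSON(200, gin.H{"status": "queued"})
    }
}

// FlushQueue (job): drain the queue in one batch, insert everything into
// MongoDB with a single InsertMany call, then notify the backend via Pub/Sub.
func FlushQueue(ctx context.Context, rdb *redis.Client, logs *mongo.Collection) error {
    raw, err := rdb.LPopCount(ctx, logQueue, 1000).Result() // batch of up to 1000
    if err != nil {
        if err == redis.Nil {
            return nil // nothing queued right now
        }
        return err
    }

    docs := make([]interface{}, 0, len(raw))
    for _, r := range raw {
        var doc map[string]interface{}
        if json.Unmarshal([]byte(r), &doc) == nil {
            docs = append(docs, doc)
        }
    }
    if len(docs) == 0 {
        return nil
    }
    if _, err := logs.InsertMany(ctx, docs); err != nil {
        return err
    }

    // Tell the backend new logs landed so it can push updates over WebSockets.
    return rdb.Publish(ctx, "log_updates", "new_logs").Err()
}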

Mongik

Mongik also invalidates the cache efficiently to ensure that stale data is never served.
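
The core idea behind such a wrapper is the cache-aside pattern. The sketch below only illustrates the concept; it is not Mongik's actual API, and the cache key handling and TTL are made-up assumptions.

package cache

import (
    "context"
    "encoding/json"
    "time"

    "github.com/redis/go-redis/v9"
    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
)

// FindCached checks Redis first and only queries MongoDB on a cache miss.
// On writes, the corresponding keys would be invalidated so that stale
// data is never served, which is the behaviour described above.
func FindCached(ctx context.Context, rdb *redis.Client, col *mongo.Collection,
    cacheKey string, filter bson.M) ([]bson.M, error) {

    // 1. Cache hit: decode and return without touching MongoDB.
    if cached, err := rdb.Get(ctx, cacheKey).Result(); err == nil {
        var out []bson.M
        if json.Unmarshal([]byte(cached), &out) == nil {
            return out, nil
        }
    }

    // 2. Cache miss: query MongoDB.
    cur, err := col.Find(ctx, filter)
    if err != nil {
        return nil, err
    }
    var out []bson.M
    if err := cur.All(ctx, &out); err != nil {
        return nil, err
    }

    // 3. Store the result for subsequent identical queries.
    if encoded, err := json.Marshal(out); err == nil {
        rdb.Set(ctx, cacheKey, encoded, 5*time.Minute) // TTL is an arbitrary choice here
    }
    return out, nil
}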

Some QnA

Q. Won't there still be millions of DB calls in this architecture when updating the clients?

The answer is no. When there is an update in the DB and there are millions of clients subscribed, goroutine 1 (described earlier) combines all the filters of the connected clients, makes a single DB call, and sends the results to the clients, which improves performance and reduces DB calls.
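
A simplified sketch of that fan-out goroutine is shown below, using gorilla/websocket purely for illustration. To keep it readable it issues one query per distinct filter; the actual implementation goes further and merges the connected clients' filters into a single DB call.

package sketch

import (
    "context"

    "github.com/gorilla/websocket"
    "github.com/redis/go-redis/v9"
    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
)

// Client pairs a WebSocket connection with the filter it registered.
type Client struct {
    Conn   *websocket.Conn
    Filter bson.M
}

// Broadcast listens on the Pub/Sub channel the job publishes to and, on
// every update, queries MongoDB once per distinct filter and pushes the
// result to every client that registered that filter.
func Broadcast(ctx context.Context, rdb *redis.Client, logs *mongo.Collection, clients *[]*Client) {
    sub := rdb.Subscribe(ctx, "log_updates")
    defer sub.Close()

    for range sub.Channel() { // one iteration per "new logs" message
        // Group clients by filter so each distinct filter is queried only once.
        groups := map[string][]*Client{}
        filters := map[string]bson.M{}
        for _, cl := range *clients {
            key, _ := bson.MarshalExtJSON(cl.Filter, true, false)
            groups[string(key)] = append(groups[string(key)], cl)
            filters[string(key)] = cl.Filter
        }

        for key, group := range groups {
            cur, err := logs.Find(ctx, filters[key])
            if err != nil {
                continue
            }
            var result []bson.M
            if err := cur.All(ctx, &result); err != nil {
                continue
            }
            for _, cl := range group {
                _ = cl.Conn.WriteJSON(result) // push the filtered update
            }
        }
    }
}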

Q. What if the database is down?

No problem. As the job uses the RedisQueue to push to the DB, it will retry once the DB is back up.

Q. What if the backend is down?

If the backend is down, the job will continue working to push the remaining logs to the DB, as both are independent services.

This type of architecture is also very scalable and fault tolerant, as each component can be scaled according to need, which also makes the architecture cost effective and flexible.

In this way, all the pain points are addressed, which makes the architecture scalable.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Shubhrajyoti Dey - [email protected]

Project Link: https://github.com/dyte-submissions/november-2023-hiring-Shubhrajyoti-Dey-FrosTiK.git
