XRB::Sanitize

Sanitize markup by adding, changing or removing tags, using the xrb stream processor (which has a native C implementation).

Motivation

I use the sanitize gem, and generally it's great. However, its performance can be an issue, and it doesn't preserve tag namespaces when parsing fragments because of how Nokogiri works internally. This is a problem when processing content destined for utopia, which depends heavily on tag namespaces.

Is it fast?

In my informal testing, this gem is about 18x faster than the sanitize gem when generating plain text, as measured in the run below.

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
Warming up --------------------------------------
            Sanitize   438.000 i/100ms
       XRB::Sanitize     7.935k i/100ms
Calculating -------------------------------------
            Sanitize      4.365k (± 0.1%) i/s -     21.900k in   5.017157s
       XRB::Sanitize     78.670k (± 0.1%) i/s -    396.750k in   5.043233s

Comparison:
       XRB::Sanitize:    78669.9 i/s
            Sanitize:     4365.0 i/s - 18.02x  slower
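The comparison factor in the last line is the ratio of the two iterations-per-second figures. As a quick sanity check, here is a minimal sketch that recomputes it from the numbers printed above; the `RESULTS` constant and the `slowdowns` helper are names made up for this illustration and are not part of the gem.

```ruby
# Iterations per second, copied from the benchmark output above.
RESULTS = {
  "XRB::Sanitize" => 78_669.9,
  "Sanitize" => 4_365.0,
}.freeze

# Hypothetical helper: for each entry, how many times slower it is
# than the fastest entry (the fastest entry itself maps to 1.0).
def slowdowns(results)
  fastest = results.values.max
  results.transform_values { |ips| (fastest / ips).round(2) }
end

slowdowns(RESULTS).each do |name, factor|
  puts format("%-15s %6.2fx", name, factor)
end
```

This only checks the arithmetic of a single informal run; it says nothing about how the two gems compare on other inputs or platforms.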

Usage

Please see the project documentation for more details.

  • Getting Started - This guide explains how to get started with the XRB::Sanitize gem.

Contributing

We welcome contributions to this project.

  1. Fork it.
  2. Create your feature branch (git checkout -b my-new-feature).
  3. Commit your changes (git commit -am 'Add some feature').
  4. Push to the branch (git push origin my-new-feature).
  5. Create a new Pull Request.

Developer Certificate of Origin

This project uses the Developer Certificate of Origin. All contributors to this project must agree to this document to have their contributions accepted.

Contributor Covenant

This project is governed by the Contributor Covenant. All contributors and participants agree to abide by its terms.