Advanced conversion techniques: Experiments and Discussion #150
Great find! I love it. I've been thinking about flexible grid alignment and tested a naïve (but fast) approach -- basically shift the grid of each cell a few ways around the center and look for matches that are better by some factor. Unfortunately that didn't improve the output very much and made it worse in many cases (especially with connective characters). The paper has a much better approach, but it's also much more complex. But from a quick skim, I think there's nothing stopping us from implementing it or something similar. A couple of things we'd need to resolve, though:
We could iterate on it in order to get some modest improvements more quickly. A tentative plan could look like this:
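For reference, here is a minimal sketch of the naïve per-cell shift described at the start of this comment: try small offsets around a cell's home position and only accept one that beats the unshifted error by some factor. It assumes a grayscale, row-major float image; `Image`, `cell_error()` and `best_cell_shift()` are hypothetical names, and `cell_error()` is a stand-in for Chafa's real symbol matcher (it just measures the error of the best solid fill, i.e. the cell's variance).

```c
#include <stddef.h>

/* Hypothetical image type: grayscale, row-major, values in [0,1]. */
typedef struct {
    const float *px;
    int w, h;
} Image;

/* Clamp-to-edge fetch so shifted windows never read out of bounds. */
static float pixel_at(const Image *img, int x, int y)
{
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= img->w) x = img->w - 1;
    if (y >= img->h) y = img->h - 1;
    return img->px[(size_t) y * img->w + x];
}

/* Stand-in for a real symbol matcher: squared error of approximating the
 * cw x ch window at (x0, y0) by its mean (the best "solid fill" symbol). */
static float cell_error(const Image *img, int x0, int y0, int cw, int ch)
{
    float sum = 0.0f, sq = 0.0f;
    for (int y = 0; y < ch; y++)
        for (int x = 0; x < cw; x++) {
            float v = pixel_at(img, x0 + x, y0 + y);
            sum += v;
            sq += v * v;
        }
    return sq - sum * sum / (float) (cw * ch);
}

/* Try shifts of up to +/-max_shift pixels around cell (cx, cy). A shift is
 * kept only if its error is below factor * (unshifted error), e.g. 0.5 means
 * it must at least halve the error. Writes the chosen offset (0,0 if none). */
static void best_cell_shift(const Image *img, int cx, int cy, int cw, int ch,
                            int max_shift, float factor, int *out_dx, int *out_dy)
{
    int x0 = cx * cw, y0 = cy * ch;
    float best = cell_error(img, x0, y0, cw, ch) * factor;
    *out_dx = 0;
    *out_dy = 0;
    for (int dy = -max_shift; dy <= max_shift; dy++)
        for (int dx = -max_shift; dx <= max_shift; dx++) {
            if (dx == 0 && dy == 0)
                continue;
            float e = cell_error(img, x0 + dx, y0 + dy, cw, ch);
            if (e < best) {
                best = e;
                *out_dx = dx;
                *out_dy = dy;
            }
        }
}
```

The acceptance factor is what keeps this from shifting every cell on tiny gains; as noted above, even with it, the approach tends to hurt connective characters, so a real version would probably restrict it to non-connective symbols.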
I'm going to dump a few loosely related ideas below. This issue seems like a good place, hope you don't mind :-)
OK, I found two more papers! ASCII art: serious business. 2019: https://ieeexplore.ieee.org/document/7491376 2022: https://gwern.net/doc/design/typography/2022-chung.pdf There's also a git repo with some kind of implementation of the 2010 paper. Out of time for more investigation. Cheers!
Blocky look when using background colors
Another thing I've been thinking about is how to improve on the blocky look caused by variable background colors. I remember you pointing this out in another bug, but I don't think it would work to use bright colors for foreground details and dark colors as background. Imagine a blue sky gradient with a black bird and a white bird: you'd want the blue gradient to be drawn in background color, and the details for both birds to be drawn as foreground symbols (e.g. ASCII letters). The solution to this might be to use frequency analysis and draw low-frequency features (e.g. gradients) using background and high-frequency features (details, outlines, etc.) as foreground. We could also use an iterative approach where cells' background colors are averaged with neighboring cells, or come up with some other fancy scheme.
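A rough sketch of the frequency-analysis idea, assuming a grayscale float image: a 3x3 box blur serves as a crude low-pass filter, the background of each cell is the mean of the low-pass image (so gradients land there), and a cell is flagged as needing a foreground symbol only when the high-pass residual (original minus blur) carries enough energy. The function names and the energy threshold are hypothetical; a real version would work per color channel and likely use a better filter.

```c
#include <stdlib.h>

/* 3x3 box blur with clamp-to-edge: a crude low-pass filter. */
static void box_blur3(const float *src, float *dst, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            float sum = 0.0f;
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++) {
                    int sx = x + i, sy = y + j;
                    if (sx < 0) sx = 0;
                    if (sy < 0) sy = 0;
                    if (sx >= w) sx = w - 1;
                    if (sy >= h) sy = h - 1;
                    sum += src[sy * w + sx];
                }
            dst[y * w + x] = sum / 9.0f;
        }
}

/* For each cw x ch cell: bg[c] = mean of the low-pass image, and
 * needs_fg[c] = 1 if the mean squared high-pass residual in the cell
 * exceeds energy_threshold. Returns 0 on success, -1 on allocation failure. */
static int split_frequencies(const float *img, int w, int h, int cw, int ch,
                             float energy_threshold, float *bg, int *needs_fg)
{
    float *low = malloc((size_t) w * h * sizeof *low);
    if (!low)
        return -1;
    box_blur3(img, low, w, h);

    int cols = w / cw, rows = h / ch;
    for (int cy = 0; cy < rows; cy++)
        for (int cx = 0; cx < cols; cx++) {
            float mean = 0.0f, energy = 0.0f;
            for (int y = cy * ch; y < (cy + 1) * ch; y++)
                for (int x = cx * cw; x < (cx + 1) * cw; x++) {
                    float hi = img[y * w + x] - low[y * w + x];
                    mean += low[y * w + x];
                    energy += hi * hi;
                }
            bg[cy * cols + cx] = mean / (float) (cw * ch);
            needs_fg[cy * cols + cx] = energy / (float) (cw * ch) > energy_threshold;
        }
    free(low);
    return 0;
}
```

On a smooth gradient, no cell gets flagged and everything is carried by the background; an isolated bright pixel (the "bird") produces a large residual and gets flagged for a foreground symbol.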
Deep learning
Akiyama 2017: https://github.com/OsciiArt/DeepAA That project is for Shift-JIS with a specific variable-width font, but we could use a similar approach for other forms of character art. It might be even simpler to do for fixed-width fonts. Neural nets are all the rage, and I think it'd be possible to write a no-dependencies CNN that could run on the CPU with a pre-trained model. We'd need a model for each specific kind of character art, though. I thought about using it for PETSCII; you could generate training data by taking PETSCII images with 8x8-pixel cells, downsampling them by a factor of 4 and augmenting them (small shifts pre-sample, using different downsamplers, adding noise, modulating colors) to make image pairs consisting of an image that resembles a "natural" image and its PETSCII equivalent for training. PETSCII images are typically 40x25 cells, where each cell is 8x8 pixels. That's 320x200 pixels. After downsampling by a factor of four, that's 80x50 pixels, which should make for a fairly small model.
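The training-pair generation step above could look roughly like this: a box-filter downsample of a 320x200 PETSCII render by a factor of 4, with an integer pre-sample shift as one of the augmentations. This assumes a grayscale float render and clamp-to-edge handling at the borders; `downsample_shifted()` and the macro names are hypothetical, and noise, color modulation and alternative downsamplers are left out.

```c
#include <stddef.h>

#define PETSCII_COLS 40
#define PETSCII_ROWS 25
#define CELL_PX 8
#define SRC_W (PETSCII_COLS * CELL_PX)  /* 320 */
#define SRC_H (PETSCII_ROWS * CELL_PX)  /* 200 */
#define FACTOR 4
#define DST_W (SRC_W / FACTOR)          /* 80 */
#define DST_H (SRC_H / FACTOR)          /* 50 */

/* Box-filter downsample of a SRC_W x SRC_H grayscale render by FACTOR after
 * shifting it by (shift_x, shift_y) pixels. Source pixels that fall outside
 * the render are clamped to the nearest edge pixel. */
static void downsample_shifted(const float *src, float *dst, int shift_x, int shift_y)
{
    for (int dy = 0; dy < DST_H; dy++)
        for (int dx = 0; dx < DST_W; dx++) {
            float sum = 0.0f;
            for (int j = 0; j < FACTOR; j++)
                for (int i = 0; i < FACTOR; i++) {
                    int sx = dx * FACTOR + i - shift_x;
                    int sy = dy * FACTOR + j - shift_y;
                    if (sx < 0) sx = 0;
                    if (sy < 0) sy = 0;
                    if (sx >= SRC_W) sx = SRC_W - 1;
                    if (sy >= SRC_H) sy = SRC_H - 1;
                    sum += src[(size_t) sy * SRC_W + sx];
                }
            dst[dy * DST_W + dx] = sum / (float) (FACTOR * FACTOR);
        }
}
```

Each (shifted, downsampled) image would be paired with the original 40x25 grid of symbol codes as the training target, so the model's input is 80x50 = 4000 values per channel.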
That instance at least would have a large number of pixels with color vectors clustered around some mean (sky) in the regions around the currently processed character, with some outliers. The key for those braille characters would be to avoid randomly flipping fg/bg assignments.
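One simple way to keep the assignment stable, sketched under assumptions: after a cell's pixels have been split into two color clusters, call whichever cluster is closer to the mean color of the surrounding region (the sky) the background, and the outlier cluster the foreground. The `Rgb` type, `orient_fg_bg()` and how the neighborhood mean is computed are all hypothetical; ties keep the current order, so the result is deterministic.

```c
typedef struct { float r, g, b; } Rgb;

/* Squared Euclidean distance in RGB. */
static float dist2(Rgb a, Rgb b)
{
    float dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

/* Make bg the cluster nearest the neighborhood mean. Returns 1 if fg and bg
 * were swapped, so the caller can invert the symbol's coverage mask to match. */
static int orient_fg_bg(Rgb *fg, Rgb *bg, Rgb neighborhood_mean)
{
    if (dist2(*fg, neighborhood_mean) < dist2(*bg, neighborhood_mean)) {
        Rgb tmp = *fg;
        *fg = *bg;
        *bg = tmp;
        return 1;
    }
    return 0;
}
```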
Ah yeah. Yes. Experiments needed. Funding needed for experiments.
We need to write a grant proposal, should be a slam dunk with its implications for global GDP.
And another paper from Japan! http://nishitalab.org/user/nis/cdrom/iccg/miyake_nico.pdf
Leicht looks like a good starting point for minimalist NN experimentation.
Well, my Leicht project is largely educational. It's barely optimized for performance, although it should run on all architectures. Intel has a highly optimized oneMKL library for amd64. A couple of high-performance C libraries exist as well.
That's good to know. I think an NN approach made to fit within the scope of Chafa would have to be pretty naïve: ideally a plain C implementation with no external dependencies, with each layer of weights being just an array of packed floats, and one or two hardcoded activation functions (ReLU + softmax?). We could borrow inspiration from Akiyama et al., but make it more lightweight computationally, since we're using a fixed grid, lower symbol resolution and far fewer symbols (in the ASCII and PETSCII cases, <= 128). It could have AVX and multithreading optimizations, since those aren't too complex to do and we're already using those approaches elsewhere in the library. It might not be feasible to make something so simple and retain a measure of usefulness, so I think it's nice to have a small implementation to play around with to see what works (and, if it can't be made to work, to fail fast) without having to get into all the care and feeding of a bigger framework :-) I could easily be wrong/dumb in my assumptions, though, since it's not really my field. I'm very glad to have your input @cdluminate.
OK. My suggestion is to separate the training and inference code for Chafa. I'd suggest implementing the training scripts in PyTorch, so that you can try training different NNs at a very fast pace. Implementing this in C/C++ is not worth the time cost at all. We can export the trained NN into some common format like ONNX, or just some self-defined JSON/binary dump format, as long as it is easy to load in ANSI C. As for inference code, you may take a look at https://github.com/BVLC/caffe, which is an obsolete but high-quality C++ code base widely used in many commercial products. Once we have a concrete idea of what the NN would look like, we can start borrowing code from Caffe for the implementation. My Leicht is not as mature as Caffe, but it could serve as a reference if you like. In terms of external dependencies, I'd suggest at least incorporating BLAS to accelerate the basic linear algebra routines. It deals with many of the computational bottlenecks in NN computation.
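As an illustration of a "self-defined binary dump format, easy to load in ANSI C", here is a sketch of a loader for one dense layer. The format itself is hypothetical (4-byte magic `CNN0`, two `uint32` dimensions, then row-major `float32` weights and biases), and values are read in host byte order, so it assumes the exporter wrote little-endian IEEE-754 data and the loader runs on a little-endian host. It uses C99 `<stdint.h>`; the size cap is an arbitrary safety limit.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout:
 *   "CNN0" | uint32 n_out | uint32 n_in | float32[n_out*n_in] | float32[n_out] */
typedef struct {
    uint32_t n_in, n_out;
    float *weights;  /* row-major, n_out * n_in */
    float *bias;     /* n_out */
} LoadedLayer;

static int read_exact(FILE *f, void *buf, size_t size, size_t count)
{
    return fread(buf, size, count, f) == count;
}

/* Returns 0 on success; on failure returns -1 and leaves *out zeroed. */
static int load_layer(FILE *f, LoadedLayer *out)
{
    char magic[4];
    uint32_t dims[2];

    memset(out, 0, sizeof *out);
    if (!read_exact(f, magic, 1, 4) || memcmp(magic, "CNN0", 4) != 0)
        return -1;
    if (!read_exact(f, dims, sizeof dims[0], 2)
        || dims[0] == 0 || dims[1] == 0 || dims[0] > 65536u || dims[1] > 65536u)
        return -1;
    out->n_out = dims[0];
    out->n_in = dims[1];
    out->weights = malloc((size_t) out->n_out * out->n_in * sizeof(float));
    out->bias = malloc((size_t) out->n_out * sizeof(float));
    if (!out->weights || !out->bias
        || !read_exact(f, out->weights, sizeof(float), (size_t) out->n_out * out->n_in)
        || !read_exact(f, out->bias, sizeof(float), out->n_out)) {
        free(out->weights);
        free(out->bias);
        memset(out, 0, sizeof *out);
        return -1;
    }
    return 0;
}

static void free_layer(LoadedLayer *l)
{
    free(l->weights);
    free(l->bias);
    memset(l, 0, sizeof *l);
}
```

A matching exporter on the PyTorch side would just write the same header followed by each tensor's raw float32 bytes; anything fancier (multiple layers, versioning, endianness flags) can be layered on the same idea.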
It makes no sense to optimize BLAS on our own. It's very complicated work. See BLAS implementations in the following links: In some distributions like Debian and Gentoo, the BLAS backend is switchable at run time, so compiling against the generic BLAS and running on an optimized BLAS is not an issue.
One of the most significant performance bottlenecks is linear algebra, and matrix addition and matrix multiplication are exactly the functionality of BLAS. Existing libraries like OpenBLAS and BLIS are already well optimized. Don't ever try to reinvent the wheel, because it is a very complicated thing if you want to reach high performance. FYI,
Great pointers! Yes, we'd definitely want to keep all the training stuff separate, and only implement inference in Chafa proper. My speculation around wheel-inventing was based on the need to do forward passes only. Another obstacle is model distribution; I think anything over 10MB ought to be kept out of direct distribution. For comparison, the DeepAA model is 666MB. Ours would likely have to be download-on-demand from private hosting, fetched via libsoup or something like that. We could alleviate it somewhat by making a simpler model and maybe quantizing the weights. There may be non-NN data-driven approaches with more modest requirements. For instance, it would be possible to build a frequency table of symbols' spatial relationships in art and use that to "repair" high-loss cells by predicting their contents from their local neighborhood after the currently implemented MSE-minimizing pass. With a kernel size of 5x5 cells and 128 possible symbols we'd need
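One way such a frequency table could be kept tractable (an assumption for illustration, not necessarily what was proposed): store pairwise counts per neighbor offset instead of a joint distribution over the whole 5x5 neighborhood. That's 24 offsets x 128 x 128 = 393,216 counts, or 1,572,864 bytes (1.5 MiB) with 32-bit counters. A high-loss cell is then "repaired" by picking the symbol whose summed co-occurrence counts with its actual neighbors are highest. This voting rule is a simplification; a naïve Bayes variant summing log-probabilities would be the more principled choice. All names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define N_SYMBOLS 128
#define RADIUS 2                        /* 5x5 kernel */
#define KSIZE (2 * RADIUS + 1)
#define N_OFFSETS (KSIZE * KSIZE - 1)   /* 24 neighbors, center excluded */

/* counts[k][a][b]: how often symbol b was seen at neighbor offset k of symbol a. */
typedef struct {
    uint32_t counts[N_OFFSETS][N_SYMBOLS][N_SYMBOLS];
} CooccurrenceTable;

/* Map (dx, dy) in [-RADIUS, RADIUS]^2, minus the center, to 0..N_OFFSETS-1. */
static int offset_index(int dx, int dy)
{
    int k = (dy + RADIUS) * KSIZE + (dx + RADIUS);
    return k > KSIZE * KSIZE / 2 ? k - 1 : k;
}

/* Accumulate counts from one piece of art: a cols x rows grid of symbol
 * indices, each < N_SYMBOLS. */
static void table_add_art(CooccurrenceTable *t, const uint8_t *grid, int cols, int rows)
{
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < cols; x++)
            for (int dy = -RADIUS; dy <= RADIUS; dy++)
                for (int dx = -RADIUS; dx <= RADIUS; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if ((dx == 0 && dy == 0) || nx < 0 || ny < 0 || nx >= cols || ny >= rows)
                        continue;
                    t->counts[offset_index(dx, dy)][grid[y * cols + x]][grid[ny * cols + nx]]++;
                }
}

/* Predict the symbol at (x, y) from its neighbors: each candidate scores the
 * sum, over all in-bounds neighbor offsets, of how often it was seen with
 * that neighbor at that offset. The current content of (x, y) is ignored. */
static int table_predict(const CooccurrenceTable *t, const uint8_t *grid,
                         int cols, int rows, int x, int y)
{
    uint64_t score[N_SYMBOLS] = {0};
    for (int dy = -RADIUS; dy <= RADIUS; dy++)
        for (int dx = -RADIUS; dx <= RADIUS; dx++) {
            int nx = x + dx, ny = y + dy;
            if ((dx == 0 && dy == 0) || nx < 0 || ny < 0 || nx >= cols || ny >= rows)
                continue;
            int k = offset_index(dx, dy);
            uint8_t nb = grid[ny * cols + nx];
            for (int s = 0; s < N_SYMBOLS; s++)
                score[s] += t->counts[k][s][nb];
        }
    int best = 0;
    for (int s = 1; s < N_SYMBOLS; s++)
        if (score[s] > score[best])
            best = s;
    return best;
}
```

The table is too large for the stack, so it would be heap-allocated (e.g. with `calloc`), or shipped precomputed; with 16-bit saturating counters it would shrink to 768 KiB.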
One more thing to mention is that the licensing of deep neural networks is still an unclear area, unless the training dataset is fully open source, e.g. all training data are licensed under CC-BY-SA 4.0, CC-0 or something similar. Debian has an unofficial document on the policy for distributing AI stuff: https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/master/ML-Policy.rst (I drafted it). This policy is strict, in exchange for no loss of software freedom (at the cost of usefulness, as usual). OSI is already working in this direction: https://deepdive.opensource.org/
My position is that we definitely want to err on the side of legal and ethical use, and speculation here on possible ML approaches is subject to that. Additionally, we'd want to adhere to Debian's official policy wherever that's stricter. I noticed you made an exception for simpler statistical models; you're more up to date on the legal details than I am, so I'll defer to your judgement as to what is permissible. The idea is to make good terminal-printable facsimiles of the user's own images, not wholesale generation (e.g. with text embedding) like Adobe, SD, OpenAI, Midjourney, Microsoft, etc. are doing. Licensing is also a conundrum for ANSI art archives, since almost none of the art has historically been released with a license. A side effect of that is that this art form is a lot harder to find now than it was just 10 years ago.
Well, this ML-Policy draft is the result of a very lengthy discussion on the debian-devel mailing list, so it has gone through some kind of review and reached something of a consensus. Simpler statistical models already exist widely in the open-source world, and imposing restrictions on them would be overkill. For instance, input methods have been using them for decades. Simpler models can be interpreted by humans very well, but deep neural networks cannot be well understood yet. So what the model does won't matter much here. My point is just to provide information so we can avoid complicating the distribution (in terms of licensing). I actually care more about latency than about whether deep learning can improve the display quality a little bit. Here are some of my thoughts about the previously mentioned issues:
Yeah. Anyway, CNNs aren't exactly high on the agenda :-) The top contender for "really fancy processing" is still the first paper @clort81 posted. Its technique is old-school (edge thinning, path tracing, iterative deformation), and it seems to produce great results, though it'd still be a lot of work to implement.
Absolutely. I went to some lengths to make it fast, and I'd like to keep it that way too. The machinery that makes it work well on low-end terminals (e.g. Linux console or fbterm running on a tiny screen glued to some DIY project) provides an opening for more "artsy" applications, for lack of a better word. It's fun to explore those, and I want to do more of it, but it shouldn't come at the cost of more pragmatic use cases.
Yes, shifting the entire image and selecting the offset that results in the lowest error is a good idea. I think we could also revisit my attempt at shifting cells on an individual basis, but modify it to only apply when a non-connective character was chosen. That may make small details look better without tearing up connective cells. But as you point out, it probably wouldn't save you those keypresses :-)
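Shifting the entire image and keeping the lowest-error offset can be sketched like this: for every offset within one cell, lay the grid over the image, sum the per-cell error and keep the minimum. Assumptions: grayscale float image; `window_error()` is a hypothetical stand-in for the real symbol matcher (squared error against the cell mean); and edge cells are scored only over their in-image pixels, which a real implementation might instead handle by padding.

```c
#include <float.h>

/* Squared error of approximating the in-image part of the cw x ch cell at
 * (x0, y0) by its mean. Stand-in for a real symbol matcher. */
static float window_error(const float *img, int w, int h,
                          int x0, int y0, int cw, int ch)
{
    int xa = x0 < 0 ? 0 : x0, ya = y0 < 0 ? 0 : y0;
    int xb = x0 + cw > w ? w : x0 + cw, yb = y0 + ch > h ? h : y0 + ch;
    float sum = 0.0f, sq = 0.0f;
    int n = 0;
    for (int y = ya; y < yb; y++)
        for (int x = xa; x < xb; x++) {
            float v = img[y * w + x];
            sum += v;
            sq += v * v;
            n++;
        }
    return n > 0 ? sq - sum * sum / (float) n : 0.0f;
}

/* Try every grid offset within one cell. Offset (ox, oy) moves the grid
 * origin ox pixels left and oy pixels up, so cells start at cx * cw - ox.
 * Returns the lowest total error and writes the offset that produced it. */
static float best_global_offset(const float *img, int w, int h, int cw, int ch,
                                int *out_ox, int *out_oy)
{
    float best = FLT_MAX;
    *out_ox = *out_oy = 0;
    for (int oy = 0; oy < ch; oy++)
        for (int ox = 0; ox < cw; ox++) {
            int cols = (w + ox + cw - 1) / cw, rows = (h + oy + ch - 1) / ch;
            float total = 0.0f;
            for (int cy = 0; cy < rows; cy++)
                for (int cx = 0; cx < cols; cx++)
                    total += window_error(img, w, h, cx * cw - ox, cy * ch - oy, cw, ch);
            if (total < best) {
                best = total;
                *out_ox = ox;
                *out_oy = oy;
            }
        }
    return best;
}
```

This costs cw x ch full passes, so with a real matcher it would probably be run on a downscaled preview or with a coarser set of candidate offsets.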
The image is already scaled with anti-aliasing before applying the rest of the algorithm, but it may be possible to do something, e.g. with a median filter to smooth things out while preserving edges. I'm not sure I understand 100% what you're suggesting, though. I think adding some kind of spatial hysteresis (so BG/FG colors don't flip as demonstrated in #127) could also work.
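A 3x3 median filter is the simplest version of the edge-preserving smoothing mentioned above. This sketch assumes a single grayscale float channel with clamp-to-edge borders; the function names are hypothetical. Unlike a box blur, it removes isolated specks entirely while keeping step edges sharp.

```c
/* Insertion sort of the 9 window values; fine for a fixed, tiny window. */
static float median9(float v[9])
{
    for (int i = 1; i < 9; i++) {
        float key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) {
            v[j + 1] = v[j];
            j--;
        }
        v[j + 1] = key;
    }
    return v[4];
}

/* 3x3 median filter with clamp-to-edge. src and dst must not alias. */
static void median3x3(const float *src, float *dst, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            float win[9];
            int n = 0;
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++) {
                    int sx = x + i < 0 ? 0 : (x + i >= w ? w - 1 : x + i);
                    int sy = y + j < 0 ? 0 : (y + j >= h ? h - 1 : y + j);
                    win[n++] = src[sy * w + sx];
                }
            dst[y * w + x] = median9(win);
        }
}
```

For color images, it could be applied per channel, or to luminance only if hue shifts at edges turn out to be a problem.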
Another lead one could follow would be to define and apply different kinds of parametric shape grammar. Just jotting it down here before I forget.
How about a mesh transform with a fixed image size? Place control points where cell corners meet and have a solver move them around. In each step it'd resample the image according to the distorted grid, then calculate cell MSE as usual with a distortion penalty. I think it'd be somewhat analogous to the first paper posted here, except simpler (e.g. no need to do path tracing), and it'd work with colors and areas, not just line art. It would blur the image, though, so it could be necessary to do contrast enhancement/line thickening pre-transform to preserve details. But that's not hard to do either.
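Here is a sketch of the two pieces such a solver would need, leaving the solver itself out: a warp that resamples the image through a grid of control-point displacements (one per cell corner, bilinearly interpolated across each cell, with bilinear image sampling), and the cost to minimize, i.e. per-cell error plus a distortion penalty. Assumptions: grayscale float image whose size is an exact multiple of the cell size; the per-cell error is a stand-in (error against the cell mean); `Mesh`, `mesh_warp()`, `mesh_cost()` and `lambda` are hypothetical.

```c
/* Control-point grid: (cols+1) x (rows+1) displacements, one per cell corner. */
typedef struct {
    int cols, rows, cw, ch;
    float *dx, *dy;
} Mesh;

static float sample_bilinear(const float *img, int w, int h, float x, float y)
{
    if (x < 0.0f) x = 0.0f;
    if (y < 0.0f) y = 0.0f;
    if (x > (float) (w - 1)) x = (float) (w - 1);
    if (y > (float) (h - 1)) y = (float) (h - 1);
    int x0 = (int) x, y0 = (int) y;
    int x1 = x0 + 1 < w ? x0 + 1 : x0, y1 = y0 + 1 < h ? y0 + 1 : y0;
    float fx = x - (float) x0, fy = y - (float) y0;
    float top = img[y0 * w + x0] * (1 - fx) + img[y0 * w + x1] * fx;
    float bot = img[y1 * w + x0] * (1 - fx) + img[y1 * w + x1] * fx;
    return top * (1 - fy) + bot * fy;
}

/* Resample img (cols*cw x rows*ch) through the mesh: each output pixel reads
 * from its own position plus the displacement interpolated from its cell's
 * four corners. */
static void mesh_warp(const Mesh *m, const float *img, float *out)
{
    int w = m->cols * m->cw, h = m->rows * m->ch, stride = m->cols + 1;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int cx = x / m->cw, cy = y / m->ch;
            float u = (float) (x % m->cw) / m->cw, v = (float) (y % m->ch) / m->ch;
            int i00 = cy * stride + cx, i10 = i00 + 1, i01 = i00 + stride, i11 = i01 + 1;
            float ddx = (m->dx[i00] * (1 - u) + m->dx[i10] * u) * (1 - v)
                      + (m->dx[i01] * (1 - u) + m->dx[i11] * u) * v;
            float ddy = (m->dy[i00] * (1 - u) + m->dy[i10] * u) * (1 - v)
                      + (m->dy[i01] * (1 - u) + m->dy[i11] * u) * v;
            out[y * w + x] = sample_bilinear(img, w, h, x + ddx, y + ddy);
        }
}

/* Cost for the solver: per-cell squared error against the cell mean, plus
 * lambda times the sum of squared control-point displacements. */
static float mesh_cost(const Mesh *m, const float *warped, float lambda)
{
    int w = m->cols * m->cw;
    float cost = 0.0f;
    for (int cy = 0; cy < m->rows; cy++)
        for (int cx = 0; cx < m->cols; cx++) {
            float sum = 0.0f, sq = 0.0f;
            for (int y = cy * m->ch; y < (cy + 1) * m->ch; y++)
                for (int x = cx * m->cw; x < (cx + 1) * m->cw; x++) {
                    float p = warped[y * w + x];
                    sum += p;
                    sq += p * p;
                }
            cost += sq - sum * sum / (float) (m->cw * m->ch);
        }
    for (int i = 0; i < (m->cols + 1) * (m->rows + 1); i++)
        cost += lambda * (m->dx[i] * m->dx[i] + m->dy[i] * m->dy[i]);
    return cost;
}
```

A simple solver could then do coordinate descent: nudge one control point at a time, re-warp only the (up to four) cells that touch it, and keep the move if the cost drops.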
[
{
"title": "Structure-based ASCII art",
"doi": "10.1145/1833349.1778789",
"year": 2010
},
{
"title": "ASCII Art Generation Using the Local Exhaustive Search on the GPU",
"doi": "10.1109/CANDAR.2013.35",
"year": 2013
},
{
"title": "A character art generator using the local exhaustive search, with GPU acceleration",
"doi": "10.1080/17445760.2014.962026",
"year": 2014
},
{
"title": "Texture-aware ASCII art synthesis with proportional fonts",
"doi": "10.2312/EXP.20151191",
"year": 2015
},
{
"title": "Fast Rendering of Image Mosaics and ASCII Art",
"doi": "10.1111/cgf.12597",
"year": 2015
},
{
"title": "Automatic ASCII Art conversion of binary images using non-negative constraints",
"doi": "10.1049/CP:20080660",
"year": 2015
},
{
"title": "COMPARISON OF TWO ASCII ART EXTRACTION METHODS: A RUN-LENGTH ENCODING BASED METHOD AND A BYTE PATTERN BASED METHOD",
"doi": "10.2316/P.2015.827-026",
"year": 2015
},
{
"title": "ASCII Art Synthesis from Natural Photographs",
"doi": "10.1109/TVCG.2016.2569084",
"year": 2016
},
{
"title": "ASCII Art Classification based on Deep Neural Networks Using Image Feature of Characters",
"doi": "10.17706/jsw.13.10.559-572",
"year": 2018
},
{
"title": "Generating ASCII-Art: A Nifty Assignment from a Computer Graphics Programming Course",
"doi": "10.2312/EGED.20171021",
"year": 2017
},
{
"title": "ASCII Art Classification Model by Transfer Learning and Data Augmentation",
"doi": "10.3233/faia200738",
"year": 2020
},
{
"title": "Fast Text Placement Scheme for ASCII Art Synthesis",
"doi": "10.1109/ACCESS.2022.3167567",
"year": 2022
},
{
"title": "An Autoencoder Based ASCII Art Generator",
"doi": "10.1145/3591569.3591587",
"year": 2023
}
]
Cool, some of those look interesting. The GPU/autoencoder/NN ones are probably out of scope for Chafa (too heavy/training data reliant), but worth a look to see what's possible. Relatedly, I came across https://github.com/theAdamColton/ascii-autoencoder and https://github.com/theAdamColton/ascii-unmasked when looking for open access sources for one of the papers (I didn't find one, unfortunately).
I implemented the image processing steps for a few of these papers a few years ago, when I was first getting into this topic, then sent the results to Chafa, and things worked out great. I'm really sleepy now, but the one I can't recall at the moment, which does a vectorization/SVG step after preprocessing, worked really nicely for very little code. I still have to try it out with a big, zoomed-out terminal and Chafa as-is, piping the output into a file, then putting it in a tiny font inside a block in an HTML file for responsive ASCII art with CSS.
I just pushed the structural-art branch. It's work in progress and includes a "facet" shape matcher and mesh solver. Only the facets are working correctly at the moment. It can be enabled like this:
The output isn't "better" in a conventional sense, but it'll look more character-artsy. The mesh solver can be enabled like this:
But the solver is trash at the moment. You can also enable both at once if you've got all day to wait for the output.
Here is an implementation of the fast text placement paper:
It's focused more on the edge thinning than on the ASCII part.
Sweet! How's the output?
One thing human ASCII artists do is shift input image regions to adapt to the text grid alignment and the available character set. Finally I found some smart people who found a way to emulate this.
http://www.cse.cuhk.edu.hk/~ttwong/papers/asciiart/asciiart.pdf
I'm mentally underpowered to implement the described algorithm properly, so I'm leaving this here as a reference for future improvements to Chafa.