Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better font-finding heuristics, with a shortcut and a few caches. #84

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

tecosaur
Copy link

@tecosaur tecosaur commented Jan 31, 2024

I initially started looking at this package when I found that it doesn't always look in the right directories for fonts (see #82). However, I recently found that the heuristic for font finding itself is both slow (#67) and a bit dodgy (#83).

So, I've gone through the font-finding code and made a few changes:

  • The scoring heuristic takes a few more factors into account
    • Earlier components of a search string are now weighted slightly higher (e.g. the plex in ibm plex sans italic is worth more than sans)
    • Particular regular styles will be picked over others (e.g. regular over medium)
    • Certain font formats are prioritised (e.g. otf over pfb)
  • Introduced a considered shortcut when scoring each font file
    • The list of matching font-file is pre-sorted according to matches of the search string in the font file name
    • We calculate the maximum possible family and style score given the search string
    • We return the current best font early when:
      • We have found a font that maximum score
      • We have seen more than twice as many fonts as the last font with the maximum score seen
  • Introduced a few (runtime) caches
    • A cache of the font file names, without the extension, as lowercase
    • A cache of the family, style, and extension of font files
    • A cache of the resolved font for given searches (invalidated by directory modification times)

From the test that I've performed locally, this batch of changes results in faster, better initial lookups, and faster (again) subsequent lookups.


Here are some test results on my machine:

Search string Current time PR time Current result (family, style) PR result
ibm plex sans bold italic 1.5s 0.06s AlegreyaSans-BoldItalic, Bold IBM Plex Sans, Bold Italic
ibm sans 1.5s 0.04s IBM Plex Sans, Regular (same)
sans 1.5s 0.02s KpSans, Regular DejaVu Sans, Book
hack 1.5s 0.02s Hack, Regular (same)
computer modern 1.5s 0.03s Computer Modern, Roman Computer Modern, Medium
schoolbook 1.5s 0.03s Century Schoolbook L, Roman Adobe New Century Schoolbook, Medium
medium euler 1.5s 0.08s Alegreya-Medium, Medium Euler, Medium
bold slanted roman 1.5s 0.04s Latin Modern Roman Slanted, 10 Bold (same)

Subsequent identical searches take ~0.0002s.

If more people could perform adversarial (but not pathological) tests with this scheme, that would be much appreciated.

@tecosaur
Copy link
Author

There was some concern raised about using the file-sorting based shortcut. I've just had a look at instead baking the fontfile sort-info into the precompilation step, and that seems to work a treat.

Some fonts may have multiple styles that count as "regular", in which
case the chosen variant will just be the first seen (whichever that is).

We might as well be deliberate about this, and go for "more
regular" (like regular) styles over "less regular" ones (like book and
medium).
It's entirely possible that multiple forms of a font may be present on
one system, for instance an OTF and also a PFB from texlive. We care
about this because different formats have different levels of
capability, and we don't want to choose a worse one simply because it's
the first format considered.

As such, a format component is added to the font scoring heuristic.
Copy link

codecov bot commented Jan 31, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (077003e) 95.26% compared to head (3ce7edf) 95.53%.

Files Patch % Lines
src/findfonts.jl 95.74% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #84      +/-   ##
==========================================
+ Coverage   95.26%   95.53%   +0.26%     
==========================================
  Files           6        6              
  Lines         317      336      +19     
==========================================
+ Hits          302      321      +19     
  Misses         15       15              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jkrumbiegel
Copy link
Collaborator

If we're caching paths at precompilation time, that will pose issues for relocatability I think.. Maybe this should not be done at that point but at init. It could be made async so that it doesn't delay package loading, but then some locking mechanism would need to be in place for usage.

@tecosaur
Copy link
Author

If we're caching paths at precompilation time, that will pose issues for relocatability I think...

Relocatability does make this interesting, but I think we need to be clearer on what the potential "issues" might be. In this case, the primary consequence of a precompiled value that doesn't match the actual system is stale cache entries which will be replaced/ignored at runtime. I.e. this doesn't affect correctness, but might affect some performance considerations.

Do you know of any precompile result relocation happening in practice? It could help to have a more concrete example to discuss this in the context of.

@jkrumbiegel
Copy link
Collaborator

Yes the issue would be that the baked in fonts in the dict probably do not transfer to another system. So you wouldn't gain much speed there.

Do you know of any precompile result relocation happening in practice? It could help to have a more concrete example to discuss this in the context of.

At work, we use sysimages compiled on CI runners and transferred to other runners. There we run into relocatability issues all the time.

@tecosaur
Copy link
Author

tecosaur commented Feb 1, 2024

Right. So it sounds like what we really want is a per-system font cache file?

Would Scatch.jl be the way to go then?

@tecosaur
Copy link
Author

tecosaur commented Feb 7, 2024

@jkrumbiegel any further thoughts on this?

@jkrumbiegel
Copy link
Collaborator

Not really further, I also think using a scratch space might be the reasonable way to go. Store a file with all the names and paths there and only repopulate the info when files change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants