Better font-finding heuristics, with a shortcut and a few caches. #84

tecosaur · 2024-01-31T11:19:46Z

I initially started looking at this package when I found that it doesn't always look in the right directories for fonts (see #82). However, I recently found that the heuristic for font finding itself is both slow (#67) and a bit dodgy (#83).

So, I've gone through the font-finding code and made a few changes:

The scoring heuristic takes a few more factors into account
- Earlier components of a search string are now weighted slightly higher (e.g. the plex in ibm plex sans italic is worth more than sans)
- Particular regular styles will be picked over others (e.g. regular over medium)
- Certain font formats are prioritised (e.g. otf over pfb)
~~Introduced a considered shortcut when scoring each font file~~
- ~~The list of matching font-file is pre-sorted according to matches of the search string in the font file name~~
- ~~We calculate the maximum possible family and style score given the search string~~
- ~~We return the current best font early when:~~
  - ~~We have found a font that maximum score~~
  - ~~We have seen more than twice as many fonts as the last font with the maximum score seen~~
Introduced a few (runtime) caches
- A cache of the font file names, without the extension, as lowercase
- A cache of the family, style, and extension of font files
- A cache of the resolved font for given searches (invalidated by directory modification times)

From the test that I've performed locally, this batch of changes results in faster, better initial lookups, and faster (again) subsequent lookups.

Here are some test results on my machine:

Search string	Current time	PR time	Current result (family, style)	PR result
ibm plex sans bold italic	1.5s	0.06s	AlegreyaSans-BoldItalic, Bold	IBM Plex Sans, Bold Italic
ibm sans	1.5s	0.04s	IBM Plex Sans, Regular	(same)
sans	1.5s	0.02s	KpSans, Regular	DejaVu Sans, Book
hack	1.5s	0.02s	Hack, Regular	(same)
computer modern	1.5s	0.03s	Computer Modern, Roman	Computer Modern, Medium
schoolbook	1.5s	0.03s	Century Schoolbook L, Roman	Adobe New Century Schoolbook, Medium
medium euler	1.5s	0.08s	Alegreya-Medium, Medium	Euler, Medium
bold slanted roman	1.5s	0.04s	Latin Modern Roman Slanted, 10 Bold	(same)

Subsequent identical searches take ~0.0002s.

If more people could perform adversarial (but not pathological) tests with this scheme, that would be much appreciated.

tecosaur · 2024-01-31T11:38:31Z

There was some concern raised about using the file-sorting based shortcut. I've just had a look at instead baking the fontfile sort-info into the precompilation step, and that seems to work a treat.

Some fonts may have multiple styles that count as "regular", in which case the chosen variant will just be the first seen (whichever that is). We might as well be deliberate about this, and go for "more regular" (like regular) styles over "less regular" ones (like book and medium).

It's entirely possible that multiple forms of a font may be present on one system, for instance an OTF and also a PFB from texlive. We care about this because different formats have different levels of capability, and we don't want to choose a worse one simply because it's the first format considered. As such, a format component is added to the font scoring heuristic.

codecov · 2024-01-31T11:46:04Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (077003e) 95.26% compared to head (3ce7edf) 95.53%.

Files	Patch %	Lines
src/findfonts.jl	95.74%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #84      +/-   ##
==========================================
+ Coverage   95.26%   95.53%   +0.26%     
==========================================
  Files           6        6              
  Lines         317      336      +19     
==========================================
+ Hits          302      321      +19     
  Misses         15       15

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/findfonts.jl

jkrumbiegel · 2024-01-31T20:15:12Z

If we're caching paths at precompilation time, that will pose issues for relocatability I think.. Maybe this should not be done at that point but at init. It could be made async so that it doesn't delay package loading, but then some locking mechanism would need to be in place for usage.

tecosaur · 2024-01-31T21:16:57Z

If we're caching paths at precompilation time, that will pose issues for relocatability I think...

Relocatability does make this interesting, but I think we need to be clearer on what the potential "issues" might be. In this case, the primary consequence of a precompiled value that doesn't match the actual system is stale cache entries which will be replaced/ignored at runtime. I.e. this doesn't affect correctness, but might affect some performance considerations.

Do you know of any precompile result relocation happening in practice? It could help to have a more concrete example to discuss this in the context of.

jkrumbiegel · 2024-02-01T07:35:15Z

Yes the issue would be that the baked in fonts in the dict probably do not transfer to another system. So you wouldn't gain much speed there.

Do you know of any precompile result relocation happening in practice? It could help to have a more concrete example to discuss this in the context of.

At work, we use sysimages compiled on CI runners and transferred to other runners. There we run into relocatability issues all the time.

tecosaur · 2024-02-01T07:49:28Z

Right. So it sounds like what we really want is a per-system font cache file?

Would Scatch.jl be the way to go then?

tecosaur · 2024-02-07T03:38:02Z

@jkrumbiegel any further thoughts on this?

jkrumbiegel · 2024-02-07T16:45:45Z

Not really further, I also think using a scratch space might be the reasonable way to go. Store a file with all the names and paths there and only repopulate the info when files change.

Search for fonts in heuristic order, with bailout

27770ad

tecosaur force-pushed the findfont-faster branch from 2d911ef to dcaab0e Compare January 31, 2024 11:43

tecosaur added 3 commits January 31, 2024 19:45

Add a two-level cache to font searching

fc406d0

Stop sorting font files, cache info in precompile

3ce7edf

tecosaur force-pushed the findfont-faster branch from dcaab0e to 3ce7edf Compare January 31, 2024 11:46

jkrumbiegel reviewed Jan 31, 2024

View reviewed changes

src/findfonts.jl Show resolved Hide resolved

jkrumbiegel reviewed Jan 31, 2024

View reviewed changes

src/findfonts.jl Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better font-finding heuristics, with a shortcut and a few caches. #84

Better font-finding heuristics, with a shortcut and a few caches. #84

tecosaur commented Jan 31, 2024 •

edited

Loading

tecosaur commented Jan 31, 2024

codecov bot commented Jan 31, 2024 •

edited

Loading

jkrumbiegel commented Jan 31, 2024

tecosaur commented Jan 31, 2024

jkrumbiegel commented Feb 1, 2024

tecosaur commented Feb 1, 2024

tecosaur commented Feb 7, 2024

jkrumbiegel commented Feb 7, 2024

Better font-finding heuristics, with a shortcut and a few caches. #84

Are you sure you want to change the base?

Better font-finding heuristics, with a shortcut and a few caches. #84

Conversation

tecosaur commented Jan 31, 2024 • edited Loading

tecosaur commented Jan 31, 2024

codecov bot commented Jan 31, 2024 • edited Loading

Codecov Report

jkrumbiegel commented Jan 31, 2024

tecosaur commented Jan 31, 2024

jkrumbiegel commented Feb 1, 2024

tecosaur commented Feb 1, 2024

tecosaur commented Feb 7, 2024

jkrumbiegel commented Feb 7, 2024

tecosaur commented Jan 31, 2024 •

edited

Loading

codecov bot commented Jan 31, 2024 •

edited

Loading