Tokenize

Tokenize is a Julia package that serves a similar purpose, and has a similar API, to the tokenize module in Python. It takes a string or buffer containing Julia code, performs lexical analysis, and returns a stream of tokens.

The goals of this package are for it to be:

Fast: it currently lexes all of Julia's source files in ~0.25 seconds (580 files, 2 million tokens).
Round-trippable: the original string should be exactly recoverable from a stream of tokens.
Non-error-throwing: instead of throwing an error, the lexer returns a special error token.
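The round-trip goal can be checked directly: concatenating the text of every token should reproduce the input. This is a minimal sketch that assumes Tokenize is installed; `roundtrips` is a hypothetical helper name, not part of the package, and it only uses `tokenize` and the `untokenize` accessor described under Tokens.

```julia
using Tokenize: tokenize, Tokens

# Hypothetical helper (not part of Tokenize): returns true if joining the
# text of every token produced by `tokenize` gives back `src` exactly.
function roundtrips(src::AbstractString)
    tokens = collect(tokenize(src))
    rebuilt = join(Tokens.untokenize(t) for t in tokens)
    return rebuilt == src
end

for src in ("function f(x) end", "f(x) = x + 1  # add one\n")
    println(repr(src), " => ", roundtrips(src))
end
```

Per the non-error-throwing goal, the same check should also run without raising on incomplete input, such as an unterminated string.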

API

Tokenization

The function tokenize is the main entry point for generating Tokens. It takes a string or a buffer and returns an iterator that yields one Token at a time until the end of the string or buffer is reached. The argument to tokenize can be a String, an IOBuffer, or an IOStream.

julia> using Tokenize

julia> collect(tokenize("function f(x) end"))
 1,1-1,8          KEYWORD        "function"
 1,9-1,9          WHITESPACE     " "
 1,10-1,10        IDENTIFIER     "f"
 1,11-1,11        LPAREN         "("
 1,12-1,12        IDENTIFIER     "x"
 1,13-1,13        RPAREN         ")"
 1,14-1,14        WHITESPACE     " "
 1,15-1,17        KEYWORD        "end"
 1,18-1,17        ENDMARKER      ""
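Because tokenize returns an ordinary iterator, tokens can also be consumed lazily in a for loop instead of being collected. The sketch below assumes Tokenize is installed; `significant_tokens` is a hypothetical helper that drops whitespace tokens, and it reads from an IOBuffer to show that buffers are accepted as well as strings.

```julia
using Tokenize: tokenize, Tokens

# Hypothetical helper: lex `src` lazily and keep a (kind, text) pair for
# every token that is not whitespace. The final ENDMARKER token is kept.
function significant_tokens(src::AbstractString)
    result = Tuple{Tokens.Kind, String}[]
    for t in tokenize(IOBuffer(src))   # an IOBuffer works as well as a String
        Tokens.kind(t) == Tokens.WHITESPACE && continue
        push!(result, (Tokens.kind(t), Tokens.untokenize(t)))
    end
    return result
end

for (k, text) in significant_tokens("function f(x) end")
    println(rpad(string(k), 12), repr(text))
end
```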

`Token`s

Each Token is represented by where it starts and ends, what string it contains, and what kind it is.

The API for a Token (not exported from the Tokenize.Tokens module) is:

startpos(t)::Tuple{Int, Int} # row and column where the token starts
endpos(t)::Tuple{Int, Int}   # row and column where the token ends
startbyte(t)::Int            # byte offset where the token starts
endbyte(t)::Int              # byte offset where the token ends
untokenize(t)::String        # string representation of the token
kind(t)::Tokens.Kind         # kind of the token
exactkind(t)::Tokens.Kind    # exact kind of the token
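The accessors above can be combined into a simple table of token spans. This is a sketch that assumes Tokenize is installed; `token_spans` is a hypothetical helper, and the row/column values checked for it are the ones shown in the tokenize example earlier (for instance "function" spanning 1,1-1,8).

```julia
using Tokenize: tokenize, Tokens

# Hypothetical helper: one NamedTuple per token, built only from the
# accessor functions listed above (all qualified with `Tokens.`).
function token_spans(src::AbstractString)
    return [(text  = Tokens.untokenize(t),
             start = Tokens.startpos(t),                      # (row, column)
             stop  = Tokens.endpos(t),                        # (row, column)
             bytes = (Tokens.startbyte(t), Tokens.endbyte(t)), # byte offsets
             kind  = Tokens.exactkind(t))
            for t in tokenize(src)]
end

for span in token_spans("function f(x) end")
    println(span)
end
```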

The difference between kind and exactkind is that kind returns OP for all operators and KEYWORD for all keywords, while exactkind returns a distinct kind for each operator and keyword. For example:

julia> tok = collect(tokenize("⇒"))[1];

julia> Tokens.kind(tok)
OP::Tokenize.Tokens.Kind = 90

julia> Tokens.exactkind(tok)
RIGHTWARDS_DOUBLE_ARROW::Tokenize.Tokens.Kind = 128
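Since kind collapses every operator to OP, it works as a coarse filter, while exactkind tells the individual operators apart. A sketch assuming Tokenize is installed; `operator_kinds` is a hypothetical helper.

```julia
using Tokenize: tokenize, Tokens

# Hypothetical helper: the exact kind of every operator token in `src`.
# `kind` is the coarse filter (OP for any operator) and `exactkind`
# identifies which operator each token is.
function operator_kinds(src::AbstractString)
    return [Tokens.exactkind(t) for t in tokenize(src) if Tokens.kind(t) == Tokens.OP]
end

println(operator_kinds("a ⇒ b"))
```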

All the different Tokens.Kind values are listed in the token_kinds.jl file.

Repository: JuliaLang/Tokenize.jl
