subroutine mod_lexer::lexer_read_token (lex, tok)
    Read a token, skipping spaces and new lines; also detects keywords.

subroutine mod_lexer::lexer_unget_token (lex, tok)
    Push a token back into the buffer.

subroutine mod_lexer::lexer_peek_token (lex, tok)
    Read a token and push it back into the buffer.

logical function mod_lexer::lexer_next_token (lex, kind, peek)
    Check whether the next token is of the given kind.

subroutine mod_lexer::lexer_set_position (lex, tok)
    Set the source file read position to the position of the given token.

character(len=:) function, allocatable mod_lexer::token_id_to_string (kind)
    Return a string with the formal name of a token kind. Useful for debugging and error management.

subroutine mod_lexer::lexer_initialize (lex, filename, keywords)
    Initialize a lexer type.

subroutine mod_lexer::token_finalize (tok)
    Deallocate a token.

subroutine mod_lexer::lexer_finalize (lex)
    Deallocate a lexer.

subroutine mod_lexer::lexer_read_new_source_file (lex, filename)
    Open a new source file and switch reading to this new file.

subroutine mod_lexer::lexer_read_new_source_file_from_string (lex, string, label)
    Switch reading to the provided string.

character function mod_lexer::lexer_peek (lex)
    Read a character without advancing the position in the source file.

logical function mod_lexer::lexer_next (lex, expect)
    Check whether the next character is the expected character.

subroutine mod_lexer::lexer_skip_line (lex)
    Skip all characters until a newline character is reached or the file ends.

logical function mod_lexer::lexer_skip_one_space_char (lex)
    Skip one space character. Return .true. if a space character has been skipped. Comments are also skipped and treated as space characters.

subroutine mod_lexer::lexer_skip_spaces (lex, is_skipped)
    Skip all spaces until a non-space character is reached. Also skip comments.
|
Brief description
The lexer transforms a stream of characters into a stream of tokens by gathering the characters according to patterns. For instance, a set of adjacent digits forms an integer.
Let us recall the lexer/parser organization with the three main units:
keyword list
v
┌──────────┐ ┏━━━━━━━━━━━┓ ┌────────────┐
files │ │ characters ┃ ┃ tokens │ │ ───────>
─────> │ File │ ──────────>┃ Lexer ┃ ───────> │ Parser │ tokens / actions
│ │ ┃ ┃ │ │ <───────
└──────────┘ ┗━━━━━━━━━━━┛ └────────────┘
mod_source_file mod_lexer mod_parser
mod_identifier
mod_scope
Behavior
- The lexer merges multiple consecutive space characters into one space character.
- If the ignore_space attribute is set to .true. (default), the lexer does not generate space tokens.
- If the ignore_newline attribute is set to .true. (default), the lexer does not generate newline tokens.
- The lexer skips comment lines. The comment character can be set in the comment attribute; '#' is the default comment character.
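A minimal sketch of adjusting this behavior after initialization, assuming ignore_space, ignore_newline, and comment are directly accessible components of t_lexer (the names are taken from the attribute descriptions above; the actual component visibility may differ):

```fortran
type(t_lexer) :: lex

call lexer_initialize(lex, "example.txt", keywords)

! Keep newline tokens in the stream, e.g. for a line-oriented grammar
lex%ignore_newline = .false.

! Use '!' instead of the default '#' as the comment character
lex%comment = '!'
```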
Usage
As an example, let us consider the following source file (example.txt):
# Print a message to screen
print "I love Notus <3";
# Compute the square of 42 and print the result to screen
square 42;
Specifications:
- A comment line starts with '#'
- An instruction line must end with ';' (tk_semicolon)
- The language has two keywords with the following syntax:
  - 'print':
    print tk_string tk_semicolon
  - 'square':
    square tk_integer tk_semicolon
To create a lexer for this language, the lexer requires a list of keywords to convert identifier tokens (tk_identifier) to keyword tokens (tk_keyword).
The following code initializes the lexer with a list of keywords and a source file:
type(t_lexer) :: lex
type(t_keyword_name), dimension(2) :: keywords
type(t_token) :: tok

! kw_print and kw_square are assumed to be integer parameters indexing
! the keyword list (e.g. kw_print = 1, kw_square = 2)
keywords(kw_print )%name = "print"
keywords(kw_square)%name = "square"

call lexer_initialize(lex, "example.txt", keywords)
The following code reads the file and displays the tokens. The payload components tok%string (token text) and tok%value (integer value) are assumed here for illustration; check the t_token definition for the actual component names:
! Read the first token before entering the loop
call lexer_read_token(lex, tok)
do while(tok%kind /= tk_eof)
   select case(tok%kind)
   case(tk_error)
      write(*,'("Error found at position ",g0)') tok%pos
      stop
   case(tk_keyword)
      write(*,'("kind: ",a," name: ",a)') trim(token_id_to_string(tok%kind)), trim(tok%string)
   case(tk_string)
      write(*,'("kind: ",a," value: ",g0)') trim(token_id_to_string(tok%kind)), trim(tok%string)
   case(tk_integer)
      write(*,'("kind: ",a," value: ",g0)') trim(token_id_to_string(tok%kind)), tok%value
   case default
      write(*,'("kind: ",a)') trim(token_id_to_string(tok%kind))
   end select
   call lexer_read_token(lex, tok)
end do
write(*,'("EOF")')
Expected output:
kind: keyword name: print
kind: string value: I love Notus <3
kind: semicolon
kind: keyword name: square
kind: integer value: 42
kind: semicolon
EOF
◆ lexer_finalize()
subroutine mod_lexer::lexer_finalize (type(t_lexer), intent(inout) lex)
◆ lexer_initialize()
subroutine mod_lexer::lexer_initialize (type(t_lexer), intent(out) lex, character(len=*), intent(in) filename, type(t_keyword_name), dimension(:), intent(in) keywords)
- Parameters
  [out]    lex       Lexer to initialize
  [in]     filename  Source file name
  [in]     keywords  List of keywords
◆ lexer_next()
logical function mod_lexer::lexer_next (type(t_lexer), intent(inout) lex, character, intent(in) expect)
If the next character does not match the expected character, the position in the source file is not advanced.
- Parameters
  [in,out] lex     Lexer
  [in]     expect  Expected character
- Returns
  .true. if the next character matches the expected character
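A short hypothetical example combining lexer_peek and lexer_next, sketching how an optional minus sign could be consumed:

```fortran
character :: c
logical :: negative

! Look at the next character without consuming it
c = lexer_peek(lex)

! Consume '-' only if it is the next character; otherwise the
! read position is left untouched
negative = lexer_next(lex, '-')
```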
◆ lexer_next_token()
logical function mod_lexer::lexer_next_token (type(t_lexer), intent(inout) lex, integer, intent(in) kind, logical, optional peek)
Behavior:
- If no match is found, the token is pushed back into the buffer.
- If peek is present and set to .true., the token is always pushed back into the buffer.
- Parameters
  [in,out] lex   Lexer
  [in]     kind  Token kind
           peek  Peek instead of get if set to .true.
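A sketch of how this function could drive a parser decision, reusing the token kinds tk_semicolon and tk_keyword from the Usage example above:

```fortran
! Consume the semicolon if it is the next token; on a mismatch the
! token read internally is pushed back into the buffer
if (.not. lexer_next_token(lex, tk_semicolon)) then
   write(*,'("Syntax error: expected '';''")')
end if

! With peek=.true. the token is never consumed, even on a match
if (lexer_next_token(lex, tk_keyword, peek=.true.)) then
   ! The next token is a keyword and is still unread
end if
```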
◆ lexer_peek()
character function mod_lexer::lexer_peek (type(t_lexer), intent(inout) lex)
◆ lexer_peek_token()
subroutine mod_lexer::lexer_peek_token (type(t_lexer), intent(inout) lex, type(t_token), intent(out) tok)
- Parameters
  [in,out] lex  Lexer
  [out]    tok  Token
◆ lexer_read_new_source_file()
subroutine mod_lexer::lexer_read_new_source_file (type(t_lexer), intent(inout) lex, character(len=*), intent(in) filename)
- Parameters
  [in,out] lex       Lexer
  [in]     filename  Source file name
◆ lexer_read_new_source_file_from_string()
subroutine mod_lexer::lexer_read_new_source_file_from_string (type(t_lexer), intent(inout) lex, character(len=*), intent(in) string, character(len=*), intent(in) label)
- Parameters
  [in,out] lex     Lexer
  [in]     string  Source string
  [in]     label   Label for the string
◆ lexer_read_token()
subroutine mod_lexer::lexer_read_token (type(t_lexer), intent(inout) lex, type(t_token), intent(out) tok)
- Parameters
  [in,out] lex  Lexer
  [out]    tok  Token
◆ lexer_set_position()
subroutine mod_lexer::lexer_set_position (type(t_lexer), intent(inout) lex, type(t_token), intent(in) tok)
- Parameters
  [in,out] lex  Lexer
  [in]     tok  Token containing the source position
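One plausible use is backtracking in a parser: remember a token, try to parse ahead, and rewind on failure. A sketch under that assumption:

```fortran
type(t_token) :: saved

! Remember the token (and thus the source position) where the
! alternative starts
call lexer_read_token(lex, saved)

! ... attempt to parse one alternative ...

! On failure, rewind the source file to the saved token's position
! and try another alternative
call lexer_set_position(lex, saved)
```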
◆ lexer_skip_line()
subroutine mod_lexer::lexer_skip_line (type(t_lexer), intent(inout) lex)
◆ lexer_skip_one_space_char()
logical function mod_lexer::lexer_skip_one_space_char (type(t_lexer), intent(inout) lex)
- Parameters
  [in,out] lex  Lexer
- Returns
  .true. if a space character has been skipped
◆ lexer_skip_spaces()
subroutine mod_lexer::lexer_skip_spaces (type(t_lexer), intent(inout) lex, logical, intent(out) is_skipped)
- Parameters
  [in,out] lex         Lexer
  [out]    is_skipped  .true. if any space character has been skipped
◆ lexer_unget_token()
subroutine mod_lexer::lexer_unget_token (type(t_lexer), intent(inout) lex, type(t_token), intent(in) tok)
- Parameters
  [in,out] lex  Lexer
  [in]     tok  Token
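A sketch of one-token lookahead built from lexer_read_token and lexer_unget_token (equivalent to a single lexer_peek_token call):

```fortran
type(t_token) :: tok

call lexer_read_token(lex, tok)
if (tok%kind /= tk_keyword) then
   ! Not the token we wanted: push it back so the next read
   ! returns it again
   call lexer_unget_token(lex, tok)
end if
```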
◆ token_finalize()
subroutine mod_lexer::token_finalize (type(t_token), intent(inout) tok)
◆ token_id_to_string()
character(len=:) function, allocatable mod_lexer::token_id_to_string (integer, intent(in) kind)
- Parameters
  [in] kind  Token kind
- Returns
  Formal name of the token kind