Poly/ML release 5.3 introduces support for an Integrated Development Environment (IDE) to extract extra information about a program. This document describes the protocol used to exchange information with the front-end. The protocol is written on top of functions that extract the information from the compiler's parse tree. Some applications may find it more convenient to interact directly with these functions and implement their own protocol. This document is primarily aimed at writers of IDEs or plug-ins who are interacting with the normal ML top-level.
The basic format uses a binary XML-like representation in which the escape character (0x1b) is used as a special marker. It may be followed by other characters that determine how the remainder of the input is to be treated. Strings are sent as a sequence of bytes terminated by the escape character. If the escape character itself appears in the string it is sent as two escape characters, except within compilation input (see below). Where a value represents a number it is sent in base ten, possibly preceded by ~ or - as a sign.
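The encoding rules above can be illustrated with a minimal sketch. The function names are hypothetical and are not part of Poly/ML. The choice of "-" rather than "~" when encoding a negative number is an assumption made for the example; the decoder accepts either sign.

```python
ESC = b"\x1b"  # the escape character (0x1b) used as the protocol marker

def escape_string(data: bytes) -> bytes:
    """Double every escape byte so it cannot be mistaken for a marker."""
    return data.replace(ESC, ESC + ESC)

def unescape_string(data: bytes) -> bytes:
    """Inverse of escape_string: collapse each doubled escape byte."""
    return data.replace(ESC + ESC, ESC)

def encode_number(n: int) -> bytes:
    """Base-ten digits; this sketch assumes '-' for negative values."""
    return str(n).encode("ascii")

def decode_number(field: bytes) -> int:
    """Parse base-ten digits, accepting either '~' or '-' as the sign."""
    text = field.decode("ascii")
    if text.startswith("~"):
        text = "-" + text[1:]
    return int(text)
```

These helpers apply only to ordinary string fields. Compilation input is sent by length and is not escaped.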
There are two different ways in which escape combinations may occur. Within the communications protocol, data is exchanged between the IDE and the Poly/ML front-end using packets of data. These begin and end with an escape sequence and use escape sequences, usually escape followed by comma, to separate the elements. The opening escape sequence is always escape followed by an upper case character and the closing sequence is always escape followed by the lower case version of the opening character. In many cases the format of the packet is fixed, but there is an exception in the case of marked-up text. Marked-up text can arise in error messages or some other output from the compiler, where extra information can be inserted at arbitrary points within the text of the message. Such mark-up uses the same format as the protocol packets but the opening section is delimited by escape followed by semicolon. Having a standard format provides for upwards compatibility since an IDE can easily skip mark-up that it does not recognise.
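As an illustration of how an IDE might skip mark-up it does not recognise, here is a minimal sketch of a tokenizer and a plain-text extractor. It assumes that the opening section of a mark-up element runs from the upper case marker to the escape-semicolon marker, and that the marked text follows up to the lower case closing marker. The function names and the tag "Z" used in the usage example are hypothetical.

```python
ESC = 0x1B  # the escape character

def tokenize(data: bytes):
    """Split bytes into ('text', bytes) and ('mark', char) tokens."""
    tokens = []
    text = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != ESC:
            text.append(byte)
            i += 1
        elif i + 1 < len(data) and data[i + 1] == ESC:
            text.append(ESC)  # doubled escape: one literal escape byte
            i += 2
        elif i + 1 < len(data):
            if text:
                tokens.append(("text", bytes(text)))
                text.clear()
            tokens.append(("mark", chr(data[i + 1])))
            i += 2
        else:
            raise ValueError("input ends in the middle of an escape sequence")
    if text:
        tokens.append(("text", bytes(text)))
    return tokens

def plain_text(tokens) -> bytes:
    """Drop the opening section of every mark-up element, keep the rest."""
    out = bytearray()
    in_header = []  # one flag per open element: True until ESC ';' is seen
    for kind, value in tokens:
        if kind == "text":
            if not any(in_header):
                out.extend(value)
        elif value.isupper():
            in_header.append(True)
        elif value == ";" and in_header:
            in_header[-1] = False
        elif value.islower() and in_header:
            in_header.pop()
    return bytes(out)
```

For example, `plain_text(tokenize(b"see \x1bZmeta\x1b,42\x1b;here\x1bz now"))` yields `b"see here now"`. Because the extractor relies only on the upper case, semicolon and lower case markers, it does not need to know what the tag means. Note that this tokenizer does not emit a token for an empty field between two markers.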
Poly/ML can be run in a mode where it produces enhanced output but otherwise runs a normal top-level. This can be used by the IDE to give the user access to a normal interactive ML session. The --with-markup option to Poly/ML runs the normal Poly/ML top-level loop but causes it to add mark-up to some of its messages. Currently it is used in two cases: in error messages and in messages showing where an identifier was declared.
The format of the information showing a location is:
An error message packet consists of
"kind" is either 'E' indicating a hard error or 'W' indicating a non-fatal warning. This is followed by the text of the message and then the closing packet:
Future mark-up will follow the same pattern, allowing the IDE to skip unrecognised mark-up. This mark-up is also used in some of the packets within the full IDE protocol.
When run with the --ideprotocol option the top-level loop runs the full IDE communication protocol. This can also be started by PolyML.IDEInterface.runIDEProtocol() from within PolyML.
This is intended primarily for compiling files while they are being edited, either as the result of an explicit request from the user or automatically. When this option is given the front-end retains the parse tree and requests can be made to extract information from the parse tree.
When the IDE mode is started, PolyML sends the following message to standard output:
protocol-version-number is the version number identifying the particular version of the PolyML protocol. The IDE should not send requests to PolyML until this message has been received. The current version of the protocol is 1.0.0.
Requests to PolyML are in terms of byte offsets within the last source text. If the text has been edited since it was last sent to ML the IDE must convert positions within the current source text into positions within the original before sending requests to ML and do the reverse conversion before displaying the results.
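One way an IDE might perform this conversion is sketched below. It assumes the IDE records every change since the last compilation as a tuple (start, removed, inserted), where start is a byte offset in the original text and the other two values are byte counts, and that the list is sorted and non-overlapping. This representation and the function names are hypothetical, not something the protocol prescribes.

```python
def to_original(pos, edits):
    """Map a byte offset in the current text back to the original text."""
    shift = 0  # current offset minus original offset, accumulated so far
    for start, removed, inserted in edits:
        cur_start = start + shift
        if pos < cur_start:
            break
        if pos < cur_start + inserted:
            return start  # inside newly inserted text: snap to the edit start
        shift += inserted - removed
    return pos - shift

def to_current(pos, edits):
    """Map a byte offset in the original text to the current text."""
    shift = 0
    for start, removed, inserted in edits:
        if pos < start:
            break
        if pos < start + removed:
            return start + shift  # inside removed text: snap to the edit start
        shift += inserted - removed
    return pos + shift
```

With `edits = [(4, 1, 3)]` (one byte at offset 4 replaced by three bytes), `to_original(7, edits)` gives 5 and `to_current(5, edits)` gives 7. Positions that fall inside changed text have no exact counterpart, so this sketch snaps them to the start of the edit.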
Simple requests about the current parse tree all have the same format. They contain a request code that describes the kind of information to return and a pair of positions. Frequently the start and end positions will be the same. PolyML searches for the smallest node in the parse tree that spans the positions and returns information about that node. It always returns the actual span for the node in the result so that the IDE can highlight the actual text in the display.
Every request contains a request identifier which is returned in the result. This allows the IDE to run asynchronously. A request identifier is an arbitrary string generated by the IDE. The request identifier used in a compilation request has a special status. This identifier is used to mark the version of the parse tree that results from the compilation and must be included in commands that query the parse tree. In that way Poly/ML is able to tell whether a request refers to the current tree or to an older or newer version.
The format of a request packet is:
O | Return a list of properties for the node |
T | Return type information |
I | Return declaration location |
M | Move relative to a given position |
V | Return a list of references to an identifier |
Responses follow a similar structure to the request. The start and end code for a response is the same as the start and end for the request. All responses contain the actual start and end points of the current tree. If there is no parse tree the start and end offsets will be zero. An unrecognised command will return an empty response for forwards compatibility. Where a command is invalid or unrecognised the response will be
In particular, because the IDE may issue requests while a compilation is running, the parse tree id in the request packet may not match the current parse tree within Poly/ML. In that case the parse tree id in the result packet will be that of the current parse tree and not the parse tree id in the request. The IDE must keep a list of the requests it has sent along with the parse tree id it used for each. If it receives a response with a different parse tree id it should reissue the request, adjusting the offsets to account for any changes.
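The bookkeeping described above could be sketched as follows. The class and method names are hypothetical, and the payload is whatever the IDE needs in order to rebuild the request (for example the request code and offsets).

```python
class PendingRequests:
    """Track outstanding requests so stale responses can be reissued."""

    def __init__(self):
        # request id -> (parse tree id used in the request, request payload)
        self._pending = {}

    def sent(self, request_id, parse_tree_id, payload):
        """Record a request at the moment it is sent to Poly/ML."""
        self._pending[request_id] = (parse_tree_id, payload)

    def received(self, request_id, response_tree_id):
        """Return None if the response can be used as it stands.

        If the parse tree id in the response differs from the one used in
        the request, return the stored payload so the caller can adjust
        the offsets and send the request again.
        """
        sent_tree_id, payload = self._pending.pop(request_id)
        if response_tree_id == sent_tree_id:
            return None
        return payload
```

A caller would invoke `sent` for every outgoing request and `received` for every response, reissuing the request whenever a payload is returned.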
O Request:
T Request:
I Request:
I | Return declaration location |
J | Return the location where an identifier was opened |
S | Return the location of an identifier's parent structure |
M Request:
"U" | Move to the parent node |
"C" | Move to the first child node |
"N" | Move to the next sibling |
"P" | Move to the previous sibling |
V Request:
In order to compile a piece of text the IDE sends it to ML through the protocol. Because any previous compilation may have executed code and affected the global state, it is assumed that the IDE will set up some form of context for the file by previously saving some state. Typically, this would require it to have compiled all the files that this particular piece of source text depends on and to have saved the result in a saved state. A compilation request therefore has the following structure:
'prelude-length' and 'source-length' are the number of bytes in the prelude and source text respectively. Since the prelude and source text may be large it is much more efficient to use these lengths to read the input. Because these lengths are provided it is not necessary to search for the escape code within the text and so if the escape character appears within the text it is not itself escaped.
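The following sketch illustrates the two consequences of sending lengths: the lengths are byte counts rather than character counts, and the reader consumes exactly that many bytes without looking for escape characters. The helper names are hypothetical, and the use of UTF-8 is an assumption made for the example.

```python
import io

def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes the text occupies once encoded (UTF-8 assumed)."""
    return len(text.encode(encoding))

def read_exact(stream, count: int) -> bytes:
    """Read exactly count bytes, without interpreting escape characters."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended before the announced length")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

# An escape byte inside the text counts as one byte and is not doubled.
source = "a\x1bb"
stream = io.BytesIO(source.encode("utf-8") + b"rest of the packet")
body = read_exact(stream, byte_length(source))
```

After this runs, `body` holds the three bytes of the source text, including the unescaped escape byte, and the stream is positioned at the start of whatever follows.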
Poly/ML responds with a result block. The format of the result block depends on the result of the compilation and possible execution of the code. The result block has the form:
"result" is a single character indicating success or failure. "finaloffset" is the byte position that indicates the extent of the valid parse tree. If there was an error this may be less than the end of the input. It may be the start of the input if there was a syntax error and no parse tree could be created. As usual "request-id" is the ID of the request. The "parse-tree-id" will normally be the same as the "request-id", indicating that the compilation has updated the parse tree, even if type checking failed. However, if there was a failure, such as during parsing, that meant that no new parse tree could be produced, the ID returned will be the original parse tree ID, or the empty string if there was none. "errors_and_messages" have the same format as described above for mark-up.
The result codes are
S - Success. The file compiled successfully and ran without an exception.
X - Exception. The file compiled successfully but raised an exception when it ran.
L - The prelude code failed to compile or raised an exception.
F - Parse or type checking failure.
C - Cancelled during compilation.
The parse tree will be updated to reflect the result of the compilation and
the current parse tree identifier used by Poly/ML will be set to the identifier
supplied in the request.
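An IDE might represent the result codes listed above with an enumeration, as in the sketch below. The type and member names are hypothetical; only the single-character codes come from the protocol.

```python
from enum import Enum

class CompileResult(Enum):
    """Result codes carried in the compilation result block."""
    SUCCESS = "S"         # compiled and ran without an exception
    EXCEPTION = "X"       # compiled but raised an exception when run
    PRELUDE_FAILED = "L"  # prelude failed to compile or raised an exception
    FAILED = "F"          # parse or type checking failure
    CANCELLED = "C"       # cancelled during compilation

def parse_result(code: str) -> CompileResult:
    """Convert the single-character code; raises ValueError if unknown."""
    return CompileResult(code)
```

For instance, `parse_result("X")` returns `CompileResult.EXCEPTION`, and an unknown code raises `ValueError`, which the IDE can catch and report.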
For a result code of S (compiled successfully) there may be warnings.
For a result code of L (prelude code failed) the result packet contains the exception packet that
was returned and has the form:
For a result code of F (parsing or type checking failed) the result packet contains a list of one or more error packets. The format of the result packet is:
Where the error packets have the same format as described above for mark-up.
For a result code of C (compilation cancelled) the result may or may not contain error packets, depending on whether the compilation had produced error messages before it was cancelled.
For a result code of X the errors_and_messages part of the result packet contains the exception message first, within an X tag, as a string. The string may also contain output mark-up such as the D-style mark-up showing the location where the exception was raised. Thus the format of the packet with exception data is:
Compilation is run as a separate thread and may be cancelled using the K-request.
The action on receiving a cancel request depends on the current state of the compilation. If the compilation has already finished no action is taken, since Poly/ML will have already sent a result packet for the compilation. If the compilation is in progress Poly/ML will attempt to cancel it by sending an interrupt to the compilation thread to ask it to terminate. If the thread is in the compiler at the time the interrupt is received the result will be a C result code. If it is executing the compiled code, that code will receive the Interrupt exception and, assuming the code does not handle the exception, the result will be an X result code. The thread may have completed before the interrupt is processed, so any other result is also possible.