Data Fields
The Data field is very simple: it is just a sequence of bytes.
The interesting part is that its size is not fixed; it is determined at run time.
The size can be defined:
- by the value of another field.
- based on some pattern or marker in the data.
- by a function called at run time.
Size based on another field
Let’s begin with an example:
>>> from bisturi.packet import Packet
>>> from bisturi.field import Data, Int
>>> import re
>>> class BasedOnOther(Packet):
...     length = Int(1)
...     a = Data(2)
...     b = Data(length)
...     c = Data(length * 2)
The field a has a fixed size of 2 bytes; there is not much to say about it.
The interesting fields are b and c, whose sizes depend on the value of length. As you may guess, b is length bytes long and c is twice as large.
more on expressions
>>> s = b'\x01abCdd'
>>> p = BasedOnOther.unpack(s)
>>> p.length
1
>>> p.a # size is fixed to 2 bytes
b'ab'
>>> p.b # size is the value of 'length' (1 byte in this case)
b'C'
>>> p.c # size is the value of 'length * 2' (2 bytes in this case)
b'dd'
>>> p.pack() == s
True
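To make the layout explicit, here is a minimal plain-Python sketch of the same parse done by hand, without bisturi. The function name unpack_based_on_other is hypothetical and only for illustration; it is not part of bisturi and says nothing about how the library works internally.

```python
def unpack_based_on_other(raw):
    """Hand-rolled equivalent of the BasedOnOther layout (illustration only)."""
    length = raw[0]          # 1-byte integer field
    offset = 1

    a = raw[offset:offset + 2]            # fixed size: 2 bytes
    offset += 2

    b = raw[offset:offset + length]       # size taken from 'length'
    offset += length

    c = raw[offset:offset + length * 2]   # size is 'length * 2'
    offset += length * 2

    return length, a, b, c


print(unpack_based_on_other(b'\x01abCdd'))
```

Each field starts where the previous one ended, so a wrong length value shifts every field that follows it.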
Size based on patterns
Data fields can also use the byte string being packed or unpacked to determine their size.
With patterns, Data consumes the string until it finds a particular token or a match of a regular expression.
>>> from bisturi.packet import Packet
>>> from bisturi.field import Data, Int
>>> import re
>>> class BasedOnPattern(Packet):
...     a = Data(until_marker=b'\0', include_delimiter=True)
...     b = Data(until_marker=b'fff')
...     c = Data(until_marker=re.compile(b'X+|$'), include_delimiter=True)
...     d = Data(until_marker=re.compile(b'X+|$'))
As you may guess, a and b will read bytes until a '\0' and a 'fff' are found, respectively.
For c and d the cut condition is based on a regular expression, which in this case says “until you find one or more X or you reach the end of the string”.
>>> s = b'ddd\x00eeeefffghiXXXjk'
>>> p = BasedOnPattern.unpack(s)
>>> p.a # 'a' is everything until a '\0' is found (delimiter included)
b'ddd\x00'
>>> p.b # 'b' is the same, until 'fff' is found (delimiter not included)
b'eeee'
>>> p.c # this uses a regex: "until a 'X' is found or it is the end"
b'ghiXXX'
>>> p.d # the same, in this case the limit was the end of the string
b'jk'
>>> p.pack() == s
True
The fields a and c have include_delimiter enabled, which makes them include the matched until_marker as part of their content.
For a this means including the '\0', and for c it means including the 'XXX'.
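The cut logic described above can be sketched in plain Python. This is only an illustration under assumptions: read_until is a hypothetical helper, not a bisturi function, and it assumes that the delimiter is always consumed from the input whether or not it is kept in the field (which is consistent with the round trip p.pack() == s shown above).

```python
import re


def read_until(raw, offset, marker, include_delimiter=False):
    """Return (field_bytes, next_offset) for a marker-delimited field.

    'marker' is either a bytes token or a compiled bytes regex.
    """
    if isinstance(marker, bytes):
        start = raw.find(marker, offset)
        if start < 0:
            raise ValueError("marker not found")
        end = start + len(marker)
    else:
        match = marker.search(raw, offset)
        if match is None:
            raise ValueError("marker not found")
        start, end = match.span()

    # Keep the delimiter in the field only when asked to.
    stop = end if include_delimiter else start
    return raw[offset:stop], end


raw = b'ddd\x00eeeefffghiXXXjk'
pattern = re.compile(b'X+|$')

a, off = read_until(raw, 0, b'\0', include_delimiter=True)
b, off = read_until(raw, off, b'fff')
c, off = read_until(raw, off, pattern, include_delimiter=True)
d, off = read_until(raw, off, pattern)
print(a, b, c, d)
```

For d there is no X left, so the regex falls back to the $ alternative and matches the empty string at the end of the input.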
[extra] Searching space
By default, the until_marker expression is searched for in the whole raw string, starting from the field’s offset.
But when the raw string is huge, searching the whole space may lead to performance problems.
Fortunately, in most cases we can expect to find the marker within the first few bytes, so we can set a maximum search buffer length to avoid scanning the full string in memory.
Let’s see an example of search_buffer_length:
>>> class DataWithSearchLengthLimit(Packet):
...     __bisturi__ = { 'search_buffer_length': 4 }
...
...     a = Data(until_marker=b'\0')
>>> s = b'ab\x00eeee'
>>> p = DataWithSearchLengthLimit.unpack(s)
>>> p.a
b'ab'
If the marker is not found within that limit, bisturi will not attempt to scan further; instead it raises an exception:
>>> class DataWithSearchLengthLimitTooShort(Packet):
...     __bisturi__ = { 'search_buffer_length': 2 } # too short!
...
...     a = Data(until_marker=b'\0')
>>> s = b'ab\x00eeee'
>>> p = DataWithSearchLengthLimitTooShort.unpack(s)
Traceback (most recent call last):
<...>PacketError: Error when unpacking the field 'a' of packet DataWithSearchLengthLimitTooShort at 00000000<...>
There is an exception to this rule: when the marker is the $ regex, the limit is not honored.
The $ regex means “give me everything until the end of the string”. This can be done very efficiently, so there is no need for a limit.
>>> from bisturi.field import EOS
>>> class DataWithSearchLengthLimitTooShortButIgnored(Packet):
...     __bisturi__ = { 'search_buffer_length': 2 }
...
...     a = Data(until_marker=EOS, include_delimiter=False)
>>> s = b'abeeee'
>>> p = DataWithSearchLengthLimitTooShortButIgnored.unpack(s)
>>> p.a
b'abeeee'
Yes, EOS is just an alias of re.compile(b"$") that signals that we want to read until the end of the string.
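The behaviour of the search limit can be pictured with a small plain-Python sketch. It is an assumption-laden illustration, not bisturi's code: read_until_limited and EOS_SKETCH are hypothetical names, the search window is assumed to be the first search_buffer_length bytes from the field's offset, and a plain ValueError stands in for the library's PacketError.

```python
import re

# Stand-in for an "end of string" marker (hypothetical name).
EOS_SKETCH = re.compile(b'$')


def read_until_limited(raw, offset, marker, search_buffer_length=None):
    """Read up to a bytes marker, looking only inside a bounded window."""
    # Special case: "$" means "everything up to the end", no scan needed.
    if hasattr(marker, 'pattern') and marker.pattern == b'$':
        return raw[offset:]

    if search_buffer_length is None:
        window_end = len(raw)
    else:
        window_end = min(len(raw), offset + search_buffer_length)

    # bytes.find only reports matches fully contained in [offset, window_end).
    pos = raw.find(marker, offset, window_end)
    if pos < 0:
        raise ValueError("marker not found within the search window")
    return raw[offset:pos]


print(read_until_limited(b'ab\x00eeee', 0, b'\0', search_buffer_length=4))
print(read_until_limited(b'abeeee', 0, EOS_SKETCH, search_buffer_length=2))
```

With a window of 2 bytes the first call would raise instead, because the '\0' sits at index 2, just outside the window.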
Size based on functions
So, what happens if you need to compute a non-trivial size that depends on something that can only be resolved while the packet is disassembling the byte string?
Just call a function!
>>> class BasedOnFunc(Packet):
...     size = Int(1)
...     payload = Data(lambda pkt, raw, offset, **k: pkt.size if pkt.size < 255 else len(raw)-offset)
or, if you prefer:
>>> class BasedOnFunc(Packet):
...     def calc_size(pkt, raw, offset, **k):
...         return pkt.size if pkt.size < 255 else len(raw)-offset
...
...     size = Int(1)
...     payload = Data(calc_size)
...
In this example, the size of the payload is determined by the value of size as long as it is less than 255; when size is 255 (the largest value the 1-byte field takes in these examples), the payload will consume all the bytes until the end of the packet.
>>> s1 = b'\x01a'
>>> s2 = b'\x02ab'
>>> s3 = b'\x01abc'
>>> BasedOnFunc.unpack(s1).payload # size was 1
b'a'
>>> BasedOnFunc.unpack(s2).payload # size was 2
b'ab'
>>> BasedOnFunc.unpack(s3).payload # size was 1
b'a'
>>> s4 = b'\xffa'
>>> s5 = b'\xffabc'
>>> BasedOnFunc.unpack(s4).payload # size was 255, grab everything
b'a'
>>> BasedOnFunc.unpack(s5).payload # size was 255, grab everything
b'abc'
>>> BasedOnFunc.unpack(s1).pack() == s1
True
>>> BasedOnFunc.unpack(s5).pack() == s5
True
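The way the callable is used can be mimicked in plain Python. The sketch below is hypothetical: unpack_based_on_func and the SimpleNamespace stand-in for the packet are illustrative only, and the only thing taken from the examples above is the (pkt, raw, offset, **k) signature of the size function.

```python
from types import SimpleNamespace


def calc_size(pkt, raw, offset, **k):
    # Same rule as in the examples: 255 means "take everything that is left".
    return pkt.size if pkt.size < 255 else len(raw) - offset


def unpack_based_on_func(raw):
    """Hand-rolled parse: a 1-byte size, then a payload sized by calc_size."""
    pkt = SimpleNamespace(size=raw[0])
    offset = 1

    # The size is computed only after 'size' has been parsed,
    # so the function can look at the partially built packet.
    n = calc_size(pkt=pkt, raw=raw, offset=offset)
    pkt.payload = raw[offset:offset + n]
    return pkt


print(unpack_based_on_func(b'\x01abc').payload)
print(unpack_based_on_func(b'\xffabc').payload)
```

Passing raw and offset lets the function reason about how many bytes remain, which is what makes the "grab everything" case possible.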
Defaults for Data fields
When Data has a fixed size, the default is a string of null bytes of the specified size.
Otherwise, the default is the empty string. bisturi tries to be pragmatic here: it is not clear what a good default would be for something as non-trivial as Data(until_marker=re.compile(b'X+|$')), so the empty string is as good or as bad as any other option, but simpler.
>>> q = BasedOnOther()
>>> q.length
0
>>> q.a
b'\x00\x00'
>>> q.b
b''
>>> q.c
b''
>>> q = BasedOnPattern()
>>> q.a
b''
>>> q.b
b''
>>> q.c
b''
>>> q.d
b''
TODO
link to expressions