Sequence Data (pyncbitk.objects.seqdata)#
Actual sequence data of biological sequences.
The NCBI C++ Toolkit provides a unified API for encoding the actual data of protein and nucleotide sequences. It supports textual representation, ordinal encoding (replacing A, C, G, T with 1, 2, 3, 4), as well as compressed bitmap representations.
See also
The Data Model chapter of the NCBI C++ Toolkit documentation.
Base Classes#
- class pyncbitk.objects.seqdata.SeqData(Serial)#
An abstract base storage of sequence data.
- classmethod __new__(*args, **kwargs)#
- __init__(*args, **kwargs)#
- __reduce__()#
Helper for pickle.
- class pyncbitk.objects.seqdata.SeqAaData(SeqData)#
An abstract base storage of amino-acid sequence data.
- classmethod __new__(*args, **kwargs)#
- classmethod encode(data)#
Encode the textual sequence to a compressed representation.
- __reduce__()#
Helper for pickle.
- class pyncbitk.objects.seqdata.SeqNaData(SeqData)#
An abstract base storage of nucleotide sequence data.
- classmethod __new__(*args, **kwargs)#
- classmethod encode(data)#
Encode the textual sequence to a compressed representation.
- Parameters:
data (str, bytes, or buffer-like object) – The ASCII nucleotide sequence to be encoded. Python strings and any other objects supporting the buffer protocol are supported.
- Raises:
ValueError – When the sequence data contains invalid characters that do not belong to the nucleotide alphabet.
- __reduce__()#
Helper for pickle.
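The input handling described for encode (accept str, bytes, or any buffer-like object, and raise ValueError on characters outside the alphabet) can be sketched in pure Python. This is only an illustration of the contract, not the library's implementation: the helper name normalize_nucleotides and the default alphabet b"ACGTN" are assumptions, and the real alphabet accepted by each class may be wider.

```python
def normalize_nucleotides(data, alphabet=b"ACGTN"):
    """Hypothetical helper: return `data` as ASCII bytes after validating it.

    The default alphabet is an assumption made for this sketch only.
    """
    if isinstance(data, str):
        # Non-ASCII text raises UnicodeEncodeError, a ValueError subclass.
        raw = data.encode("ascii")
    else:
        # memoryview() accepts any object implementing the buffer protocol.
        raw = bytes(memoryview(data))
    invalid = set(raw) - set(alphabet)
    if invalid:
        bad = ", ".join(repr(chr(c)) for c in sorted(invalid))
        raise ValueError(f"invalid nucleotide character(s): {bad}")
    return raw


# The same sequence given as str, bytes, and bytearray yields the same bytes.
for source in ("ATTAGC", b"ATTAGC", bytearray(b"ATTAGC")):
    assert normalize_nucleotides(source) == b"ATTAGC"
```

Normalizing to bytes first means the validation and any later packing step only deal with one input type.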
Nucleotide Data#
- class pyncbitk.objects.seqdata.IupacNaData(SeqNaData)#
Nucleotide sequence data stored as an IUPAC nucleotide string.
Example
>>> seqdata = IupacNaData("ATTAGCCATGCATA")
>>> seqdata.length
14
>>> seqdata.data
b'ATTAGCCATGCATA'
- classmethod __new__(*args, **kwargs)#
- classmethod encode(data)#
Encode the textual sequence to a compressed representation.
- Parameters:
data (str, bytes, or buffer-like object) – The ASCII nucleotide sequence to be encoded. Python strings and any other objects supporting the buffer protocol are supported.
- Raises:
ValueError – When the sequence data contains invalid characters that do not belong to the nucleotide alphabet.
- __buffer__(flags, /)#
Return a buffer object that exposes the underlying memory of the object.
- __eq__(value, /)#
Return self==value.
- __ge__(value, /)#
Return self>=value.
- __gt__(value, /)#
Return self>value.
- __init__(*args, **kwargs)#
- __le__(value, /)#
Return self<=value.
- __lt__(value, /)#
Return self<value.
- __ne__(value, /)#
Return self!=value.
- __reduce_ex__(protocol)#
Helper for pickle.
- __repr__()#
Return repr(self).
- class pyncbitk.objects.seqdata.Ncbi2NaData(SeqNaData)#
Nucleotide sequence data stored with 2-bit encoding.
A nucleic acid containing no ambiguous bases can be encoded using a two-bit encoding per base, representing one of the four nucleobases: A, C, G or T. This encoding is the most compact for unambiguous sequences.
Example
>>> seqdata = Ncbi2NaData.encode("ATTAGCCATGCATA")
>>> seqdata.data
b'<\x94\xe4\xc0'
- classmethod __new__(*args, **kwargs)#
- __buffer__(flags, /)#
Return a buffer object that exposes the underlying memory of the object.
- __init__(*args, **kwargs)#
- __reduce_ex__(protocol)#
Helper for pickle.
- __repr__()#
Return repr(self).
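The two-bit packing behind Ncbi2NaData can be illustrated without the library. The sketch below assumes the code assignment A=0, C=1, G=2, T=3 with the first base stored in the highest-order bits of each byte. That assumption is consistent with the bytes shown in the Ncbi2NaData example, but the function pack_2bit is hypothetical and not part of pyncbitk.

```python
# Assumed 2-bit codes; consistent with the documented example output.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_2bit(sequence):
    """Pack an unambiguous nucleotide string into 2 bits per base."""
    packed = bytearray((len(sequence) + 3) // 4)  # 4 bases per byte, rounded up
    for i, base in enumerate(sequence):
        try:
            code = CODES[base]
        except KeyError:
            raise ValueError(f"ambiguous or invalid base: {base!r}") from None
        shift = 6 - 2 * (i % 4)  # first base of each byte goes in the top 2 bits
        packed[i // 4] |= code << shift
    return bytes(packed)


print(pack_2bit("ATTAGCCATGCATA"))  # b'<\x94\xe4\xc0'
```

The 14 bases fit in 4 bytes instead of 14. Because unused trailing bits are zero and zero also encodes A, the sequence length has to be kept alongside the packed bytes.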
- class pyncbitk.objects.seqdata.Ncbi4NaData(SeqNaData)#
Nucleotide sequence data stored with 4-bit encoding.
- classmethod __new__(*args, **kwargs)#
- __buffer__(flags, /)#
Return a buffer object that exposes the underlying memory of the object.
- __init__(*args, **kwargs)#
- __reduce_ex__(protocol)#
Helper for pickle.
- __repr__()#
Return repr(self).
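A four-bit encoding leaves room for ambiguity codes. The sketch below assumes the NCBI4na convention, in which each of the four bits flags one of A, C, G, T (values 1, 2, 4, 8), so that an ambiguity symbol is the bitwise union of the bases it covers, and two bases are packed per byte with the first in the high nibble. The function pack_4bit is hypothetical and only illustrates the layout.

```python
# One bit per unambiguous base (assumed NCBI4na convention).
BITS = {"A": 1, "C": 2, "G": 4, "T": 8}

# IUPAC symbols and the bases each one may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "S": "CG", "W": "AT", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

# The code of a symbol is the union (here: sum of distinct bits) of its bases.
CODES = {symbol: sum(BITS[b] for b in bases) for symbol, bases in IUPAC.items()}


def pack_4bit(sequence):
    """Pack an IUPAC nucleotide string into 4 bits per base."""
    packed = bytearray((len(sequence) + 1) // 2)  # 2 bases per byte, rounded up
    for i, base in enumerate(sequence):
        if base not in CODES:
            raise ValueError(f"invalid base: {base!r}")
        shift = 4 if i % 2 == 0 else 0  # even positions fill the high nibble
        packed[i // 2] |= CODES[base] << shift
    return bytes(packed)


assert CODES["N"] == 15                      # N covers all four bases
assert pack_4bit("ACGTN") == bytes([0x12, 0x48, 0xF0])
```

With this layout, testing whether a position may be a given base reduces to a single bitwise AND against that base's bit.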
Protein Data#
- class pyncbitk.objects.seqdata.IupacAaData(SeqAaData)#
Amino-acid sequence data stored as an IUPAC-IUB amino-acid string.
The IUPAC-IUB Commission on Biochemical Nomenclature defined a code of one-letter abbreviations for the 20 standard amino-acids, as well as symbols for indeterminate and unknown residues.
References
IUPAC-IUB Commission on Biochemical Nomenclature. “A One-Letter Notation for Amino Acid Sequences” (1968). Journal of Biological Chemistry, 243(13), 3557–3559. doi:10.1016/S0021-9258(19)34176-6.
- classmethod __new__(*args, **kwargs)#
- classmethod encode(data)#
Encode the textual sequence to a compressed representation.
- __buffer__(flags, /)#
Return a buffer object that exposes the underlying memory of the object.
- __init__(*args, **kwargs)#
- __reduce_ex__(protocol)#
Helper for pickle.
- __repr__()#
Return repr(self).
- class pyncbitk.objects.seqdata.Ncbi8AaData(SeqNaData)#
Amino-acid sequence data with support for modified residues.
- classmethod __new__(*args, **kwargs)#
- __reduce__()#
Helper for pickle.
- class pyncbitk.objects.seqdata.NcbiEAaData(IupacAaData)#
Amino-acid sequence data storing an NCBI-extended string.
This representation adds symbols for the non-standard selenocysteine amino-acid (U) as well as support for termination or gap characters.
- classmethod __new__(*args, **kwargs)#
- classmethod encode(data)#
Encode the textual sequence to a compressed representation.
- __buffer__(flags, /)#
Return a buffer object that exposes the underlying memory of the object.
- __init__(*args, **kwargs)#
- __reduce_ex__(protocol)#
Helper for pickle.
- __repr__()#
Return repr(self).
- class pyncbitk.objects.seqdata.NcbiPAaData(SeqNaData)#
Amino-acid sequence data storing probabilities for each position.
- classmethod __new__(*args, **kwargs)#
- __reduce__()#
Helper for pickle.
- class pyncbitk.objects.seqdata.NcbiStdAa(SeqNaData)#
Amino-acid sequence data stored as ordinal encoding.
This encoding represents the NCBI-extended amino-acids as consecutive integer values, starting with 0 for the gap character.
- classmethod __new__(*args, **kwargs)#
- __reduce__()#
Helper for pickle.
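The ordinal encoding described for NcbiStdAa can be sketched as a lookup table. The symbol order below is the NCBIstdaa ordering as commonly documented, with 0 for the gap character. Treat it as an assumption of this sketch rather than something stated on this page, and the helper names to_ordinal and from_ordinal as hypothetical.

```python
# Assumed NCBIstdaa symbol order: index in this string == ordinal value.
STDAA = "-ABCDEFGHIKLMNPQRSTVWXYZU*OJ"
ORDINAL = {symbol: index for index, symbol in enumerate(STDAA)}


def to_ordinal(sequence):
    """Convert an amino-acid string to one ordinal byte per residue."""
    try:
        return bytes(ORDINAL[residue] for residue in sequence)
    except KeyError as err:
        raise ValueError(f"invalid residue: {err.args[0]!r}") from None


def from_ordinal(codes):
    """Convert ordinal bytes back to the textual sequence."""
    return "".join(STDAA[code] for code in codes)


encoded = to_ordinal("MKV-")
print(list(encoded))          # [12, 10, 19, 0]
print(from_ordinal(encoded))  # MKV-
```

Unlike the packed nucleotide encodings, this representation still uses one byte per residue. Its benefit is that residues become small consecutive integers that can directly index arrays such as scoring matrices.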