Files submitted on this page must have a ".txt" or ".fasta" extension.
The data must be in FASTA format, and each header must contain a "|" character followed by a label.
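The two upload rules above can be checked locally before submitting. The following is a minimal sketch, not the server's own validation code: the function name `check_submission` is hypothetical, and it assumes the suffix comparison is case-sensitive and that a valid header looks like `>ID|label` with both parts non-empty.

```python
import os
from typing import List

# Assumption: suffixes are compared case-sensitively, exactly as written on the page.
ALLOWED_SUFFIXES = (".txt", ".fasta")


def check_submission(filename: str, text: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable.

    Hypothetical pre-upload check, not the server's own validation code.
    """
    problems: List[str] = []
    suffix = os.path.splitext(filename)[1]
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported suffix {suffix!r}")
    headers = [line.strip() for line in text.splitlines() if line.startswith(">")]
    if not headers:
        problems.append("no FASTA header lines found")
    for header in headers:
        # Each header needs an identifier and a label separated by "|".
        name, sep, label = header[1:].partition("|")
        if not sep or not name.strip() or not label.strip():
            problems.append(f"header without '|' label: {header!r}")
    return problems
```

A file such as `seqs.csv` whose header is `>2MA1A` would be reported twice: once for the suffix and once for the missing "|" label.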
Welcome to the IKP-DBPPred server
This paper adopts the mixed feature representation that performed best in
the experiments. The method consists of the following main steps. First,
protein sequences are encoded with three feature representation methods,
and an SVM is used as the classifier. Next, the three representations are
combined, and max-relevance-max-distance (MRMD) is applied to reduce the
dimensionality. Finally, the mixed feature representation is evaluated
experimentally. Figure 1 shows the experimental workflow used in this paper.
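The combine-then-reduce step described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation. The three feature representations are not named here, so the sketch takes already-computed per-protein feature matrices. The ranking is a simplified MRMD-style score: relevance is the absolute Pearson correlation between a feature and the label, and distance is the mean root-mean-square distance from that feature to every other feature. Features are assumed to be scaled to [0, 1] so the two terms are comparable. The function names are hypothetical, and the published MRMD tool may weight or compute these terms differently.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def mix_features(*blocks: Matrix) -> Matrix:
    """Concatenate per-sample feature matrices column-wise (one row per protein)."""
    n = len(blocks[0])
    if any(len(b) != n for b in blocks):
        raise ValueError("all feature blocks must describe the same samples")
    return [[v for b in blocks for v in b[i]] for i in range(n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rms_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance divided by sqrt(n), so it stays in [0, 1] for [0, 1] features."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def mrmd_rank(X: Matrix, y: Sequence[float]) -> List[int]:
    """Rank columns by |r(feature, label)| + mean distance to the other columns.

    Simplified MRMD-style score (hypothetical, not the published tool):
    high relevance and high distance (low redundancy) both raise the rank.
    """
    cols = [list(c) for c in zip(*X)]
    m = len(cols)
    scores = []
    for i, c in enumerate(cols):
        relevance = abs(pearson(c, y))
        others = [rms_distance(c, cols[j]) for j in range(m) if j != i]
        distance = sum(others) / len(others) if others else 0.0
        scores.append(relevance + distance)
    return sorted(range(m), key=lambda i: scores[i], reverse=True)


def reduce_dims(X: Matrix, y: Sequence[float], k: int) -> Matrix:
    """Keep the k top-ranked columns, in rank order."""
    keep = mrmd_rank(X, y)[:k]
    return [[row[j] for j in keep] for row in X]
```

The reduced matrix would then be used to train the SVM, for example with scikit-learn's `sklearn.svm.SVC`. The kernel and parameters used in the paper are not stated on this page.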
Datasets
The benchmark dataset used in this paper is PDB186, first proposed by
Lou et al. [1]. It contains 93 DNA-binding proteins (positive samples)
and 93 non-DNA-binding proteins (negative samples).
The PDB186 dataset in FASTA format is available for download (Download Here).
References:
[1] W. Lou et al., "Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naive Bayes," PLoS ONE, vol. 9, 2014.
Help
This site predicts whether a protein sequence is a DNA-binding protein.
Input must be in FASTA format. The first line of each record is a header that starts with ">";
the analysis works only when the header contains a "|" character followed by a label. The lines after the header contain the sequence itself,
which may use only the standard amino acid one-letter codes.
For example:
>2MA1A|1
HDAPLFEALRAWRLQKAKELSLPPYTIFHDATLKTIAELRPGSHATLGTVSGVGGRKLAAYGDEVLQVVRDSSGG
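The record layout described in the Help section can be read with a short parser. This is a minimal sketch, not the server's code. The names `Record` and `parse_labeled_fasta` are hypothetical. It assumes headers of the form `>ID|label`, that a sequence may be wrapped over several lines, and that only the 20 standard one-letter amino acid codes are allowed. Whether the server also accepts ambiguity codes such as X is not stated, so they are rejected here. Accepting lowercase letters by converting them to uppercase is likewise an assumption.

```python
from typing import List, NamedTuple, Optional

# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


class Record(NamedTuple):
    name: str
    label: str
    sequence: str


def _make_record(header: str, chunks: List[str]) -> Record:
    """Split '>ID|label' and validate the joined sequence lines."""
    name, sep, label = header[1:].partition("|")
    if not sep or not label.strip():
        raise ValueError(f"header lacks a '|' label: {header!r}")
    seq = "".join(chunks).replace(" ", "").upper()  # assumption: lowercase is tolerated
    bad = sorted(set(seq) - STANDARD_AA)
    if not seq or bad:
        raise ValueError(f"{name}: empty sequence or non-standard residues {bad}")
    return Record(name.strip(), label.strip(), seq)


def parse_labeled_fasta(text: str) -> List[Record]:
    """Parse labeled FASTA text into (name, label, sequence) records."""
    records: List[Record] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line, []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records
```

Applied to the example above, this yields a single record with name `2MA1A`, label `1`, and the sequence on the following line.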