1 Introduction
Bringing a new drug to market is very costly and time-consuming [1, 2]. Therefore, in order to save time and cost, scientists have tried to identify new uses of existing drugs, known as drug repositioning. Predicting the interactions between drugs and targets is one of the most active domains in drug research, since such predictions can help in drug discovery [4, 5, 3], drug side-effect prediction [6, 7] and drug repositioning [8, 9, 11, 10]. Currently, the known interactions between drugs and target proteins are very limited [12, 13], while it is believed that any single drug can interact with multiple targets [14, 15, 16, 17]. However, biochemical verification of drug-target interactions in the laboratory is extremely expensive, laborious and time-consuming [20, 18, 19], since there are too many possible interactions to check. Thanks to the increasing capability to collect, store and process large-scale chemical and protein data, in silico prediction of the interactions between drugs and target proteins has become an effective and efficient tool for discovering new uses of existing drugs. The prediction results provide helpful evidence for selecting potential candidates of drug-target interactions for further biochemical verification, which can reduce the costs and risks of failed drug development [22, 21].
There are two major approaches to in silico prediction: docking simulation and machine learning [23]. Docking simulation is a common method in biology, but it has two major limitations. Firstly, it requires the three-dimensional structures of targets to compute the binding of each drug candidate [24, 25, 26], but this kind of information is usually not available [27, 28]. Secondly, it is very time-consuming. Therefore, in the last decade, many efforts have been made to solve the drug-target interaction (DTI) prediction problem by machine learning approaches [23, 29, 30]. Most known machine learning methods [32, 31, 33] treat DTI prediction as a binary classification problem in which the drug-target interactions are regarded as instances and the characteristics of the drugs and target proteins are considered as features. To train the classifier, machine learning methods require labeled data, i.e., positive samples of truly existing interactions and negative samples of non-interacting pairs. Normally, the positive samples are available, but the negative samples are not known.
Whenever extra characteristic information about the drugs and targets is not available and only a portion of the interactions between drugs and targets is known, similarity-based methods are suitable to solve this problem. However, similarity-based methods also have some limitations. Firstly, they are not applicable to new drugs or targets that do not have any interactions at all, since the methods cannot compute their similarities with the others. Secondly, although similarity-based methods are simple, sometimes they do not perform well [34], since the common neighbor and Jaccard indices, Cannistraci resource allocation (CRA) [35] and the Cannistraci Jaccard index (CJC) [35] utilize only local information of networks. The Katz index also performs unsatisfactorily (as shown in Section 3), since a large decay factor brings redundant information [36], while a small decay factor makes the Katz index close to the common neighbor index or the local path index [38, 37].
To overcome the limitations of machine learning methods and similarity-based methods, we propose a matrix-based method, namely low-rank matrix projection (LMP). LMP does not require negative samples. When the extra characteristic information of drugs and target proteins is not available, LMP utilizes only the known interactions. On the other hand, if the extra information is available, LMP can also take such information into consideration and remarkably improve its performance. LMP is shown to perform better than similarity-based methods on five well-known datasets (i.e., MATADOR, enzyme, ion channel, GPCR and nuclear receptor). By embedding extra information of drugs and targets, LMP is shown to outperform many baseline methods that also use extra information. Finally, LMP can effectively predict the potential interactions of new drugs or targets that do not have any known interactions at all. The proposed method can help select the most likely interactions for further biochemical verification when only interaction information is known or only some characteristic information is available. In short, LMP can reduce cost and failure in drug development, and thus advance drug discovery.
2 Methods
2.1 Notations
In this work, several matrices and their similarity matrices are computed by the proposed method based on the characteristic information of the biological data. We denote them as follows:

$A$: the adjacency matrix denoting the known interactions between drugs and proteins; its entries are defined in Eq. (1).

$Z_{A}$: the low-rank similarity matrix computed based on drug information in $A$, i.e., similar drugs interact with similar targets.

$Z_{A^{T}}$: the low-rank similarity matrix computed based on target information of $A^{T}$ (transpose of $A$), i.e., similar targets are targeted by similar drugs.

$X_{A}$: the score matrix obtained from the projection of $A$ onto $Z_{A}$.

$X_{A^{T}}$: the score matrix obtained from the projection of $A^{T}$ onto $Z_{A^{T}}$.

$X$: the score matrix computed by combining $X_{A}$ and $X_{A^{T}}$.

$Z_{d}$: the low-rank similarity matrix computed by the proposed method on the drug similarity matrix $S_{d}$.

$X_{d}$: the score matrix obtained by projecting $A$ on $Z_{d}$.

$Z_{t}$: the low-rank similarity matrix computed by the proposed method on the target similarity matrix $S_{t}$.

$X_{t}$: the score matrix obtained by projecting $A$ on $Z_{t}$.

$\hat{X}$: the score matrix computed by combining $X$, $X_{d}$ and $X_{t}$.
2.2 Datasets
In this work, we apply the proposed method as well as similarity-based methods to five well-known benchmark datasets, namely MATADOR [42], enzyme [29], ion channel [29], G-protein-coupled receptors (GPCR) [29], and nuclear receptors [29]. The manually annotated target and drug online resource (MATADOR) (May 2017) is a free online dataset of chemical and target protein interactions. The dataset has 13 columns; however, we utilize only two of them, Chemical ID and Protein ID, to construct the adjacency matrix. The MATADOR dataset has no characteristic information about the drugs and targets. Enzyme, ion channel, GPCR, and nuclear receptors (May 2017) are drug-target interaction networks for human beings. The statistics of the five datasets are presented in Table 1.
Table 1: Statistics of the five datasets.
Dataset  Drugs  Targets  Interactions  Sparsity of $A$

MATADOR  801  2901  15843  0.007 
enzyme  445  664  2926  0.010 
ion channel  210  204  1476  0.034 
GPCR  223  95  635  0.030 
nuclear receptors  54  26  90  0.061 
2.3 Low-Rank Matrix Projection
In real-world problems, data that lie in a high-dimensional space and are full of noise normally contain hidden features, which become visible once the data are projected onto a lower-dimensional space and the noise is subtracted from them [43]. Low-rank matrices have been shown to be a powerful and suitable tool for capturing the patterns in high-dimensional and noisy data [46, 44, 45]. Therefore, they deserve to be investigated for the DTI problem. In this section we assume that only the known interactions between drugs and targets are available, so we aim at learning the low-rank matrix from this interaction information. First of all, we construct the adjacency matrices of the drug-target interactions in the five datasets. Mathematically, the adjacency matrix is defined as
$$A_{ij} = \begin{cases} 1, & \text{if drug } i \text{ is known to interact with target } j, \\ 0, & \text{otherwise.} \end{cases} \qquad (1)$$
Then we obtain $A \in \mathbb{R}^{m \times n}$, where $m$ is the number of drugs and $n$ is the number of target proteins.
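As a minimal sketch of how such an adjacency matrix and the sparsity values of Table 1 could be computed, the Python snippet below builds a binary drug-by-target matrix from a list of (drug, target) pairs. The identifiers and the helper names `build_adjacency` and `sparsity` are hypothetical and only serve as illustration; they are not taken from the datasets or from the authors' implementation.

```python
import numpy as np

def build_adjacency(pairs, drugs, targets):
    """Binary drug-by-target matrix: entry (i, j) is 1 for a known interaction."""
    drug_index = {d: i for i, d in enumerate(drugs)}
    target_index = {t: j for j, t in enumerate(targets)}
    A = np.zeros((len(drugs), len(targets)))
    for drug, target in pairs:
        A[drug_index[drug], target_index[target]] = 1.0
    return A

def sparsity(A):
    """Fraction of entries of A that are nonzero (known interactions)."""
    return np.count_nonzero(A) / A.size

# Hypothetical toy identifiers, standing in for Chemical ID / Protein ID columns.
drugs = ["D1", "D2", "D3"]
targets = ["P1", "P2"]
pairs = [("D1", "P1"), ("D2", "P1"), ("D3", "P2")]

A = build_adjacency(pairs, drugs, targets)
toy_sparsity = sparsity(A)
```

With the counts reported for MATADOR, the same ratio is 15843 / (801 × 2901), which rounds to the 0.007 listed in Table 1.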
Real data are normally far from perfect, meaning that a portion of the drug-target interactions in the real data may be incorrect or redundant, and some other drug-target interactions may be missing from the observed data. Therefore, the adjacency matrix $A$ can be decomposed into two parts. The first part is a linear combination of $A$ with the low-rank matrix $Z$, which is essentially a projection from the noisy data $A$ into a more refined, more informative and lower-dimensional space. The second part can be considered as the noise or the outliers, which are strained off from the original data $A$ and represented by a sparse matrix $E$ with most entries being zeros. The method seeks the lowest-rank matrix among all the candidates, which is further utilized to construct the score matrix that estimates the likelihoods of the potential interactions.
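Algorithm 1: Solving the problem of Eq. (5) by Inexact ALM

Input: a dataset $A$ and the parameter $\lambda$
Output: $Z$ and $E$
Initialize: $Z$, $J$, $E$, $Y_1$, $Y_2$; $\mu$, $\mu_{\max}$, $\rho$, $\varepsilon$
while not converged do
1.  fix the others and update $J$ by $J = \arg\min_{J} \frac{1}{\mu}\|J\|_{*} + \frac{1}{2}\|J - (Z + Y_2/\mu)\|_F^2$
2.  fix the others and update $Z$ by $Z = (I + A^{T}A)^{-1}\left(A^{T}(A - E) + J + (A^{T}Y_1 - Y_2)/\mu\right)$
3.  fix the others and update $E$ by $E = \arg\min_{E} \frac{\lambda}{\mu}\|E\|_{\ell} + \frac{1}{2}\|E - (A - AZ + Y_1/\mu)\|_F^2$
4.  update the multipliers $Y_1 = Y_1 + \mu(A - AZ - E)$ and $Y_2 = Y_2 + \mu(Z - J)$
5.  update the parameter $\mu$ by $\mu = \min(\rho\mu, \mu_{\max})$
6.  check the convergence conditions $\|A - AZ - E\|_{\infty} < \varepsilon$ and $\|Z - J\|_{\infty} < \varepsilon$
end while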

The hyperparameter settings follow the implementation of [44], since they are reported in the literature as the optimal ones.
Firstly, we decompose $A$ as follows,
$$A = AZ + E. \qquad (2)$$
Obviously, Eq. (2) has infinitely many solutions. However, since we wish $Z$ to be low-rank, where the rank of a matrix is the maximum number of linearly independent column (or row) vectors in the matrix, and $E$ to be sparse, we can enforce the nuclear norm (or trace norm) on $Z$ and a sparse norm on $E$. Mathematically, Eq. (2) can thus be relaxed as
$$\min_{Z, E} \; \|Z\|_{*} + \lambda \|E\|_{\ell} \quad \text{s.t.} \quad A = AZ + E, \qquad (3)$$
where $\|Z\|_{*} = \sum_{i} \sigma_{i}$ (i.e., $\sigma_{i}$ are the singular values of $Z$), $\|E\|_{\ell}$ is the sparse norm that implements the noise regularization strategy, and $\lambda$ is a positive free parameter that balances the weights of the low-rank matrix and the sparse matrix. Minimizing the trace norm of a matrix favors lower-rank matrices, while the sparse norm is capable of identifying noise and outliers. Eq. (3) can also be regarded as a generalization of robust PCA [47, 48], because if the matrix $A$ that multiplies $Z$ on the right-hand side of the constraint in Eq. (3) is set to the identity matrix, the model degenerates to robust PCA. Eq. (3) can be rewritten into an equivalent problem as
$$\min_{Z, E, J} \; \|J\|_{*} + \lambda \|E\|_{\ell} \quad \text{s.t.} \quad A = AZ + E, \; Z = J. \qquad (4)$$
Eq. (4) is a constrained convex optimization problem which can be solved by many off-the-shelf methods, e.g., the iterative method (IT) [49], accelerated proximal gradient (APG) [46], the dual approach [50], and the augmented Lagrange multiplier (ALM) method [44]. In this work, we employ the Inexact ALM method: we first convert Eq. (4) to an unconstrained problem and then minimize it by utilizing the augmented Lagrange function
$$\mathcal{L} = \|J\|_{*} + \lambda \|E\|_{\ell} + \langle Y_1, A - AZ - E \rangle + \langle Y_2, Z - J \rangle + \frac{\mu}{2}\left(\|A - AZ - E\|_F^2 + \|Z - J\|_F^2\right), \qquad (5)$$
where $\mu > 0$ is a penalty parameter, $\|\cdot\|_{*}$ is the trace norm, $\|\cdot\|_{F}$ is the Frobenius norm, $\langle P, Q \rangle = \mathrm{tr}(P^{T}Q)$, and $Y_1$ and $Y_2$ are the Lagrange multipliers. Eq. (5) is unconstrained and can be solved by minimizing with respect to $J$, $Z$ and $E$, respectively, while fixing the other variables, and then updating the Lagrange multipliers $Y_1$ and $Y_2$. The detailed procedure for solving Eq. (5) is shown in Algorithm 1.
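The alternating scheme described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the decomposition "adjacency = adjacency times low-rank matrix plus sparse matrix", takes the column-wise l2,1 norm as the sparse norm, and uses hypothetical hyperparameter values (`mu`, `mu_max`, `rho`, `tol`, `max_iter`) that are not taken from the paper or from [44]. The function names `svt`, `shrink_columns` and `low_rank_representation` are likewise hypothetical.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink_columns(M, tau):
    """Column-wise shrinkage: proximal operator of tau * l2,1 norm (assumed sparse norm)."""
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return M * scale

def low_rank_representation(A, lam, mu=1e-2, mu_max=1e10, rho=1.2,
                            tol=1e-6, max_iter=500):
    """Inexact ALM sketch for min ||J||_* + lam * ||E||_{2,1} s.t. A = A Z + E, Z = J.

    All default hyperparameters are illustrative assumptions.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    Z = np.zeros((n, n))
    J = np.zeros((n, n))
    E = np.zeros_like(A)
    Y1 = np.zeros_like(A)        # multiplier for A = A Z + E
    Y2 = np.zeros((n, n))        # multiplier for Z = J
    gram = np.eye(n) + A.T @ A   # fixed system matrix of the Z update
    for _ in range(max_iter):
        # 1. update J with the other variables fixed
        J = svt(Z + Y2 / mu, 1.0 / mu)
        # 2. update Z with the other variables fixed (linear system)
        Z = np.linalg.solve(gram, A.T @ (A - E) + J + (A.T @ Y1 - Y2) / mu)
        # 3. update E with the other variables fixed
        E = shrink_columns(A - A @ Z + Y1 / mu, lam / mu)
        # 4. update the multipliers with the constraint residuals
        r1 = A - A @ Z - E
        r2 = Z - J
        Y1 = Y1 + mu * r1
        Y2 = Y2 + mu * r2
        # 5. increase the penalty parameter
        mu = min(rho * mu, mu_max)
        # 6. stop once both constraints hold up to the tolerance
        if max(np.abs(r1).max(), np.abs(r2).max()) < tol:
            break
    return Z, E

# Toy adjacency matrix (hypothetical data): 4 drugs by 4 targets.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]], dtype=float)
Z, E = low_rank_representation(A, lam=0.2)
scores = A @ Z   # projection of the adjacency matrix onto the learned matrix
```

Because the system matrix of the Z update does not change between iterations, it is formed once outside the loop; for large datasets a cached factorization would be the natural refinement.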
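We denote the solution of Eq. (5) as $Z_{A}$. If $A_{ij}$ represents the interaction between drug $i$ and protein $j$ (with $m$ drugs and $n$ target proteins), then $Z_{A} \in \mathbb{R}^{n \times n}$, and it can be considered as a similarity matrix that describes the similarity between proteins. If instead the input is $A^{T}$, whose entry $(j, i)$ represents the interaction between protein $j$ and drug $i$ (the transpose of the adjacency matrix in Eq. (2)), then the solution of Eq. (5) is denoted as $Z_{A^{T}} \in \mathbb{R}^{m \times m}$, which describes the similarity between drugs. After obtaining these two similarity matrices, we project the adjacency matrix onto these lower-dimensional spaces, respectively, as
$$X_{A} = A Z_{A}, \qquad X_{A^{T}} = A^{T} Z_{A^{T}}. \qquad (6)$$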
Finally, we combine the two similarity matrices as
(7) 
After obtaining the combined score matrix $X$, we remove the known interactions by setting the entries of $X$ that correspond to nonzero entries in the adjacency matrix $A$ to zero, and sort the remaining scores in descending order. The drug-target pairs with the highest scores are the most likely unknown interacting pairs. The full process of the proposed method is illustrated in Fig. 1. Algorithm 2 gives the detailed procedure of the proposed method LMP.
Algorithm 2: The algorithm of the proposed method

Input: an adjacency matrix $A$
1.  compute the low-rank similarity matrices $Z_{A}$ and $Z_{A^{T}}$ and the sparse noise $E$ of Eq. (5) by using Algorithm 1
2.  compute the score matrices $X_{A}$ and $X_{A^{T}}$ by Eq. (6)
3.  combine the two score matrices into $X$ as in Eq. (7)
4.  sort the scores in $X$ in descending order
Output: the pairs with the highest scores are the most probable interactions
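The projection, masking and ranking steps of this procedure can be sketched as below. The sketch assumes that the two learned similarity matrices are already available (here replaced by hypothetical constant matrices) and, as a labeled assumption, combines the two projections by a plain average, since the exact combination rule is not reproduced here. Known interactions are excluded from the ranking instead of being set to zero, which yields the same candidate order. The function name `rank_candidates` is hypothetical.

```python
import numpy as np

def rank_candidates(A, Z_targets, Z_drugs, top_k=3):
    """Rank unknown drug-target pairs by their combined projection scores."""
    A = np.asarray(A, dtype=float)
    X_A = A @ Z_targets           # m x n projection of A
    X_AT = A.T @ Z_drugs          # n x m projection of the transposed matrix
    X = 0.5 * (X_A + X_AT.T)      # ASSUMPTION: plain average of the two projections
    unknown = (A == 0)            # only pairs without a known interaction
    candidates = np.argwhere(unknown)
    scores = X[unknown]           # same row-major order as `candidates`
    order = np.argsort(-scores)[:top_k]
    return [(int(candidates[k, 0]), int(candidates[k, 1]), float(scores[k]))
            for k in order]

# Hypothetical toy input: 2 drugs, 3 targets, constant similarity matrices.
A = np.array([[1, 1, 0],
              [1, 0, 0]], dtype=float)
Z_targets = np.ones((3, 3)) / 3.0
Z_drugs = np.ones((2, 2)) / 2.0
ranking = rank_candidates(A, Z_targets, Z_drugs)
```

In this toy case the pair (drug 1, target 1) ranks first, because drug 1 shares a target with drug 0, which is known to interact with target 1.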
2.4 Working with Heterogeneous Data
Using only the interaction dataset and ignoring the extra characteristic information of the drugs and targets throws away important information. In this subsection, we show how the proposed method can utilize this characteristic information. Based on the hypothesis that similar drugs interact with similar targets and vice versa, we can utilize two kinds of characteristic information, namely drug similarity and target similarity, to infer the potential interactions.
Drug similarity is computed by using SIMCOMP [40] from the chemical structures of drugs, which are obtained from KEGG LIGAND [13]. On the other hand, target similarity is computed by using a normalized Smith-Waterman score [41] from KEGG GENES [13]. These two datasets are available online [29]. Directly using these two similarity datasets makes the proposed method perform unsatisfactorily, since they contain some noise. Therefore, we use the proposed method to compute new low-rank similarity matrices from these two similarity matrices, and then project the adjacency matrix onto these lower-dimensional spaces. Learning the low-rank matrix from the characteristic information has two main properties. Firstly, as discussed above, the noise is subtracted. Secondly, the interaction information is projected onto the lower-dimensional feature space, which is more informative.
The low-rank similarity matrices of the characteristic information, i.e., of the drug similarity matrix $S_{d}$ and the target similarity matrix $S_{t}$, can be computed as shown in Eq. (3) by replacing the adjacency matrix $A$ with $S_{d}$ and $S_{t}$, respectively. After obtaining the low-rank similarity matrices of the drugs and targets, denoted as $Z_{d}$ and $Z_{t}$, we project the adjacency matrix onto them as shown in Eq. (6) and call the results the score matrices $X_{d}$ and $X_{t}$, respectively. Finally, we combine the three score matrices $X$, $X_{d}$ and $X_{t}$ as
$$\hat{X} = \alpha X + \beta X_{d} + \gamma X_{t}, \qquad (8)$$
where $\alpha$, $\beta$ and $\gamma$ are the weighting parameters, which are set to 0.5, 0.25 and 0.25, respectively. Since the known interactions are experimentally verified, the matrix obtained from this information plays a more important role than the other two. The algorithm of the proposed method is illustrated in Table 3.
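A minimal sketch of this weighting step is given below. It assumes that the three score matrices are combined by a weighted sum and that all of them are arranged with drugs as rows and targets as columns; both points are assumptions of this illustration, while the weights 0.5, 0.25 and 0.25 are the values stated in the text. The function name `combine_scores` and the toy matrices are hypothetical.

```python
import numpy as np

def combine_scores(X, X_d, X_t, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of interaction-based, drug-based and target-based scores."""
    w_interaction, w_drug, w_target = weights
    return w_interaction * X + w_drug * X_d + w_target * X_t

# Hypothetical toy score matrices of identical shape (2 drugs by 3 targets).
X = np.full((2, 3), 1.0)
X_d = np.full((2, 3), 2.0)
X_t = np.full((2, 3), 4.0)
X_hat = combine_scores(X, X_d, X_t)
```

Because the weights sum to one, the combined scores stay on the same scale as the individual score matrices.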
2.5 Predicting the New Drugs and Targets
When there are new drugs or targets which do not have any known interactions at all, the proposed method can be simply extended to predict their interactions. However, we need to use the characteristic information about the new drugs or new targets. On the one hand, given a drug with characteristic information, we wish to predict which target proteins this drug interacts with. On the other hand, given a target protein with characteristic information, we aim at predicting the drugs that interact with it. Consider predicting the targets of a new drug: first, one needs to compute the low-rank similarity matrix of the given drug with the others based on their characteristic information, i.e., computing $Z_{d}$. With the assumption that similar drugs interact with similar targets, we can predict the potential interactions based on this similarity matrix, i.e., by projecting $A$ onto $Z_{d}$. Similarly, when a new target is given with its biological information, one can compute the low-rank similarity matrix, i.e., $Z_{t}$, of that protein with the others. The drugs that potentially interact with this protein are those that interact with the proteins most similar to the given protein.
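The idea of scoring a new drug or a new target through its similarities can be sketched as follows. The sketch assumes that the similarities between the new entity and the known ones are already available as a vector (in the method described above they would come from the learned low-rank similarity matrix); the numbers and the function names `score_new_drug` and `score_new_target` are hypothetical.

```python
import numpy as np

def score_new_drug(similarity_to_known_drugs, A):
    """Target scores for a new drug: similarity-weighted sum of known drugs' rows."""
    s = np.asarray(similarity_to_known_drugs, dtype=float)
    return s @ np.asarray(A, dtype=float)

def score_new_target(similarity_to_known_targets, A):
    """Drug scores for a new target: similarity-weighted sum of known targets' columns."""
    s = np.asarray(similarity_to_known_targets, dtype=float)
    return np.asarray(A, dtype=float) @ s

# Hypothetical known interactions: 2 drugs by 3 targets.
A = np.array([[1, 0, 1],
              [0, 1, 0]], dtype=float)

# A new drug that is very similar to drug 0 and hardly similar to drug 1.
target_scores = score_new_drug([0.9, 0.1], A)

# A new target that is identical (in similarity terms) to target 0.
drug_scores = score_new_target([1.0, 0.0, 0.0], A)
```

The new drug inherits high scores for the targets of the drug it resembles most, which is exactly the "similar drugs interact with similar targets" assumption.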
Table 4: AUC and AUPR of the compared methods (NR: nuclear receptors). The first part lists the methods that use only the known interactions; the second part lists the methods that use both interaction and characteristic information.
Method  MATADOR AUC  MATADOR AUPR  NR AUC  NR AUPR  GPCR AUC  GPCR AUPR  ion channel AUC  ion channel AUPR  enzyme AUC  enzyme AUPR  Average AUC  Average AUPR

Known interactions only
CN  0.930  0.603  0.688  0.249  0.826  0.491  0.915  0.724  0.910  0.678  0.854  0.549
Katz  0.894  0.393  0.679  0.263  0.800  0.479  0.893  0.707  0.869  0.644  0.827  0.497
Jaccard  0.933  0.612  0.686  0.249  0.828  0.531  0.920  0.680  0.911  0.704  0.856  0.555
CJC  0.930  0.609  0.693  0.265  0.828  0.531  0.914  0.731  0.910  0.673  0.855  0.562
CRA  0.937  0.714  0.696  0.289  0.833  0.565  0.923  0.737  0.912  0.752  0.860  0.612
LMP  0.946  0.796  0.702  0.276  0.853  0.601  0.941  0.846  0.900  0.766  0.868  0.657
Interaction and characteristic information
BLM  –  –  0.694  0.204  0.884  0.464  0.918  0.591  0.928  0.496  0.856  0.439
LapRLS  –  –  0.855  0.539  0.941  0.640  0.969  0.804  0.962  0.826  0.932  0.702
NetLapRLS  –  –  0.859  0.563  0.946  0.703  0.977  0.898  0.968  0.874  0.938  0.760
LMP  –  –  0.863  0.513  0.950  0.706  0.979  0.900  0.973  0.875  0.941  0.747
2.6 Evaluation and Experimental Settings
We adopt a cross-validation technique and two popular metrics to test the proposed method as well as the previous benchmarks. We apply 10-fold cross-validation [23, 51], which divides the known interactions between the chemicals and proteins into 10 sets of approximately the same size, utilizes 9 sets as training data and keeps the remaining set as testing data. We repeat this ten times so that each set serves as the testing set once. In the simulation, we independently run the 10-fold cross-validation five times and report the average values.
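The splitting of known interactions into folds can be sketched as below. This is an illustration only: the random seed, the generator-based interface and the name `interaction_folds` are assumptions, and the toy matrix is hypothetical.

```python
import numpy as np

def interaction_folds(A, n_folds=10, seed=0):
    """Yield (training matrix, held-out index pairs) for each fold."""
    rng = np.random.default_rng(seed)
    known = np.argwhere(A != 0)   # (row, column) of every known interaction
    rng.shuffle(known)            # shuffle the pairs (rows of `known`) in place
    for test in np.array_split(known, n_folds):
        train = A.copy()
        train[test[:, 0], test[:, 1]] = 0   # hide the held-out interactions
        yield train, test

# Hypothetical toy matrix with 20 known interactions.
A = (np.arange(60).reshape(6, 10) % 3 == 0).astype(float)
folds = list(interaction_folds(A))
```

Each training matrix keeps the shape of the original adjacency matrix, so the same prediction routine can be run on every fold and evaluated on the hidden pairs.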
We consider two popular metrics, the area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision-recall curve (AUPR), to evaluate the performance of the proposed method and the benchmarks. The ROC curve illustrates the diagnostic ability of a binary classifier with regard to different thresholds [52]: it displays the true positive rate (sensitivity) versus the false positive rate (1 - specificity) at different threshold values, and AUC is the area under this curve. Sensitivity is the percentage of positive test samples ranked above a given threshold, whereas specificity is the percentage of negative test samples that fall below the threshold. When there are many fewer positive elements in the testing data than the total number of elements, AUC may give over-optimistic results [30, 54, 55, 53]. Therefore, utilizing only AUC may mislead our conclusion. In such cases, AUPR can give a better evaluation, especially in terms of biological significance.
The simulations are conducted in three settings, namely drug-target pairs, new drugs, and new targets. In the first setting, we divide the known interactions into 10 folds of approximately the same size. In the second setting, we divide the drugs into 10 folds. In the last setting, all the targets are divided into 10 folds. In each simulation, we use 9 folds as training data and keep the remaining one as testing data.
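The two metrics can be illustrated with a small self-contained sketch. AUC is computed here as the fraction of positive-negative pairs that are ranked correctly (ties counted as one half), and AUPR is approximated by average precision; using average precision as the estimator is an assumption of this sketch and may differ from the integration used in the reported experiments. The function names and the toy data are hypothetical.

```python
import numpy as np

def auc_score(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]   # all positive-negative score differences
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def aupr_score(labels, scores):
    """Average precision: mean of the precision values at each positive sample."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = labels[order]                 # labels sorted by decreasing score
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision[hits].sum() / hits.sum())

# Hypothetical toy example: two positives and two negatives.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = auc_score(labels, scores)
aupr = aupr_score(labels, scores)
```

In this example three of the four positive-negative pairs are ordered correctly, and the precision values at the two positives are 1 and 2/3.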
2.7 Baseline Methods
First, we compare LMP with the similarity-based methods, namely the common neighbor index (CN), the Katz index, the Jaccard index, Cannistraci resource allocation (CRA) [35] and the Cannistraci Jaccard index (CJC) [35]. In this comparison, we use only the known interactions between drugs and targets. The methods that utilize both interaction information and characteristic information normally outperform the methods that utilize only the known interactions. However, they cannot work with datasets that do not have characteristic information, such as MATADOR. We further compare LMP with the bipartite local model (BLM) [30], Laplacian regularized least squares (LapRLS) and Net Laplacian regularized least squares (NetLapRLS) [39], which utilize both the known interaction information and the characteristic information about the drugs and targets. The first group of methods can only predict the interactions between drugs and targets, while the second group can also predict new drugs for given targets or new targets for given drugs by using their characteristic information.
Table 5: AUC and AUPR for predicting new drugs (first part) and new targets (second part).
Method  nuclear receptors AUC  nuclear receptors AUPR  GPCR AUC  GPCR AUPR  ion channel AUC  ion channel AUPR  enzyme AUC  enzyme AUPR  Average AUC  Average AUPR
New drugs
BLM  0.693  0.194  0.829  0.210  0.770  0.167  0.781  0.092  0.768  0.166
LapRLS  0.820  0.482  0.845  0.397  0.796  0.366  0.800  0.368  0.815  0.403
NetLapRLS  0.819  0.418  0.834  0.397  0.783  0.343  0.791  0.298  0.807  0.364
LMP  0.831  0.384  0.854  0.399  0.778  0.353  0.824  0.392  0.822  0.382
New targets
BLM  0.458  0.325  0.627  0.367  0.881  0.641  0.843  0.611  0.702  0.486
LapRLS  0.563  0.432  0.788  0.508  0.920  0.778  0.914  0.792  0.796  0.628
NetLapRLS  0.561  0.433  0.787  0.503  0.916  0.762  0.909  0.787  0.793  0.621
LMP  0.688  0.293  0.847  0.583  0.938  0.790  0.928  0.803  0.850  0.617
2.8 Parameter Settings
In the proposed method, there is a free parameter $\lambda$ that balances the weights of the low-rank matrix and the sparse matrix, as shown in Eq. (3). When $\lambda$ is set too large, the sparse norm will compress most of the entries of the sparse matrix $E$ to zeros, while if $\lambda$ is very small, most of the entries of $E$ will be small but nonzero. In this work, we obtain the optimal value of the parameter by manually and empirically tuning it and checking the accuracy for each value of $\lambda$. When predicting the interactions based on only the known interactions, we perform a grid search for $\lambda$ in which one dimension corresponds to drug similarity and the other to target similarity. Based on the empirical simulation, $\lambda$ for predicting the interactions based on only the known interactions falls in [0.1, 0.25]. When characteristic information is embedded, one needs to tune $\lambda$ according to this information; e.g., the parameter falls in [0.1, 2]. Moreover, there is only one $\lambda$ in each of the other cases, e.g., predicting the new drugs or targets. We visualize the sensitivity of the predicted results, i.e., AUPR, to $\lambda$ on the new drugs and targets in Fig. 2.
3 Results
We report the performance of the proposed method and of the other methods that use only the known interaction information in the first part of Table 4, while the performance of the methods that use both interaction information and characteristic information is shown in the second part of the same table. Consider the first group of methods. LMP outperforms the other methods on three datasets, MATADOR, GPCR, and ion channel, in terms of both AUC and AUPR. On enzyme, LMP outperforms the others in terms of AUPR, while in terms of AUC it outperforms only the Katz index. It is worth noting that for a small matrix or network such as nuclear receptor, the predicted results of all the methods are not stable and are approximately the same.
When the characteristic information of the drugs and targets is employed, the performance of LMP improves remarkably, as shown in the second part of Table 4. Since the characteristic information of the drugs and targets in MATADOR is not available, we cannot obtain results for this dataset. LMP outperforms the others on nuclear receptor and GPCR in terms of AUC, while NetLapRLS is the best in terms of AUPR on nuclear receptor. LMP performs better than the others in terms of AUPR on GPCR and ion channel. On enzyme, LMP outperforms the others in terms of AUC, while NetLapRLS produces the highest AUPR.
The predicted results on the new drugs and new target proteins are shown in the first and second parts of Table 5, respectively. All the methods produce competitive results in predicting the new drugs, while LMP outperforms the others in terms of AUC and AUPR on GPCR, ion channel and enzyme in predicting new targets.
4 Conclusion and discussion
In this work, we have proposed a matrix-based method, namely low-rank matrix projection (LMP), to solve the DTI prediction problem. LMP overcomes the drawbacks of the machine learning and similarity-based methods. On the one hand, LMP can work on datasets that have only known interaction information between the drugs and targets, such as MATADOR. On the other hand, LMP can integrate information about the characteristics of the drugs and targets to improve the predicted results. Moreover, the proposed method can also effectively deal with new drugs and targets that do not have any known interaction at all by utilizing only some characteristic information of the drugs or targets.
In LMP, the low-rank matrix plays a very important role in making the data homogeneous, while the sparse matrix captures the noise or outliers in the data. By decomposing the original data into a clean part (a linear combination of the low-rank matrix and the adjacency matrix) and a noise part (the sparse matrix), we obtain clean data for predicting the interactions between drugs and target proteins. The disadvantage of LMP is that we need to empirically tune the balance parameter $\lambda$ and check the accuracy corresponding to each of its values. Designing an effective method to estimate the optimal value of this parameter is still an open question. Moreover, LMP may not perform well with a small matrix, e.g., nuclear receptor, since the information available for matrix decomposition is very limited.
LMP is an alternative, effective and efficient in silico tool for predicting drug-target interactions. It can help drug development and drug repositioning. In this paper, LMP aims at learning the low-rank similarity matrices from the known drug-target interaction information and the similarity matrices of the drugs and targets. We believe that the proposed method can also be applied to other high-dimensional biological data such as drug compounds, chemical structures, and so on, and we leave these problems to future work.
References
 [1] J. A. DiMasi, “New drug development in the United States from 1963 to 1999,” Clin. Pharmacol. Ther., vol. 69, no. 6, pp. 286–296, 2001.
 [2] X. Chen, C. C. Yan, X. Zhang, X. Zhang, F. Dai, J. Yin and Y. Zhang, “Drug-target interaction prediction: databases, web servers and computational models,” Brief Bioinform., vol. 17, no. 4, pp. 696–712, 2015.
 [3] C. Durán, S. Daminelli, J. M. Thomas, V. J. Haupt, M. Schroeder and C. V. Cannistraci, “Pioneering topological methods for network-based drug–target prediction by exploiting a brain-network self-organization theory,” Brief Bioinform., p. bbx041, 2017.
 [4] A. L. Hopkins, “Drug discovery: predicting promiscuity,” Nature, vol. 462, no. 7270, pp. 167–168, 2009.
 [5] D. S. Wishart, C. Knox, A. C. Guo, S. Shrivastava, M. Hassanali, P. Stothard, Z. Chang and J. Woolsey, “DrugBank: a comprehensive resource for in silico drug discovery and exploration,” Nucleic Acids Res., vol. 34, pp. D668–D672, 2006.
 [6] E. Lounkine, M. J. Keiser, S. Whitebread, D. Mikhailov, J. Hamon, J. Jenkins, P. Lavan, E. Weber, A. K. Doak and S. Côté, “Large-scale prediction and testing of drug activity on side-effect targets,” Nature, vol. 486, no. 7403, pp. 361–367, 2012.
 [7] E. Pauwels, V. Stoven and Y. Yamanishi, “Predicting drug side-effect profiles: a chemical fragment-based approach,” BMC Bioinformatics, vol. 12, no. 1, p. 169, 2011.
 [8] F. Cheng, C. Liu, J. Jiang, W. Lu, W. Li, G. Liu, W. Zhou, J. Huang and Y. Tang, “Prediction of drug-target interactions and drug repositioning via network-based inference,” PLoS Comput. Biol., vol. 8, no. 5, p. e1002503, 2012.
 [9] J. T. Dudley, T. Deshpande and A. J. Butte, “Exploiting drug–disease relationships for computational drug repositioning,” Brief Bioinform., vol. 12, no. 4, pp. 303–311, 2011.
 [10] F. Moriaud, S. B. Richard, S. A. Adcock, L. Chanas-Martin, J. Surgand, M. Ben Jelloul and F. Delfaud, “Identify drug repurposing candidates by mining the Protein Data Bank,” Brief Bioinform., vol. 12, no. 4, pp. 336–340, 2011.
 [11] S. J. Swamidass, “Mining small-molecule screens to repurpose drugs,” Brief Bioinform., vol. 12, no. 4, pp. 327–335, 2011.
 [12] C. M. Dobson, “Chemical space and biology,” Nature, vol. 432, no. 7019, pp. 824–828, 2004.
 [13] M. Kanehisa, S. Goto, M. Hattori, K. F. Aoki-Kinoshita, M. Itoh, S. Kawashima, T. Katayama, M. Araki and M. Hirakawa, “From genomics to chemical genomics: new developments in KEGG,” Nucleic Acids Res., vol. 34, pp. D354–D357, 2006.
 [14] C. R. Chong and D. J. Sullivan, “New uses for old drugs,” Nature, vol. 448, no. 7154, pp. 645–646, 2007.
 [15] M. L. Macdonald, J. Lamerdin, S. Owens, B. H. Keon, G. K. Bilter, Z. Shang, Z. Huang, H. Yu, J. Dias and T. Minami, “Identifying off-target effects and hidden phenotypes of drugs in human cells,” Nat. Chem. Biol., vol. 2, no. 6, pp. 329–337, 2006.
 [16] L. Xie, L. Xie, S. L. Kinnings and P. E. Bourne, “Novel computational approaches to polypharmacology as a means to define responses to individual drugs,” Annu. Rev. Pharmacol., vol. 52, no. 1, pp. 361–379, 2012.
 [17] A. C. A. Nascimento, R. B. C. Prudêncio and I. G. Costa, “A multiple kernel learning algorithm for drug-target interaction prediction,” BMC Bioinformatics, vol. 17, no. 1, p. 46, 2016.
 [18] S. J. Haggarty, K. M. Koeller, J. C. Wong, R. A. Butcher and S. L. Schreiber, “Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays,” Chem. Biol., vol. 10, no. 5, pp. 383–396, 2003.
 [19] F. G. Kuruvilla, A. F. Shamji, S. M. Sternson, P. J. Hergenrother and S. L. Schreiber, “Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays,” Nature, vol. 416, no. 6881, pp. 653–657, 2002.
 [20] S. Whitebread, J. Hamon, D. Bojanic and L. Urban, “Keynote review: in vitro safety pharmacology profiling: an essential tool for successful drug development,” Drug Discov. Today, vol. 10, no. 21, pp. 1421–1433, 2005.
 [21] T. T. Ashburn and K. B. Thor, “Drug repositioning: identifying and developing new uses for existing drugs,” Nat. Rev. Drug Discov., vol. 3, no. 8, pp. 673–683, 2004.
 [22] Y. Liu, M. Wu, C. Miao, P. Zhao and X. L. Li, “Neighborhood regularized logistic matrix factorization for drug-target interaction prediction,” PLoS Comput. Biol., vol. 12, no. 2, p. e1004760, 2016.
 [23] H. Ding, I. Takigawa, H. Mamitsuka and S. Zhu, “Similarity-based machine learning methods for predicting drug–target interactions: a brief review,” Brief Bioinform., vol. 15, no. 5, pp. 734–747, 2013.
 [24] I. Halperin, B. Ma, H. Wolfson and R. Nussinov, “Principles of docking: An overview of search algorithms and a guide to scoring functions,” Proteins: Structure, Function, and Bioinformatics, vol. 47, no. 4, pp. 409–443, 2002.
 [25] M. Rarey, B. Kramer, T. Lengauer and G. Klebe, “A fast flexible docking method using an incremental construction algorithm,” J. Mol. Biol., vol. 261, no. 3, pp. 470–489, 1996.
 [26] B. K. Shoichet, I. D. Kuntz and D. L. Bodian, “Molecular docking using shape descriptors,” J. Comput. Chem., vol. 13, no. 3, pp. 380–397, 1992.
 [27] B. Juan and P. Krzysztof, “Proteincoupled receptor drug discovery: implications from the crystal structure of rhodopsin,” Curr. Opin. Drug Discov. Devel., vol. 4 no. 561, pp. 561474, 2001.
 [28] T. Klabunde and G. Hessler, “Drug design strategies for targeting Gproteincoupled receptors,” Chembiochem.. vol. 3, no. 51, pp. 928–944, 2011.
 [29] Y. Yamanishi, M. Araki, A. Gutteridge, W. Honda and M. Kanehisa, “Prediction of drug–target interaction networks from the integration of chemical and genomic spaces,” Bioinformatics, vol. 24, no. 13, pp. i232–i240, 2008.
 [30] K. Bleakley and Y. Yamanishi, “Supervised prediction of drug–target interactions using bipartite local models,” Bioinformatics, vol. 25, no. 18, pp. 2397–2403, 2009.
 [31] N. Nagamine, T. Shirakawa, Y. Minato, K. Torii, H. Kobayashi, M. Imoto and Y. Sakakibara, “Integrating statistical predictions and experimental verifications for enhancing proteinchemical interaction predictions in virtual screening,” PLoS Comput. Biol., vol. 5, no. 6, p. e1000397, 2009.
 [32] N. Nagamine and Y. Sakakibara, “Statistical prediction of protein–chemical interactions based on chemical structure and mass spectrometry data,” Bioinformatics, vol. 23, no. 15, pp. 2004–2012, 2007.
 [33] H. Yabuuchi, N. Satoshi, T. Hiromu, I. Tomomi, H. Takatsugu, H. Takafumi, O. Teppei, M. Yohsuke, T. Gozoh and O. Yasushi, “Analysis of multiple compound–protein interactions reveals novel bioactive molecules,” Mol. Syst. Biol., vol. 7, no. 1, p. 472, 2011.
 [34] R. Pech, D. Hao, L. Pan, H. Cheng and T. Zhou, “Link prediction via matrix completion,” EPL, vol. 117, no. 3, p. 38002, 2017.
 [35] S. Daminelli, J. M. Thomas, C. Durán and C. Cannistraci, “Common neighbours and the local-community-paradigm for topological link prediction in bipartite networks,” New J. Phys., vol. 17, no. 11, p. 113037, 2015.
 [36] T. Zhou, R. Q. Su, R. R. Liu, L. L. Jiang, B. H. Wang and Y. C. Zhang, “Accurate and diverse recommendations via eliminating redundant correlations,” New J. Phys., vol. 11, no. 12, p. 123008, 2009.
 [37] L. Lü, C. H. Jin and T. Zhou, “Similarity index based on local paths for link prediction of complex networks,” Phys. Rev. E, vol. 80, no. 4, p. 046122, 2009.
 [38] T. Zhou, L. Lü and Y. C. Zhang, “Predicting missing links via local information,” Eur. Phys. J. B, vol. 71, no. 4, pp. 623–630, 2009.
 [39] Z. Xia, L.-Y. Wu, X. Zhou and S. T. C. Wong, “Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces,” BMC Syst. Biol., vol. 4, p. S6, 2010.
 [40] M. Hattori, Y. Okuno, S. Goto and M. Kanehisa, “Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways,” J. Am. Chem. Soc., vol. 125, no. 39, pp. 11853–11865, 2003.
 [41] T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,” J. Mol. Biol., vol. 147, no. 1, pp. 195–197, 1981.
 [42] S. Günther, M. Kuhn, M. Dunkel, M. Campillos, C. Senger, E. Petsalaki, J. Ahmed, E. G. Urdiales, A. Gewiess and L. J. Jensen, “SuperTarget and Matador: resources for exploring drug-target relationships,” Nucleic Acids Res., vol. 36, pp. D919–D922, 2007.
 [43] R. Vidal, “Subspace clustering,” IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 52–68, 2011.
 [44] M. Chen, Z. Lin, Y. Ma and L. Wu, “The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices,” UIUC Technical Report UILU-ENG-09-2215, 2010.
 [45] Y. Peng, A. Ganesh, J. Wright, W. Xu and Y. Ma, “RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2233–2246, 2012.

 [46] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin and S. Yan, “Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization,” Adv. Neural Inf. Process. Syst., pp. 2080–2088, 2009.
 [47] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, pp. 171–184, 2013.
 [48] G. Liu, Z. Lin and Y. Yu, “Robust subspace segmentation by low-rank representation,” Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 663–670, 2010.
 [49] J.-F. Cai, E. J. Candès and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM J. Optim., vol. 20, no. 4, pp. 1956–1982, 2010.
 [50] M. Chen, A. Ganesh, Z. Lin, Y. Ma, J. Wright and L. Wu, “Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix,” Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), vol. 61, no. 3, pp. 707–722, 2009.
 [51] Y. Lu, Y. Guo and A. Korhonen, “Link prediction in drug-target interactions network using similarity indices,” BMC Bioinformatics, vol. 18, no. 1, p. 39, 2017.
 [52] J. A. Hanley and B. J. McNeil, “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, vol. 143, no. 1, pp. 29–36, 1982.
 [53] J. M. Lobo, A. Jiménez-Valverde and R. Real, “AUC: a misleading measure of the performance of predictive distribution models,” Global Ecol. Biogeogr., vol. 17, no. 2, pp. 145–151, 2008.
 [54] J. Davis and M. Goadrich, “The relationship between Precision-Recall and ROC curves,” Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240, 2006.
 [55] L. Lü, M. Medo, C. H. Yeung, Y. C. Zhang, Z. K. Zhang and T. Zhou, “Recommender systems,” Phys. Rep., vol. 519, no. 1, pp. 1–49, 2012.