Abstract
Background
The basic RNA secondary structure prediction problem or single sequence folding problem (SSF) was solved 35 years ago by a now wellknown \(O(n^3)\) time dynamic programming method. Recently three methodologies—Valiant, FourRussians, and Sparsification—have been applied to speedup RNA secondary structure prediction. The sparsification method exploits two properties of the input: the number of subsequence Z with the endpoints belonging to the optimal folding set and the maximum number basepairs L . These sparsity properties satisfy \(0 \le L \le n / 2\) and \(n \le Z \le n^2 / 2\) , and the method reduces the algorithmic running time to O ( LZ ). While the FourRussians method utilizes tabling partial results.
Results
In this paper, we explore three different algorithmic speedups. We first expand the reformulate the single sequence folding FourRussians \(\Theta \left(\frac{n^3}{\log ^2 n}\right)\) time algorithm, to utilize an ondemand lookup table. Second, we create a framework that combines the fastest Sparsification and new fastest ondemand FourRussians methods. This combined method has worstcase running time of \(O(\tilde{L}\tilde{Z})\) , where \(\frac{{L}}{\log n} \le \tilde{L}\le min\left({L},\frac{n}{\log n}\right)\) and \(\frac{{Z}}{\log n}\le \tilde{Z} \le min\left({Z},\frac{n^2}{\log n}\right)\) . Third we update the FourRussians formulation to achieve an ondemand \(O( n^2/ \log ^2n )\) time parallel algorithm. This then leads to an asymptotic speedup of \(O(\tilde{L}\tilde{Z_j})\) where \(\frac{{Z_j}}{\log n}\le \tilde{Z_j} \le min\left({Z_j},\frac{n}{\log n}\right)\) and \(Z_j\) the number of subsequence with the endpoint j belonging to the optimal folding set.
Conclusions
The ondemand formulation not only removes all extraneous computation and allows us to incorporate more realistic scoring schemes, but leads us to take advantage of the sparsity properties. Through asymptotic analysis and empirical testing on the basepair maximization variant and a more biologically informative scoring scheme, we show that this Sparse FourRussians framework is able to achieve a speedup on every problem instance, that is asymptotically never worse, and empirically better than achieved by the minimum of the two methods alone.
Keywords
RNA folding Single sequence folding RNA secondary structure Secondary structure prediction FourRussians SparsificationBackground
Noncoding RNA (ncRNA) affects many aspects of gene expression, regulation of epigenetic processes, transcription, splicing, and translation []. It has been observed that in eukaryotic genomes the ncRNA function is more clearly understood from the structure of the molecule, than from sequence alone. While there have been advances in methods that provide structure experimentally, the need for computational prediction has grown as the gap between sequence availability and structure has widened. In general, RNA folding is a hierarchical process in which tertiary structure folds on top of thermodynamically optimalsecondary structure, secondary structure is a key component of structure prediction [].
Efficient \(O(n^3)\) time dynamic programming algorithms were developed more than thirty years ago to find noncrossing secondary structure of a single RNA molecule with n bases [,,,,,]. We call this basic folding or single sequence folding (SSF) problem. In addition, McCaskill [] created an \(O(n^3)\) time algorithm for the partition function for RNA secondary structure. Based on these algorithms, software has been developed and widely used [,,,,]. Probabilistic methods, employing Stochastic contextfree grammar (SFCG), were also developed to solve the basic folding problem [,].
The accuracy of all these methods is based on the parameters given by the scoring function. Thermodynamic parameters [,,,] and statistical parameters [,], or a combination of the two [,] are currently employed.
The Valiant [,], Sparsification [,], and the FourRussians (FR) [,] methods where previously applied to improve on the computation time for secondary structure prediction. For SSF, the Valiant method achieves the asymptotic time bound of \(O\left(\frac{n^3}{2^{\Omega {\log (n)}}}\right)\) by incorporating the current fastest min/maxplus matrix multiplication algorithm [,]. The FourRussians method was applied to single sequence [,], cofolding [] and pseudoknotted [] folding problems. The Sparsification method, was developed to improve computation time in practice for a family of RNA folding problems, while retaining the optimal solution matrix [,,,,,].
Methods
In this paper, we combine the FourRussians method [] and the Sparsification method []. While the former method reduces the algorithm’s asymptotic running time to \(\Theta \left(\frac{n^3}{\log ^2 n}\right)\) , the latter eliminates many redundant computations. To combine these methods, we use an ondemand tabulation (instead of a preprocessing approach which is typically applied in FR algorithms), removing any redundant computation and guaranteeing the combined method is at least as fast as each individual method, and in certain cases even faster. First, we reformulate SSF FourRussians \(\Theta \left(\frac{n^3}{\log ^2 n}\right)\) time algorithm [] to utilizes ondemand lookup table creation. Second, we combine the fastest Sparsification and FourRussians SSF speedup methods. The Sparse Four Russians speedup presented here leads to a practical and asymptotically fastest combinatorial algorithm (even in the worstcase). The new algorithm has an \(O(\tilde{L}\tilde{Z})\) run time where \(\frac{{LZ}}{\log ^2 n}\le \tilde{L}\tilde{Z} \le \min \left( \frac{n^3}{\log ^2 n}, {LZ}\right) \) . In practice, when accounting for every comparison operation the Sparse Four Russians outperforms both the FourRussians and Sparsification methods. Third, we extended the FourRussian SSF algorithm to be computed in \(O(n^2/\log ^2n)\) time. The simulated results for this formulation and O ( n ) processors achieve a practice speedup on the number of comparison operations performed.
Results
Problem definition and basic algorithm
Let \(s = s_0 s_1 \ldots s_{n1}\) be an RNA string of length n over the fourletter alphabet \(\Sigma = \{A,U,C,G\}\) , such that \(s_i \in \Sigma \) for \(0 \le i < n\) . Let \(\varvec{ s_{i,j}}\) denote the substring \(s_i s_{i+1} \ldots s_{j1}\) . We note that for simplicity of exposition substring \(s_{i,j}\) does not contain the nucleotide j . A folding (or a secondary structure ) of s is a set M of position pairs ( k , l ), such that: (1) \(0 \le k< l < n\) ; (2) and there are no two different pairs \((k,l),(k', l') \in M\) such that \(k \le k' \le l \le l'\) (i.e. each position participates in at most one pair, and the pairs are noncrossing).
$$\begin{aligned} L(s, M) = \sum _{(i,j) \in M} {\beta (i,j)}, \end{aligned}$$
where \(\beta (i,j) = 1\) if \((s_i, s_j) \in \{(A, U), (U, A), (C, G), (G, C)\}\) , and \(\beta (i,j) = 0\) otherwise. Richer scoring schemes allow more biologically significant information to be captured by the algorithm. However, the algorithms for solving the problem similar recurrences and other discrete scoring schemes may be accelerated in a similar way to what we present here.$$\begin{aligned} L[i, j] = \max (L^p[i, j], L^c[i, j]),\end{aligned}$$
(1)
$$\begin{aligned} L^p[i, j] = \max _{k \in (i,j)}(L[i, k] + L[k, j]),\end{aligned}$$
(2)
$$\begin{aligned} L^c[i, j] = L[i+1, j1] + \beta (i,j1). \end{aligned}$$
(3)
For completeness, when \(j < i\) , define \(L[i,j] = L^p[i,j] = L^c[i,j] = \infty \) .The above recurrence may be efficiently implemented using a dynamic programming (DP) algorithm. Essentially, the DP algorithm computes and maintains values of the form \(L[i, j], L^p[i, j]\) and \(L^c[i, j]\) for every \(0 \le i \le j \le n\) in three \(n+1 \times n+1\) matrices. The algorithm traverses the matrices in increasing column order index j from 1 to n . Within each column, the cell L [ k , j ] is computed in decreasing index order k from \(j1\) to 0. Once L [ k , j ] is computed, \(L^p[i,j]\) is updated for all \(i<k\) such that \(L^p[i,j]=max(L^p[i,j],L[i,k]+L[k,j])\) . The solution L ( s , M ) is stored in cell L [0, n ]. Clearly, computing \(L^p\) is the bottleneck of the computation, since for a given i , j , there may be \(\Theta (n)\) split points to examine.
Extending the notation and moving towards a vector by vector computation of L
For a matrix A and some integer intervals I , J , denote by A [ I , J ] the submatrix of A obtained by projecting it onto the row interval I and column interval J . When \(I = [i]\) or \(J = [j]\) , we simplify the notation by writing A [ i , J ] or A [ I , j ].
Definition 1
$$\begin{aligned} L^p_K[i,j] \;\;= \;\; L[i, K] \otimes L[K, j]\;\;=\;\;\displaystyle {\max _{k \in K}{(L[i,k] + L[k,j])}} . \end{aligned}$$
For an interval \(I = [i, i + 1, \ldots i']\) , define \(L^p_K[I, j]\) to be the vector such that$$\begin{aligned} L^p_K[I, j] \;\;=\;\; L[I, K] \otimes L[K, j] \;\;= \;\; \left[ L^P_K[i,j]\; \text {for all }\;{i\in I}\right] \end{aligned}$$
We divide the solution matrix L in two ways: \(q \times q\) submatrices (Fig.) and size q sub column vectors (the value of q will be determined later). Let \(\varvec{K_g}\) be the g th interval such that \(K_g=\{q \cdot g , q \cdot g+1, \ldots , q \cdot g+q1\}\) . We call these sets Kgroups , and use \(K_g\) as the interval starting at index \(g\cdot q\) . For an index i , define \(\varvec{g_i} = \left\lfloor \frac{i}{q}\right\rfloor \) . It is clear that \(i \in K_{g_i}\) .
Similarly, we break up the row indices into groups of size q , denoted by \(\varvec{I_g}\) where \(I_g=\{k= q \cdot g , k+1, ...k+q1\}\) . (Clearly, row index set \(I_g\) is equivalent to the Kgroup \(K_g\) . We only introduce this extra notation for simplicity of the exposition).
$$\begin{aligned} \begin{array}{ll} L^p[i,j]&=\displaystyle \max _{g_i \le g \le g_j} L^p_{K_g}[i,j] \end{array} \end{aligned}$$
(4)
We extend the notation further, to compute the matrix \(L^p\) not cell by cell but instead by vectors of size q corresponding to the \(I_{g'}\) row sets, as follows.$$\begin{aligned} \begin{array}{ll} L^p[I_{g'},j] =&\displaystyle \max _{g' \le g \le g_j} L^p_{K_g}[I_{g'},j]. \end{array} \end{aligned}$$
(5)
The DP algorithm can be updated to incorporate the extended notation. Within each column, compute the matrices in vectors of size q . Once \(L[K_g,j]\) is computed it is used in computation of \(L^p_{K_g}[I_{g'},j]\) for \(g'<g\) . When computing \(L^p_{K_{g'}}[I_{g'},j]\) we follow Eq.–to complete the computation of cells \(L[I_{g'},j]\) .Sparsification of the SSF algorithm
The Sparsification method achieves a speedup by reducing the number of split points examined during the computation of \(L^P[i,j]\) . As Fig.shows the focus of Sparsified Four Russians algorithm will narrow down only on those submatrices whose split points are stepoct for a particular \(i,j\) [,].
OCT and STEP subinstances of sequence s
Subinstance \(s_{i,j}\) is optimally coterminus ( OCT ) if every optimal folding of \(s_{i,j}\) is coterminus. We introduce the extra notation below
if \(L[i,j]=L^c[i,j]>L^p[i,j]\) then we say L [ i , j ] is OCT .
Subinstance \(s_{i,j}\) is STEP , if \(L[i,j]>L[i+1,j]\) where \(L[i,j]=L(s_{i,j})\) and \(L[i+1,j]=L(s_{i+1,j})\) . For ease of exposition we also say L [ i , j ] is STEP when \(s_{i,j}\) is STEP . A STEP subinstance \(s_{i,j}\) implies that nucleotide i is paired in every optimal folding of \(s_{i,j}\) .
Fact 1
For every subinstance \(s_{i,j}\) with \(j>i\) there is an optimal split point \(k\in (i,j)\) such that either \(k=i+1\) or L [ i , k ] is STEP and L [ k , j ] is OCT [].
Notation:For the index set \(K=\{k,k+1, \ldots k'\}\) and column j , let \(\varvec{K^{oct_j}}\) be the set of indices such that \(K^{oct_j}\subset K\) and \(\forall _{ k \in K^{oct_j}} \;\; L[k,j]\) is OCT . Given the row interval \(I=\{i,i+1, \ldots i'\}\) , let \(I^{step_k}\) be the set of rows such that \(I^{step_k} \subset I\) , and for all \({i \in I^{step_k}}\) L [ i , k ] is STEP .
$$\begin{aligned} \begin{array}{ll} L^p[I_{g'},j]&=\displaystyle \max _{g' \le g \le g_j} L^p_{K_g}[I_{g'},j] =\displaystyle \max _{g' \le g \le g_j} L[I_{g'},K_g]\otimes _{{stepoct}} L[K_g,j]. \end{array} \end{aligned}$$
(6)
Note Eq.does not explicitly show that for \(L_{K_{g'}}^P[I_{g'},j]\) the splitpoint \(i+1\) must be examined for every \(i\in I_{g'}\) .Asymptotic time bound of sparsified SSF When computing matrix \(L^p[i,j]\) we examine value L [ i , k ] only if L [ k , j ] is OCT . Let Z , be the total number of subinstances in s or cells in matrix L that are OCT . Given that L [ k , j ] is OCT , \(L^p[i,j]\) must examine the split point k , for all \(i \in \{0,1, \ldots k\}\) such that L [ i , k ] is STEP . Let \(\varvec{{L}}\) be the total number of STEP subinstances in column k . More precisely \({L}=\{0,1, \ldots k\}^{step_k}\) (Creating the list of splitpoints that correspond to STEP incidence requires no additional computation time []). The total time to compute SSF when examining only STEP , OCT combinations (Sparsification method), is O ( LZ ). As shown in Backofen et al. [] Z is bounded by \(Z\le n^2\) and L is bounded by \({L} \le \frac{n}{2}\) . The overall asymptotic time bound of the Sparsification method is O ( LZ ) remains \(O(n^3)\) .
Ondemand Four Russians speedup
Presented here is an ondemand version of the \(\Omega (\log ^2 n)\) time FourRussians algorithm implied by Pinhas et al. [].
Observation 1
The scores stored in L [ k , j ] and \(L[k+1,j]\) differ by the effect of adding only one more nucleotide (i.e., \(s_k\) ). Therefore , \(L[k,j]L[k+1,j]\) belongs to a finite set of differences \(\mathbb {D}\) , where \(\mathbb {D}\) is the set of scores created as the result of the scoring scheme \(\beta \) . The cardinality of the set of differences , \(D=\mathbb {D}\) , is O (1) when \(\beta \) is discrete. For the simple \(\beta \) scoring function (+1 for every permitted pair, and 0 otherwise), the set \(\mathbb {D}\) is equal to \(\{0,1\}\) and therefore \(\mathbb {D}=2\) [].
Let \(\vec {x} = [x_0, x_1, \ldots , x_{q1}]\) be an integer vector of length q . We say that \(\vec {x}\) is Ddiscrete if \(\forall _{ l \in (0,q)} x_{l1}  x_{l} \in \mathbb {D}\) . We define the \(\Delta \) encoding of 2discrete vector \(\vec {x}\) to be a pair of integers \((x_0, \Delta _{{x}})\) such that \(x_0\) is the first element in \(\vec {x}\) and \(\Delta _{{x}}\) is the integer representation of the binary vector \([x_0x_1, x_1x_2, \ldots , x_{q2}  x_{q1}]\) . Note that \(0 \le \Delta _{{x}} < 2^{q1}\) . For simplicity, we will interchangeably use \(\vec {x}\) to imply either \((x_0, \Delta _{x})\) or \([x_0, x_1, \ldots , x_{q1}]\) . Clearly, \(\Delta \)  encoding takes O ( q ) time to compute.

Let \( (x_0,\Delta _{\vec {x}})+ c= (x_0+c,\Delta _{\vec {x}})\) be equivalent to \(\vec {x}+c =[x_0+c, x_1+c, \ldots , x_{q1}+c]\) .

Let \(B\otimes (x_0,\Delta _{x})\) be equivalent to \(B\otimes \vec {x}\) .

Let \(\max ((x_0,\Delta _x),(y_0,\Delta _y))\) be equivalent to \(\max (\vec {x},\vec {y})\) .
MUL lookup table
Based on Observation, any column vector in matrix L is 2discrete. Given vector \(L[K_g,j]\) and its \(\Delta \) encoding ( \(x_0=L[gq,j]\) , \(\Delta _x= \Delta _{L[K_g,j]}\) ), it is clear that \(\Delta _x \in [0,2^q1]\) .
Fact 2
\(L[I_{g'},K_g] \otimes L[K_g,j] \text { is equivalent to }L[I_{g'},K_g] \otimes (0,\Delta _{L[K_g,j]}) + L[gq,j]\) [].
$$\begin{aligned} L^p_{K_g}[I_{g'},j]=L[I_{g'},K_g]\otimes L[K_g,j] = MUL_{L[I_{g'},K_g]}[\Delta _{L[K_g,j]}]+L[gq,j]. \end{aligned}$$
(7)
Equation, abstracts the detail that we still have to compute each referenced entry in the MUL lookup table. Each entry in the MUL lookup table is computed ondemand i.e. only when it corresponds to a required calculation. (This removes any extraneous calculation incurred when preprocessing all possible entries as in the typical FourRussians implementation.) If entry \(MUL_{L[I_{g'},K_g]}[\Delta _{L[K_g,j]}]\) does not exist we compute \(L[I_{g'},K_g]\otimes (0,\Delta _{L[K_g,j]})\) directly in \(O(q^2)\) time. If entry \(MUL_{L[I_{g'},K_g]}[\Delta _{L[K_g,j]}]\) exists then the operation is O (1)time lookup.There are \(O\left(\frac{n^2}{q^2}\right)\) submatrices within L . For each submatrix the maximum number of entries we compute for lookup table MUL is \(2^{q1}\) . In total, the asymptotic time bound to populate lookup table MUL is \(O\left(\frac{n^2}{q^2}\cdot 2^{q1}\cdot q^2)=O(n^2 \cdot 2^q\right)\) .
MAX lookup table
Let the max of two 2discrete q size vectors \(\vec {v}\) and \(\vec {w}\) , denoted \(max(\vec {v},\vec {w})\) , result in a q size vector \(\vec {z}\) , where \(\forall _{0\le k < q}\, z_k=\max (v_k,w_k)\) . Without loss of generality, let \(w_0 \ge v_0\) . Comparing the first element in each vector there are two possibilities either (1) \(w_0v_0>q1\) or (2) \(w_0v_0\le q1\) . In the first case, ( \(w_0v_0>q1\) ), it is clear that \(\max (\vec {v},\vec {w})\) is equal to \(\vec {w}\) . In the second case, we make use of the following fact [].
Fact 3
Given two vectors \((w_0,\Delta _w)\) and \((v_0,\Delta _v)\) , if \(w_0v_0\le q1\) then \(\max (\vec {v},\vec {w})= \max \left( (0,\Delta _v),(w_0v_0,\Delta _w)\right) +v_0\) .
Lets define lookup table MAX such that entry
$$\begin{aligned} \max (\vec {v},\vec {w})=MAX[\Delta {v_0},\Delta {w_0},(w_0v_0)]+v_0 \end{aligned}$$
We summarize these results in the function \(\Delta \) max:$$\begin{aligned} L^p[I_{g'},j] = \varvec{\Delta }\!\!\!\!\max _{g' \le g \le g_j}\left( MUL_{L[I_{g'},K_g]} \left[ \Delta _{L[K_g,j]} \right] +L[gq,j] \right) \end{aligned}$$
(8)
The matrix \(L^p\) and hence L is solved by a total of \(O\left(\frac{n^2}{q}\right)\) computations of Eq.. In total, given lookup table MUL and \(M\!A\!X\) , the time to compute the FourRussians SSF is \(O\left(\underbrace{\frac{n^3}{q^2}}_{computation}+\underbrace{{2^{2q}}q+{n^2}{2^q}}_{\text {{ ondemand} lookup table }}\right)\) .Setting \(q=\epsilon \log n\) , where \(\epsilon \in (0,.5)\) [], the total computation time is equal to \(\Theta (\frac{n^3}{\log ^2 n})\) , which achieves a speedup by a factor of \(\Omega {(\log ^2 n)}\) , compared to the original \(O(n^3)\) time solution method.
Extending to D discrete vectors
$$\begin{aligned} O\left(\underbrace{\frac{n^3}{q^2}}_{computation}+\underbrace{{D^{2q}}q+{n^2}{D^q}}_{\text {{ ondemand} lookup table }}\right). \end{aligned}$$
Setting \(q=\epsilon \log _D n\) , where \(\epsilon \in (0,.5)\) [], the total computation time is equal to \(\Theta \left(\frac{n^3}{\log ^2 n}\right)\) , which achieves a speedup by a factor of \(\Omega {(\log ^2 n)}\) , compared to the original \(O(n^3)\) time solution method.Sparse FourRussian method
With the FourRussians method, a speedup is gained by reducing q split point index comparisons for q subsequences to a single O (1) time lookup. The Sparsification method reduces the comparison to only those indices which correspond to STEP – OCT folds.
STEP–OCT condition for sets of split points
In this section, we achieve a Sparsified FourRussian speedup for the computation of the \(L^p\) matrix. As in the Four Russians method, we will conceptually break up the solution matrix L in two ways: in \(q \times q\) size submatrices, and q size subcolumn vectors. The submatrices are indexed by \(g'\) and g such that the corresponding submatrix is \(L[I_{g'},K_g]\) . The subcolumn vectors are indexed by g and j , such that the corresponding subcolumn vector is \(L[K_g,j]\) .
We augment the FourRussians SSF to reduce the number of entries, and lookups into the MUL table. If and only if, the matrix \(L[I_{g'},K_g]\) contains at least one cell L [ i , k ] that is STEP and within vector \(L[K_g,j]\) the cell L [ k , j ] is OCT we will lookup \(MUL_{L[I_{g'},K_g]}[\Delta _{L[K_g,j]}]\) . If such an entry does not exist we will compute \(L[I_{g'},K_g]\otimes (0,\Delta _{L[K_g,j]})\) and store the result into lookup table MUL .
The following notation will be used to help determine if a split point Kgroup should be examined in the computation.
OCT subcolumn vector
Given the vector \(L[K_g,j]\) let \(\vec {m}\) be a q size binary vector such that \(\forall _{0\le x \le q1} m[x]=1\) if \(L[gq+x,j]\) is OCT . Let the sigOct of the vector \(L[K_g,j]\) , written \(sigOct(L[K_g,j])\) , be equal to m the integer representation of the binary vector \(\vec {m}\) . Clearly \(0 \le m < 2^q\) , and if and compute the dot product in \(m>0\) then \(L[K_g,j]\) contains at least one OCT instance. Let \(O(\tilde{Z})\) be the total number of subcolumn vectors which contain an instance that is OCT . Clearly, \( \frac{{Z}}{q} \le \tilde{Z} \le \min \left(\frac{n^2}{q},Z\right)\) .
STEP submatrix
Given the submatrix \(L[I_{g'},K_g]\) , let \(\vec {m'}\) be a q size binary vector such that \(\forall _{ x\in [0,q) }m'[x]=1\) if \(\exists _{0 \le i \le q1}\) \(L[qg'+i,qg+x]\) is STEP . Let sigStep of a submatrix, written \(sigStep(L[I_{g'},K_g])\) , be equal to \(m'\) the integer representation of the binary vector \(\vec {m'}\) . Clearly \(0\le m' < 2^q\) . Let \(\tilde{L}\) be the total number of submatrices which contain an instance that is STEP within \(L[[0,n],K_g]\) . Clearly, \( \frac{{L}}{q} \le \tilde{L} \le \min (\frac{n}{q},L) \) .
Observation 2
Suppose that , \(s_{i,k}\) is STEP , and integer
\(m' = sigStep(L[I_{g'},K_g])\) such that \(i \in I_{g'}\) (or \(I_{g'}=I_{g_i}\) ) and \(k \in K_g\) (or \(K_g=K_{g_k}\) ). Then, the corresponding binary vector \(\vec {m'}\) must be set to 1 in position x where x is an index such that \(k=qg+x\) . More precisely, if L [ i , k ] is STEP then \(m'[x]=1\) by the definition of sigStep .
Observation 3
Suppose \(s_{k,j}\) is OCT , and suppose integer
\(m=sigOct(L[K_g,j])\) such that \(k \in K_g\) . Then, the corresponding binary vector \(\vec {m}\) must be set to 1 in position x, where x is an index such that \(k=qg+x\) . More precisely, if \(s_{k,j}\) is OCT then m [ x ] = 1 by the definition of sigOct .
Given two binary vectors v and w the dot product of their integer representation is equal to a binary number x such that \(x=v \odot w= v_0 \wedge w_0 \vee v_1 \wedge w_1 \vee ...\vee v_{q1} \wedge w_q\) where \(v=w=q1\)
Theorem 1
For any subinstance \(s_{i,j}\) either \(i+1\) is the optimal split point, or there is an optimal split point \(k \in (i,j)\) , such that \(sigStep(L[I_{g_i},K_{g_k}]) \odot sigOct(L[K_{g_k},j])\) equals 1 .
Proof
Based on Factfor any subinstance \(s_{i,j}\) there is an optimal split point k such that either \(k=i+1\) or \(s_{i,k}\) is STEP and \(s_{k,j}\) is OCT . If \(s_{i,k}\) is STEP and \(s_{k,j}\) is OCT then L [ i , k ] is STEP and L [ k , j ] is OCT . The cell L [ i , k ] belongs to submatrix \(L[I_{g_i},K_{g_k}]\) and the cell L [ k , j ] belongs to the vector \(L[K_{g_k},j]\) . Let x be an index such that \(k=qg_k+x\) . Let \(\vec {m'}\) be a binary vector that corresponds to \(sigStep(L[I_{g_i},K_{g_k}])\) . Based on Observation, \(m'[x]\) must equal 1. Let \(\vec {m}\) be the binary vector that corresponds to \(sigOct(L[K_{g_k},j])\) . Based on Observation, m [ x ] equals 1. Therefore, \(m[x]\wedge m'[x]=1\) and \(sigStep(L[I_{g_i},K_g])\odot sigOct(L[K_g,j])= 1\) . \(\square \)
Notation:The index g is STEP – OCT if given the set of rows \(I_{g'}\) and the column j if \(sigStep(\;L[I_{g'},K_g]\;) \varvec{\odot } sigOct(\;L[K_g,j]\;)=1\) .
$$\begin{aligned} L^p[I_{g'},j] = \Delta \!\!\!\!\!\!\!\!\!\!\!\displaystyle \max _{\begin{array}{c} g \text { is } S\!T\!E\!P\!\!O\!C\!T \\ \text {where } g \in [g',g_j] \end{array}} \left( MUL_{L[I_{g'},K_g]} \left[ \Delta _{L[K_g,j]}\right] + L[gq,j]\right) \end{aligned}$$
(9)
We update the DP algorithm to only access the MUL lookup table for matrix and vector combinations that satisfy the property\(sigStep(\;L[I_{g'},K_g]\;) \varvec{\odot } sigOct(\;L[K_g,j]\;)=1\) .
$$\begin{aligned} \text {if } I_{g'} \in G[g][m] \text { then } sigStep(L[I_{g'},K_g]) \varvec{\odot } m=1 . \end{aligned}$$
Lookup table G (updated ondemand ) allows us to implement Eq.. As \(L[K_g,j]\) is computed, the corresponding SigOct is also computed. Let \(m=sigOct(L[K_g,j])\) . By iterating through \(I_{g'}\in G[g][m]\) set of row indices we access table MUL only when both of the following conditions hold at the same time: the submatrix \(L[I_{g'},K_g]\) contains at least one cell L [ i , k ] where \(s_{i,k}\) is STEP and within vector \(L[K_g,j]\) the cell L [ k , j ] contains \(s_{k,j}\) that is OCT (where \(i\in I_{g'}\) and \(k \in K_g\) ).Discussion
Asymptotic analysis of sparsified FourRussians.
We assume O (1)time RAM access for \(\log (n)\) bits. The calculation for column j can be broken down into \(L^P_{K=[qg_j,j)}[i,j]\) and \(L^P_{K=[0,qg_j)}[i,j]\) for all \(i<j\) . The computation of
\(L^P_{[qg_j,j)}[[0,n],j]\) occurs when Kgroup \(K_{g_j}\) is not full, and follows the Sparsification algorithm maximizing over STEP – OCT split points only. This reduces the comparisons made from \(O(n\cdot q)\) to \(O({L}\tilde{q})\) where \(\tilde{q}<q\) is the total number OCT instances within the interval \([qg_,j)\) . The computation of \(L^P_{[0,qg_j)}[[0,n],j]\) employs Sparsified Four Russians speedup. The MUL table entries are created and references only for STEP – OCT submatrix vector combinations. This reduces the comparisons made to \(O(\tilde{L}\tilde{Z})\) .
$$\begin{aligned} O(\tilde{L}\tilde{Z})+O\left(\frac{n^2}{q}\cdot \tilde{x}\right)+O\left(\text {updating lookup tables ondemand}\right)=O(\tilde{L}\tilde{Z}) \end{aligned}$$
Asymptotic analysis of ondemand lookup tables calculation
We compute the lookup tables G , MUL , and \(M\!A\!X\) ondemand . For each vector \(L[K_g,j]\) containing an OCT instance (where \(m=sigOct(L[K_g,j])\) ), if G [ g ][ m ] does not exist then we directly compute it. For the computation of a single entry into lookup table G , we iterate through \(O(\tilde{L})\) submatrices and compute the dot product in O ( q ) time.In total, an update is called to lookup table G at most \(O(\tilde{C}=min(2^q,\tilde{Z}))\) times. The entire G lookup table ondemand computation takes \(O(\text {ondemand} G)=O(\tilde{L}\tilde{C}\cdot q)\) or \(\varvec{O(G)} \le O\left( \min (\tilde{L} 2^q,\tilde{L}\tilde{Z})\cdot q \right)\le O\left(min\left(\frac{n2^q}{q},\frac{{LZ}}{q}\right)\right)\) .
For each vector containing an OCT instance if an entry doesn’t exist in the lookup table MUL it is computed ondemand. Each entry takes \(O(\tilde{L}\cdot q^2)\) time to compute. There are \(min(2^q,\tilde{Z)}\) such computation. In total, lookup table MUL takes \(O(\tilde{L}q^2\cdot min(2^q,\tilde{Z}))\) time. Setting \(q=\epsilon \log {n}\) where \(\epsilon \in (0,.5)\) the asymptotic runtime for ondemand computation is \(O(\tilde{L}\tilde{Z})\) .
The entire algorithm takes \(O(\tilde{L}\tilde{Z})\) where \(\frac{{LZ}}{\log ^2 n}\le \tilde{L}\tilde{Z} \le \min \left(\frac{n^3}{\log ^2 n}, {LZ}\right)\) .
Empirical results
We tested 20 randomly generated sequences for each size \(N=64,128,256,512\) .
Table 1
Number of all comparisons computed
Size 
\(O(n^3)\) 
FR 
SP 
SFR 

64 
43,680 
12,014 
2733 
1837 
128 
349,504 
49,456 
13,196 
9982 
256 
2,796,160 
346,692 
79,544 
41,393 
512 
22,500,863 
5,746,853 
650,691 
503,425 
For \(N=128\) the Sparse FourRussians(SFR) algorithm performs 25 % less comparisons than the Sparsified(SP) SSF algorithm and 80 % less comparison than the FourRussians (FR) algorithm. In all test cases, the Sparse FourRussians performed better than the minimum of either method alone.
An \(O(n^2/\log ^2(n))\) simple parallel FourRussians RNA folding algorithm
Lets solve the recurrence relation (Eq.–) in increasing index j order and then move up the column j computing one cell at a time in decreasing i order. Each cell L [ i , j ] is solved by calculating Eq.–for all \(i<k\le j\) .
Given this j , i , k order, let us reformulate the computation by moving up each column in O ( n / q ) q size subcolumn vectors instead of n cells.
Utilizing n processors
Invariant 1
Given \(g_i\) and \(g_j\) \(\forall _{i\in I_{g_i}} \forall _{k \in K_g} L[i,k]=L(s_{i,k})\) . In other words, submatrix \(L[I_{g_i},K_g]\) is computed. Similarly \(L[K_g,j]\) is computed or \(\forall _{k \in K_g} L[k,j]=L(s_{k,j})\) .
Replacing the \( \max (L^p[I_{g_i},j], L[I_{g_i}, K_g])\otimes L[K_g,j])\) computation with lookups into MUL and MAX tables would reduces the runtime for finding the solution matrix L to \(O(n^2/log^2n)\) . As stated in " Extending to Ddiscrete vectors " section it is possible to create lookup tables ondemand and achieve a reduction in computation time of \(\Omega (log^2 n)\) factor.
The preprocessing can also be achieve in parallel reducing the asymptotic cost form \(O(n^3/\log ^2 n)\) to \(O(n^2/\log ^2 n)\) . If entry \(MUL_{L[I_{g_i},K_g]}[\Delta _{L[K_g,j]}]\) does not exist we compute \(L[I_{g_i},K_g]\otimes (0,\Delta _{L[K_g,j]})\) directly in \(O(q^2)\) .
There are \(O\left(\frac{n^2}{q^2}\right)\) submatrices within L . For each submatrix the maximum number of entries we compute for lookup table MUL is \(D^{q1}\) . However, in each iteration at worse O ( n ) of the entries are computed simultaneously. In total, the asymptotic time bound to populate lookup table MUL is \(O\left(\displaystyle \frac{{\frac{n^2}{q^2}\cdot D^{q1}\cdot q^2} }{n} \right)=O\left(\frac{n^2 \cdot D^q}{n}\right)=O(n \cdot D^q)\) .
Parallel sparisified FourRussians single sequence folding algorithm
Let \(Z_x\) be the number of OCT cells in column x . Let \(\forall _{x\in [0,n]}Z_j \ge Z_x\) .
The parallel algorithm would take as long as would take as a it takes for the last processor to complete.
In order to extend the parallel FourRussians single sequence folding algorithm to utilize the Sparsification speedup we will limit the call to MUL table only if \(sigSTEP(L[I_{g_i},K_g]) \odot sigOCT(L[K_g,j]) >0\) . As result given \(Z_j\) the total time to compute for processor j is \(O(\tilde{L}\tilde{Z_j})\) where \(\frac{{Z_j}}{\log n}\le \tilde{Z_j} \le min \left({Z_j},\frac{n}{\log n}\right)\) .
Conclusion
This work combines the asymptotic speedup of FourRussians with the very practical speedup of Sparsification. The ondemand formulation of the FourRussians not only removes all extraneous computation. This approach allows the FourRussians SSF to achieve a speedup in practice for realistic scoring schemes. This also leads us to take advantage of the sparsity properties. Through asymptotic analysis and empirical testing on the basepair maximization variant and a more biologically informative scoring scheme, we show that the Sparse FourRussians framework is able to achieve a speedup on every problem instance, that is asymptotically never worse, and empirically better than achieved by the minimum of the two methods alone. We also showed that through some reorganization we could apply the FourRussians speedup to parallel algorithm and achieve and asymptotic time of \(O(n^2/\log ^2 n)\) . The algorithm created here can be implemented in CUDA to compute on multiprocessor GPUs. Because the algorithm allows for memory cell independence one can apply memory and cache optimization without affecting the algorithm. The utility in this framework lies not only on its ability to speedup single sequence folding but its ability to speedup the family of RNA folding problems for which both Sparsification and FourRussians have bene applied separately.
Future work in this area would be to examine the ability to sparsify memory [], as FourRussians at worst case requires an additional factor of \(2^{log(n)}\) in memory. Another open question is wether it is possible to apply the \(\Omega (\log ^3 n)\) [] speedup of boolean matrix multiplication to RNA folding.
Footnotes
Or close to optimal.
Using some word tricks the dot product could be computed in O (1)time.
Declarations
Authors’ contributions
YF contributed to the conception and analysis of the framework. YF and DG jointly contributed to the design and interpretation of the framework, and jointly contributed to the writing and editing of the manuscript. Implementation and testing was done by YF. All authors read and approved the final manuscript.
Acknowledgements
We would like to sincerely thank Shay Zakov and Michal ZivUkelson for their their many helpful comments and suggestions. This research was partially supported by the IIS1219278 Grant. uld like to sincerely thank Shay Zakov and Michal ZivUkelson for their their many helpful comments and suggestions. This research was partially supported by the IIS1219278 Grant.
Competing interests
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.