# Research on recovery strategy in embedded real-time main memory databases

- Tan Yonghong^{1} (corresponding author)
- Yin Xiangdong^{2}

**2016**:9

https://doi.org/10.1186/s13639-016-0030-1

© Yonghong and Xiangdong. 2016

**Received:** 22 December 2015

**Accepted:** 20 April 2016

**Published:** 4 May 2016

## Abstract

To recover data in embedded real-time main memory databases effectively and efficiently, this paper proposes a real-time log-based recovery approach. With respect to the real-time requirement of embedded systems, we classify consistency in real-time main memory databases into data consistency and transaction consistency, analyze both theoretically, design rules for a correct recovery strategy, and propose real-time log-based recovery algorithms for different types of transactions. The experiments show that the proposed approach is more effective and efficient than the methods in both traditional and eXtremeDB database systems.

## 1 Review

With the development of embedded systems, the application of databases in embedded systems [1] has become a hot topic in both industry and academia. Embedded systems work in environments without manual intervention, so when a fault occurs, they need to diagnose it and recover from it automatically [2]. Main memory databases [3, 4] greatly reduce I/O operations at run time and satisfy the real-time requirement of embedded systems, so the databases deployed in embedded systems usually reside in main memory.

In real-time main memory databases [5–7], the primary copy of the database resides in volatile RAM, where the data is vulnerable to loss, so recovery is necessary. Meanwhile, I/O operations in real-time main memory databases are few, and recovery is the only part that affects I/O performance, so recovery performance is critical for real-time main memory databases [8, 9]. While recovering from a fault, real-time main memory databases need to satisfy multiple constraints [10, 11], which poses a major challenge for designing reasonable recovery strategies.

Checkpointing with memory snapshots [12, 13] is a commonly used program recovery strategy, but the overhead of storing the state of a running program is very high, so it is not suitable for embedded applications. In addition, logs in embedded systems record system behavior, and researchers have used different kinds of logs to design different recovery strategies, such as the partition log [14], real-time log [15], remote log [16], and operation log [17]. However, these strategies take only the real-time requirement into consideration and ignore other requirements specific to embedded systems, so they cannot be applied efficiently in an embedded environment. The method proposed in [13] studies recovery in main memory, but it is based on virtual memory snapshots. To improve real-time capability, Levy and Silberschatz [18] proposed an incremental recovery strategy for main memory databases.

In this paper, we analyze the consistency constraints in embedded real-time main memory databases from the perspectives of both data and transaction. Then we design some rules that an efficient recovery strategy must obey in embedded real-time main memory databases. Finally, we propose corresponding recovery algorithms for different tasks of embedded real-time main memory databases.

## 2 Analysis of consistency in embedded real-time main memory databases

In this section, we analyze the consistency of embedded real-time main memory databases from the perspective of both data and transaction.

### 2.1 Data consistency

The embedded real-time main memory databases include three types of data, i.e., image objects, derived objects, and invariant objects.

Objects in the real world are observed by sensors, and their values are written into the database. The values written into the database are image objects. An image object is an image of a real world object at some instant, and each image object has its own sampling timestamp and external validity interval.

A derived object is computed from a group of image objects during transaction processing. Its timestamp is the instant when the transaction finishes, and its validity interval is the intersection of the validity intervals of all image objects in the group.
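As an illustration of how a derived object inherits its validity interval, the following Python sketch intersects the intervals of a group of image objects. It assumes that each validity interval is the absolute span from *ST*(*X*) to *ST*(*X*) + *VI*(*X*); the `ImageObject` class and the `derived_validity` function are hypothetical names introduced here, not part of the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ImageObject:
    """Hypothetical image object: a sampled value with its timing attributes."""
    name: str
    value: float
    st: float  # sampling timestamp ST(X), in seconds
    vi: float  # length of the external validity interval VI(X), in seconds


def derived_validity(images: Iterable[ImageObject]) -> Optional[Tuple[float, float]]:
    """Intersect the absolute intervals [ST, ST + VI] of all source objects.

    Returns (start, end), or None when the group is empty or the
    intervals do not overlap.
    """
    group = list(images)
    if not group:
        return None
    start = max(x.st for x in group)        # latest sampling instant
    end = min(x.st + x.vi for x in group)   # earliest expiry instant
    return (start, end) if start <= end else None
```

A derived object computed from samples whose intervals no longer overlap has an empty validity interval, which the sketch reports as `None`.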

An invariant object is a constant that does not change over time. Its validity is not affected by time, so it is also called a non-time series data object.

Because each image object and each derived object has a validity interval, both are time series data objects. The sampling time and computing time of time series data are valid only within an interval starting from the system's current time.

**Definition 1**. If *VI*(*X*) is far less than *AT*(*X*), i.e., *VI*(*X*) ≪ *AT*(*X*), then *X* is short time-limited data.

The data consistency of embedded real-time systems includes internal consistency, external consistency, and mutual consistency.

**Definition 2**. *X* is internally consistent *if and only if* it satisfies the predefined integrity and consistency constraints of traditional database systems.

Here, internal consistency is the same notion as in traditional database systems, and it concerns only the internal world of the database system.

**Definition 3**. *X* is externally consistent *if and only if* it satisfies *t* ≤ *ST*(*X*) + *VI*(*X*).

External consistency requires that the sampled data in a database lag behind the real world by no more than a bounded time.

**Definition 4**. A group of related data used for decision making or for deriving new data is a mutual consistent set *R*, and each *R* is associated with a corresponding mutual validity interval *R*_{mvi}.

**Definition 5**. Let *R* = {*X*_{1}, *X*_{2}, …, *X*_{n}}; then *R* is mutually consistent if and only if ∀ *X*_{i}, *X*_{j} ∈ *R* with *i* ≠ *j*, |*ST*(*X*_{i}) − *ST*(*X*_{j})| ≤ *R*_{mvi}.

If *R* is used to generate new data, mutual consistency ensures that the values in *R* were generated within a common validity interval.
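The conditions in Definitions 3 and 5 can be checked directly. The following Python sketch is one possible reading, assuming that *t* is the current system time and that all times share one unit; the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def externally_consistent(t: float, st: float, vi: float) -> bool:
    """Definition 3: the sample taken at ST(X) is still valid at time t."""
    return t <= st + vi


def mutually_consistent(sampling_times: Sequence[float], r_mvi: float) -> bool:
    """Definition 5: every pair of sampling times differs by at most r_mvi."""
    return all(abs(a - b) <= r_mvi for a, b in combinations(sampling_times, 2))
```

Because the pairwise condition holds exactly when the largest and smallest sampling times differ by at most *R*_{mvi}, an implementation on a constrained device could track only those two values instead of comparing every pair.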

### 2.2 Transaction consistency

Embedded real-time main memory database systems interact with the real world in two ways. The first is recording the states and events of the real world into the database, and the second is performing actions that affect the real world. Embedded real-time transactions can be classified into data receiving transactions, data processing transactions, and manipulating transactions.

Data receiving transactions periodically sample the external environment and write the samples into the database. A transaction of this kind generates one image object per period, and it is a write-only, non-blocking hard real-time transaction.

Data processing transactions perform read-only operations on image objects, periodically or non-periodically, and read and write derived objects or invariant objects. A transaction of this kind does not interact with the real world and is a soft real-time transaction.

Manipulating transactions read all kinds of data in a database and perform a set of actions *AS*(*T*) = {*A*_{i} | 1 ≤ *i* ≤ *h*} to control the embedded system. If a transaction of this kind exceeds its validity interval, the results can be disastrous, so it is also a hard real-time transaction. Manipulating transactions only read the database, so they do not affect its consistency, but they can change the state of the real world.

As with data consistency, transaction consistency in embedded real-time main memory database systems includes internal consistency, external consistency, and mutual consistency.

**Definition 6**. *T* is internally consistent *if and only if* the values it reads and/or writes satisfy the predefined internal integrity and consistency constraints of traditional database systems.

**Definition 7**. *T* is externally consistent *if and only if* *t* ≤ *D*(*T*) and ∀ *X*_{i} ∈ *DS*(*T*), *t* ≤ *ST*(*X*_{i}) + *VI*(*X*_{i}).

The external consistency of embedded real-time transactions requires that each transaction is within its validity interval, and that all of its read/write operations fall within the validity intervals of the data they access.

**Theorem 1**. *Let MVI*(*T*) *be the minimum of all valid terminal instants of the data objects that T reads or writes*; *then the final terminal instant of T is D*_{R}(*T*) = min(*D*(*T*), *MVI*(*T*))*.*

*Proof:* If *MVI*(*T*) < *t* < *D*(*T*), then ∃ *X*_{i} ∈ *DS*(*T*) such that *t* > *ST*(*X*_{i}) + *VI*(*X*_{i}); that is, some *X*_{i} has lost its external consistency, which violates the external consistency constraint on the data objects that *T* reads or writes. Conversely, if *D*(*T*) < *MVI*(*T*) and *t* > *D*(*T*), then *T* exceeds its own validity interval, which violates the external consistency constraint of *T*. Hence, *D*_{R}(*T*) = min(*D*(*T*), *MVI*(*T*)).
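Theorem 1 translates into a one-line computation. The sketch below is a minimal Python illustration, assuming that the valid terminal instant of each data object is *ST*(*X*_{i}) + *VI*(*X*_{i}), as used in the proof; the function name and the tuple layout are hypothetical.

```python
from typing import Iterable, Tuple


def effective_deadline(deadline: float, accessed: Iterable[Tuple[float, float]]) -> float:
    """Return D_R(T) = min(D(T), MVI(T)).

    `accessed` holds one (ST, VI) pair per data object in DS(T). MVI(T) is the
    earliest instant at which any of them stops being externally consistent.
    """
    expiries = [st + vi for st, vi in accessed]
    # With no time-constrained data, only the transaction deadline D(T) applies.
    mvi = min(expiries, default=float("inf"))
    return min(deadline, mvi)
```

For example, a transaction with deadline 20 that reads objects expiring at 15 and 14 must finish by 14, while the same transaction with deadline 13 is bounded by its own deadline.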

**Definition 8**. *T* is mutually consistent *if and only if* ∀ *X*_{i}, *X*_{j} ∈ *DS*(*T*) with *i* ≠ *j*, |*ST*(*X*_{i}) − *ST*(*X*_{j})| ≤ *R*_{mvi}(*T*).

The mutual consistency of embedded real-time transactions means that the difference between the sampling times of any two data objects accessed by *T* is no greater than the given value *R*_{mvi}(*T*).

For the same reason, when *T* is both externally consistent and mutually consistent, it is time consistent. A valid submission of a transaction in embedded real-time systems depends not only on internal consistency but also on time consistency. So, we have the following corollary.

**Corollary 1**. *T is consistent if and only if the following constraints are satisfied at the same time*:

- (1) ∀ *X*_{i}, *X*_{i} ∈ *DS*(*T*);
- (2) *CT*(*T*) ≤ *D*_{R}(*T*);
- (3) ∀ *X*_{i} ∈ *RS*(*T*), *RT*_{T}(*X*_{i}) ≤ *ST*(*X*_{i}) + *VI*(*X*_{i});
- (4) ∀ *X*_{i}, *X*_{j} ∈ *RS*(*T*) with *i* ≠ *j*, |*ST*(*X*_{i}) − *ST*(*X*_{j})| ≤ *R*_{mvi}(*T*).

## 3 Rules for correct recovery strategy

Taking the internal consistency and time consistency of transactions and data in embedded real-time main memory databases into consideration, we present some rules for correct recovery strategies.

### 3.1 Non-time series data recovery rule

**Rule 1**. If *T* has not been submitted, then for ∀ *X*_{i} ∈ *US*(*T*) satisfying *S*_{t}(*X*_{i}) = *UI*_{T}(*X*_{i}), execute the undo operation.

**Rule 2**. If *T* has been submitted, then for ∀ *X*_{i} ∈ *US*(*T*) satisfying *S*_{t}(*X*_{i}) ≠ *UI*_{T}(*X*_{i}), execute the redo operation.

Rules 1 and 2 recover data so that it satisfies the internal consistency constraint. Because non-time series data is subject only to the internal consistency constraint, these two rules suffice to recover it.
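Rules 1 and 2 can be sketched as a small planning function. The Python example below assumes that *S*_{t}(*X*_{i}) is the value currently stored for *X*_{i} and that *UI*_{T}(*X*_{i}) is the value written by *T*; this reading of the notation, the dictionaries, and the function name are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple


def plan_non_time_series_recovery(
    submitted: bool,
    current_state: Dict[str, Any],  # assumed S_t(X_i): value stored at time t
    update_image: Dict[str, Any],   # assumed UI_T(X_i) for every X_i in US(T)
) -> List[Tuple[str, str]]:
    """Return the (operation, object name) pairs required by Rules 1 and 2."""
    operations: List[Tuple[str, str]] = []
    for name, written in update_image.items():
        applied = current_state.get(name) == written
        if not submitted and applied:
            operations.append(("undo", name))  # Rule 1: roll back an unsubmitted write
        elif submitted and not applied:
            operations.append(("redo", name))  # Rule 2: reapply a submitted write
    return operations
```

Objects that match neither rule need no operation, so the returned plan can be empty.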

### 3.2 Time series data recovery rule

**Rule 3**. If ∃ *X*_{i} ∈ *US*(*T*) satisfying *S*_{t}(*X*_{i}) = *UI*_{T}(*X*_{i}) and *t* ≤ *ST*(*X*_{i}) + *VI*(*X*_{i}), then whether or not *T* has been submitted, there is no need to execute any recovery operation for *X*_{i}.

**Rule 4**. If ∃ *X*_{i} ∈ *US*(*T*) satisfying *S*_{t}(*X*_{i}) ≠ *UI*_{T}(*X*_{i}) and *t* ≤ *ST*(*X*_{i}) + *VI*(*X*_{i}), then execute the redo operation for *X*_{i}.

**Rule 5**. If ∃ *X*_{i} ∈ *US*(*T*) satisfying *t* > *ST*(*X*_{i}) + *VI*(*X*_{i}), then resample by starting the data receiving transaction of *X*_{i}.

**Theorem 2**. *Rules 3–5 can recover the internal and external state consistency of time series data*.

*Proof*: The recovery of a time series data object *X*_{i} depends on the consistency between its internal state *S*_{t}(*X*_{i}) and its external state *UI*_{T}(*X*_{i}), not on whether the transaction has been submitted.

When *t* ≤ *ST*(*X*_{i}) + *VI*(*X*_{i}), if *S*_{t}(*X*_{i}) ≠ *UI*_{T}(*X*_{i}), i.e., the internal and external states of *X*_{i} are not consistent, then whether or not *T* has been submitted, the redo operation should be executed according to *UI*_{T}(*X*_{i}) (Rule 4); and if *S*_{t}(*X*_{i}) = *UI*_{T}(*X*_{i}), i.e., the internal and external states of *X*_{i} are consistent, then whether or not *T* has been submitted, there is no need to execute any recovery operation (Rule 3).

When *t* > *ST*(*X*_{i}) + *VI*(*X*_{i}), executing an undo or redo operation is meaningless, and the data receiving transaction should be restarted immediately to resample *X*_{i} and restore the consistency between its internal and external states (Rule 5).
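The case analysis in Rules 3–5 can be written as a single decision function. The following Python sketch assumes the same reading of *S*_{t}(*X*_{i}) and *UI*_{T}(*X*_{i}) as stored value and logged update value; the `Action` enumeration and the function name are hypothetical.

```python
from enum import Enum
from typing import Any


class Action(Enum):
    NONE = "none"          # Rule 3: states agree and the data is still valid
    REDO = "redo"          # Rule 4: states differ and the data is still valid
    RESAMPLE = "resample"  # Rule 5: validity interval has expired


def time_series_action(t: float, st: float, vi: float,
                       state: Any, update_image: Any) -> Action:
    """Choose the recovery action for one time series data object X_i."""
    if t > st + vi:
        # Expired data cannot be repaired by undo or redo; sample it again.
        return Action.RESAMPLE
    if state != update_image:
        return Action.REDO
    return Action.NONE
```

The submission status of the transaction does not appear among the parameters, which reflects the statement in the proof that only the two states and the validity interval matter.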

### 3.3 Real world state recovery rule

In embedded real-time applications, if a transaction has been submitted and has changed the state of the real world, there is no need to recover; if the transaction has not been submitted, then compensation is needed to reverse the changes it made to the state of the real world.

**Rule 6**. If *T* has not been submitted, then for each action that has happened, i.e., ∀ *A*_{i} ∈ *AS*(*T*), execute a compensation or recovery task for *A*_{i}.

**Theorem 3**. *Rule 6 can recover the consistency of real world state*.

*Proof*: Manipulating transactions are read-only, so they do not violate the consistency of data objects. The atomicity of a manipulating transaction means that either all actions of *T*, *AS*(*T*) = {*A*_{i} | 1 ≤ *i* ≤ *h*}, are executed or none of them is executed.

Let *OAS*(*T*) = {*A*_{j} | 1 ≤ *j* ≤ *h*} be the set of actions that have been executed in *T* when a fault occurs. According to Rule 6, when *OAS*(*T*) ≠ ∅ and *OAS*(*T*) ≠ *AS*(*T*), we need to compensate for and recover every *A*_{j} ∈ *OAS*(*T*). So, the real world states that have been changed can be recovered correctly.
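A minimal Python sketch of Rule 6 follows. It assumes that each executed action has a registered compensating task and that compensation runs in reverse execution order; the reverse order, the action names, and the function name are assumptions made for illustration and are not specified in the text.

```python
from typing import Callable, Dict, List, Sequence


def compensate_actions(
    submitted: bool,
    executed: Sequence[str],                     # OAS(T), in execution order
    compensators: Dict[str, Callable[[], None]]  # hypothetical compensating tasks
) -> List[str]:
    """Apply Rule 6 and return the names of the compensated actions."""
    if submitted:
        return []  # submitted transactions need no real-world recovery
    compensated: List[str] = []
    for action in reversed(executed):  # assumption: undo the latest effect first
        compensators[action]()
        compensated.append(action)
    return compensated
```

If a fault interrupts a transaction after it has opened a valve and started a pump, this sketch would stop the pump first and then close the valve.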

### 3.4 Transaction restart rule

Operating without manual intervention is a typical feature of embedded real-time databases, so the database system should restart all kinds of transactions automatically when faults occur. Two kinds of transactions need to be restarted. The first kind is periodic transactions whose restart period has elapsed or whose running time has exceeded the running period; the second kind is non-periodic transactions that did not finish successfully but still satisfy all consistency constraints.

**Rule 7**. For a periodic transaction *T*, if *T* does not finish normally, or *T* finishes normally and satisfies *t* ≥ *BT*(*T*) + *P*(*T*), then restart *T*.

**Rule 8**. For a non-periodic transaction *T* that does not finish normally, if the following conditions are satisfied at the same time, then restart *T*.

- (1) *t* + *EET*(*T*) ≤ *D*_{R}(*T*);
- (2) ∀ *X*_{i} ∈ *RS*(*T*), *t* ≤ *ST*(*X*_{i}) + *VI*(*X*_{i});
- (3) ∀ *X*_{i}, *X*_{j} ∈ *RS*(*T*), *i* ≠ *j* and 1 ≤ *i*, *j* ≤ *n*, |*ST*(*X*_{i}) − *ST*(*X*_{j})| ≤ *R*_{mvi}(*T*).

Rule 8 mirrors Corollary 1: when a fault occurs, a transaction can be restarted only if all of its consistency constraints are still satisfied.
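The restart conditions of Rules 7 and 8 can be sketched in Python as follows. The sketch assumes that *BT*(*T*) is the begin time of the current period, *P*(*T*) the period, and *EET*(*T*) the estimated execution time; these readings and the function names are assumptions, since the notation is not defined in the text.

```python
from typing import Sequence, Tuple


def should_restart_periodic(finished_normally: bool, t: float,
                            bt: float, period: float) -> bool:
    """Rule 7: restart after an abnormal end or once the period has elapsed."""
    return (not finished_normally) or t >= bt + period


def should_restart_aperiodic(t: float, eet: float, d_r: float,
                             read_set: Sequence[Tuple[float, float]],
                             r_mvi: float) -> bool:
    """Rule 8, for a non-periodic transaction that did not finish normally.

    `read_set` holds one (ST, VI) pair per data object in RS(T).
    """
    if t + eet > d_r:
        return False  # condition (1): not enough time left before D_R(T)
    if any(t > st + vi for st, vi in read_set):
        return False  # condition (2): some input is no longer valid
    sampling_times = [st for st, _ in read_set]
    # Condition (3): the pairwise bound holds exactly when max - min <= r_mvi.
    return not sampling_times or max(sampling_times) - min(sampling_times) <= r_mvi
```

A non-periodic transaction that fails any of the three checks is not restarted, because a rerun could not be submitted consistently.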

## 4 Log-based recovery strategy

To recover from faults, embedded real-time main memory databases need to log the time and the triggered actions for each transaction and data object. These logs include real-time transaction logs, data logs, and action logs. Taking into account the limits of CPU, storage, and energy in embedded systems, we propose the following data recovery strategies based on the rules in the previous section.

**Strategy 1**. If *X* is short time-limited time series data, then there is no need to log its updates.

**Strategy 2**. If \( \frac{|AFI(X_i) - BFI(X_i)|}{BFI(X_i)} \ge \delta(X_i) \), then log the current data update operation; otherwise, log nothing.

**Strategy 3**. Update time series data objects immediately; that is, update the state of the database before the transaction is submitted.

**Strategy 4**. Defer updates to non-time series data objects; that is, update the state of the database when the transaction is submitted.

Strategies 1 and 2 greatly reduce the overhead of logging updates to time series data and also speed up recovery. Strategy 3 ensures that the latest states of time series data are written to the database, which reduces the redo operations for time series data. Strategy 4 eliminates the undo logs of non-time series data and their undo recovery, which further reduces the storage and recovery overhead.
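The logging decision of Strategies 1 and 2 can be sketched as a filter applied before each log write. The Python example below assumes that *BFI*(*X*_{i}) and *AFI*(*X*_{i}) are the values of *X*_{i} before and after the update; it also uses the absolute value of the before-value in the denominator and logs any change from zero, both of which are assumptions added here to keep the ratio well defined.

```python
def should_log_update(short_time_limited: bool, before: float,
                      after: float, delta: float) -> bool:
    """Decide whether an update to a time series data object is logged."""
    if short_time_limited:
        return False  # Strategy 1: short time-limited data is never logged
    if before == 0:
        # Assumption: the relative change is undefined, so log any real change.
        return after != 0
    # Strategy 2: log only when the relative change reaches the threshold.
    # abs(before) equals BFI(X_i) in the formula for positive values.
    return abs(after - before) / abs(before) >= delta
```

With the default threshold of 10% from the experimental settings, a reading that moves from 100 to 105 is not logged, while a move from 100 to 120 is.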

Based on the above strategies, we propose corresponding recovery algorithms for data receiving transactions, manipulating transactions, and data processing transactions.

## 5 Experiments

### 5.1 Experimental setting

Parameter setting

| Parameter | Meaning | Default value | Domain |
|---|---|---|---|
| | Image data object number | 250 | 50~500 |
| | Derived data object number | 250 | 50~500 |
| | Invariant data object number | 500 | 200~1000 |
| | Short time-limited data ratio | 20% | 0~40% |
| | Time series data validity time | 50 ms | 5 ms~10 s |
| | Time series data state change threshold | 10% | 0~20% |
| | Period of periodic transaction | 50 ms | 5 ms~10 s |
| | Periodic transaction generating ratio | 20/s | 5~50/s |
| | MT trigger ratio | 5/s | 2~10/s |
| | Update number | 2 | 1~3 |
| | Action number | 3 | 1~5 |
| | Update mode | Hybrid | Deferred/immediate/hybrid |

### 5.2 Experimental results

## 6 Conclusions

In this paper, we study the problem of data recovery in embedded real-time main memory databases. Because of the real-time requirement of embedded systems, consistency in embedded real-time main memory databases differs from consistency in traditional databases. We analyzed both data and transaction consistency in embedded real-time main memory databases, designed rules for a correct recovery strategy, and proposed real-time log-based recovery algorithms for different types of transactions. The experiments show that the proposed approach is more effective and efficient than the methods in both traditional and eXtremeDB database systems. The proposed recovery algorithm can be integrated into the eXtremeDB database to provide better recovery performance. Integrating the proposed algorithm into other main memory databases will be our future work.

## Declarations

### Acknowledgements

The work was supported by the following funds: Hunan Provincial Natural Science Foundation of China (Grant No. 2015JJ6043); Hunan University of Science and Engineering; Scientific Research Fund of Hunan Provincial Education Department (Grant No. 12A054); The Construct Program of the Key Discipline in Hunan University of Science and Engineering (Circuits and Systems).

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

1. A Nori, Mobile and Embedded Databases, in *Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data* (ACM, 2007), pp. 1175–1177
2. V Narayanan, Y Xie, Reliability concerns in embedded system designs. Computer **39**(1), 118–120 (2006)
3. H Garcia-Molina, K Salem, Main memory database systems: an overview. Knowl. Data Eng. IEEE Trans. **4**(6), 509–516 (1992)
4. J Stankovic, SH Son, J Hansson, Misconceptions about real-time databases. Computer **32**(6), 29–36 (1999)
5. K Ramamritham, Real-time databases. Distrib. Parallel Databases **1**(2), 199–226 (1993)
6. G Özsoyoğlu, RT Snodgrass, Temporal and real-time databases: a survey. Knowl. Data Eng. IEEE Trans. **7**(4), 513–532 (1995)
7. K Ramamritham, SH Son, LC Dipippo, Real-time databases and data services. Real-time Syst. **28**(2-3), 179–215 (2004)
8. KH Kim, HO Welch, Distributed execution of recovery blocks: an approach for uniform treatment of hardware and software faults in real-time applications. Comput. IEEE Trans. **38**(5), 626–636 (1989)
9. RM Sivasankaran, K Ramamritham, JA Stankovic et al., Data Placement, Logging and Recovery in Real-Time Active Databases, in *Active and Real-Time Database Systems (ARTDB-95)* (Springer, London, 1996), pp. 226–241
10. NR Soparkar, A Silberschatz, HF Korth, *Time-Constrained Transaction Management: Real-Time Constraints in Database Transaction Systems* (Kluwer Academic Publishers, 1996)
11. MI Seltzer, MA Olson, Challenges in Embedded Database System Administration, in *Proceeding of the Embedded System Workshop* (1999), pp. 29–31
12. GM Liao, JP Li, Research on Timely Recovery Technology of Memory Database, in *2012 International Conference on Wavelet Active Media Technology and Information Processing (ICWAMTIP)* (IEEE, 2012), pp. 268–271
13. A Kemper, T Neumann, HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots, in *2011 IEEE 27th International Conference on Data Engineering (ICDE)* (IEEE, 2011), pp. 195–206
14. KY Lam, TW Kuo, *Real-Time Database Systems: Architecture and Techniques* (Kluwer Academic Publishers, 2001)
15. LC Shu, JA Stankovic, SH Son, Achieving bounded and predictable recovery using real-time logging. Comput. J. **47**(3), 373–394 (2004)
16. T Niklander, K Raatikainen, Using Logs to Increase Availability in Real-Time Main-Memory Database, in *Parallel and Distributed Processing* (Springer, Berlin Heidelberg, 2000), pp. 720–726
17. N Malviya, A Weisberg, S Madden et al., Rethinking Main Memory OLTP Recovery, in *2014 IEEE 30th International Conference on Data Engineering (ICDE)* (IEEE, 2014), pp. 604–615
18. E Levy, A Silberschatz, Incremental recovery in main memory database systems. Knowl. Data Eng. IEEE Trans. **4**(6), 529–540 (1992)
19. MC Majhi, AK Behera, NM Kulshreshtha et al., ExtremeDB: a unified web repository of extremophilic archaea and bacteria (2013)