Research on recovery strategy in embedded real-time main memory databases

Yonghong, Tan; Xiangdong, Yin

doi:10.1186/s13639-016-0030-1

Review
Open access
Published: 04 May 2016

Tan Yonghong¹ &
Yin Xiangdong²

EURASIP Journal on Embedded Systems volume 2016, Article number: 9 (2016)

Abstract

To recover data from embedded real-time main memory databases effectively and efficiently, this paper proposes a real-time log-based recovery approach. In view of the real-time requirements of embedded systems, we classify consistency in real-time main memory databases into data consistency and transaction consistency, analyze both theoretically, design rules for a correct recovery strategy, and propose real-time log-based recovery algorithms for different types of transactions. The experiments show that the proposed approach is more effective and efficient than both the traditional recovery method and the method used in the eXtremeDB database system.

1 Review

With the development of embedded systems, the application of databases in embedded systems [1] has become a hot topic in both industry and academia. Embedded systems work in environments without manual intervention, so when a fault occurs they must diagnose it and recover from it automatically [2]. Main memory databases [3, 4] greatly reduce I/O operations at run time and help satisfy the real-time requirements of embedded systems, so databases deployed in embedded systems usually reside in main memory.

In real-time main memory databases [5–7], the primary copy of the database resides in volatile RAM, where the data is vulnerable to loss, so recovery is necessary. Meanwhile, I/O operations in real-time main memory databases are few, and recovery is the only part that affects I/O performance, so recovery performance is critical for real-time main memory databases [8, 9]. While recovering from a fault, real-time main memory databases must satisfy multiple constraints [10, 11], which poses a major challenge for designing reasonable recovery strategies.

Checkpointing, or memory snapshotting [12, 13], is a commonly used recovery strategy, but the overhead of storing the state of a running program is very high, so it is not well suited to embedded applications. In addition, logs in embedded systems record system behavior, and researchers have used different kinds of logs to design different recovery strategies, such as partition logs [14], real-time logs [15], remote logs [16], and operation logs [17]. However, these strategies take only the real-time requirement into consideration and ignore other requirements specific to embedded systems, so they cannot be applied efficiently in the embedded environment. The method proposed in [13] addresses recovery in main memory, but it is based on virtual memory snapshots. To improve real-time capability, Levy and Silberschatz [18] proposed an incremental recovery strategy for main memory databases.

In this paper, we analyze the consistency constraints in embedded real-time main memory databases from the perspectives of both data and transaction. Then we design some rules that an efficient recovery strategy must obey in embedded real-time main memory databases. Finally, we propose corresponding recovery algorithms for different tasks of embedded real-time main memory databases.

2 Analysis of consistency in embedded real-time main memory databases

In this section, we analyze the consistency of embedded real-time main memory databases from the perspective of both data and transaction.

2.1 Data consistency

The embedded real-time main memory databases include three types of data, i.e., image objects, derived objects, and invariant objects.

The objects of real world are sensed by sensors, and their values are written into the databases. The values written into the databases are image objects. An image object is an image of a real world object at some instant, and each image object has its own sampling timestamp and external validity interval.

A derived object is computed from a group of image objects during transaction processing. The timestamp of a derived object is the instant at which the transaction finishes, and its validity interval is the intersection of the validity intervals of all image objects in the group.
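As a rough illustration of the derived-object definition, the following Python sketch intersects the validity windows of a group of image objects. The `ImageObject` class, its field names, and the choice to represent each window as the absolute range [ST, ST + VI] are assumptions made for illustration; the paper does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ImageObject:
    """Hypothetical image object: a sampled value plus its timing metadata."""
    value: float
    st: float  # sampling timestamp ST(X)
    vi: float  # length of the external validity interval VI(X)


def derived_validity_window(
    sources: Iterable[ImageObject],
) -> Optional[Tuple[float, float]]:
    """Intersect the windows [ST, ST + VI] of all source image objects.

    Returns (start, end), or None if there are no sources or the
    windows do not overlap.
    """
    items = list(sources)
    if not items:
        return None
    start = max(x.st for x in items)        # latest window start
    end = min(x.st + x.vi for x in items)   # earliest window end
    return (start, end) if start <= end else None
```

Under these assumptions, a derived object built from samples that never overlap in validity has no usable window, which is why the function returns None in that case.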

An invariant object is a constant which is invariant as time goes by. The validity of an invariant object is not affected by time, so it is also called non-time series data object.

As each image object and derived object has a validity interval, both are time series data objects. The sampling time and computing time of time series data are valid only within an interval starting from the system's current time.

Definition 1. If VI(X) is far less than AT(X), i.e., VI(X) ≪ AT(X), then X is short time-limited data.

The data consistency of embedded real-time systems includes internal consistency, external consistency, and mutual consistency.

Definition 2. X is internally consistent if and only if it satisfies the predefined integrity and consistency constraints of traditional database systems.

Here, internal consistency has the same meaning as in traditional database systems and refers only to the internal world of the database system.

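Definition 3. X is externally consistent if and only if it satisfies t ≤ ST(X) + VI(X), where t is the current time, ST(X) is the sampling timestamp of X, and VI(X) is its validity interval.

External consistency requires that the sampled data in a database lag behind the real world by no more than a bounded time.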
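The external consistency test is a single comparison, sketched below in Python. The function name and the use of plain floating-point timestamps are assumptions for illustration, and `t` is taken to be the current system time.

```python
def is_externally_consistent(t: float, st: float, vi: float) -> bool:
    """Definition 3: X is externally consistent at time t iff t <= ST(X) + VI(X).

    t  -- current time (assumed to share a clock with the sampling timestamp)
    st -- sampling timestamp ST(X)
    vi -- length of the validity interval VI(X)
    """
    return t <= st + vi
```

For instance, a sample taken at time 10 with a validity interval of 5 stays usable up to and including time 15.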

Definition 4. A group of related data used for making a decision or deriving new data is a mutual consistent set R, and each R is associated with a corresponding mutual validity interval R_mvi.

Definition 5. Let R = {X_1, X_2, …, X_n}. Then R is mutually consistent if and only if, for all X_i, X_j ∈ R with i ≠ j, |ST(X_i) − ST(X_j)| ≤ R_mvi.

If R is used to generate new data, then the mutual consistency is used to assure the values in R are generated within the common validity interval.
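The pairwise condition on sampling timestamps can be checked in one pass, because the largest pairwise gap in a set is the difference between its maximum and minimum. The sketch below assumes the timestamps are available as a plain sequence of numbers; the function name is hypothetical.

```python
from typing import Sequence


def is_mutually_consistent(sampling_times: Sequence[float], r_mvi: float) -> bool:
    """Check that every pair of sampling timestamps differs by at most R_mvi.

    Equivalent to testing all pairs, since max - min bounds every
    pairwise difference. Sets with fewer than two members are
    trivially consistent (an assumption; the paper does not discuss them).
    """
    if len(sampling_times) < 2:
        return True
    return max(sampling_times) - min(sampling_times) <= r_mvi
```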

2.2 Transaction consistency

Embedded real-time main memory database systems interact with the real world in two ways. The first is recording the states and events of the real world in the database, and the second is performing actions that affect the real world. Embedded real-time transactions can accordingly be classified into data receiving transactions, data processing transactions, and manipulating transactions.

Data receiving transactions sample the external environment periodically and write the samples into the database. This kind of transaction generates one image object per period, and it is a write-only, non-blocking hard real-time transaction.

Data processing transactions perform read-only operations on image objects, periodically or non-periodically, and read and write derived objects or invariant objects. This kind of transaction does not interact with the real world and is a soft real-time transaction.

Manipulating transactions read all kinds of data in a database and perform a set of actions AS(T) = {A_i | 1 ≤ i ≤ h} to control the embedded system. If this kind of transaction exceeds its validity interval, the results can be disastrous, so it is also a hard real-time transaction. Manipulating transactions are read-only with respect to the database, so they do not affect database consistency, but they can change the state of the real world.

Like data consistency, transaction consistency in embedded real-time main memory database systems includes internal consistency, external consistency, and mutual consistency.

Definition 6. T is internally consistent if and only if the values it reads and/or writes satisfy the predefined internal integrity and consistency constraints of traditional database systems.

Definition 7. T is externally consistent if and only if t ≤ D(T) and, for all X_i ∈ DS(T), t ≤ ST(X_i) + VI(X_i).

The external consistency of embedded real-time transactions requires that each transaction stay within its own validity interval and that every data object it reads or writes be within that object's validity interval.

Theorem 1. Let MVI(T) be the minimum of the validity end instants of all data objects that T reads or writes. Then the effective deadline of T is D_R(T) = min(D(T), MVI(T)).

Proof: If MVI(T) < t < D(T), then ∃ X_i ∈ DS(T) such that t > ST(X_i) + VI(X_i); that is, some X_i has lost its external consistency, which violates the external consistency constraint on the data objects that T reads or writes. Conversely, if D(T) < MVI(T) and t > D(T), then T exceeds its validity interval, which violates the external consistency constraint on T itself. Therefore, D_R(T) = min(D(T), MVI(T)).
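Theorem 1 translates directly into a small computation. In the Python sketch below, the function name and the representation of DS(T) as a list of (ST, VI) pairs are assumptions for illustration; a transaction that touches no time-limited data is assumed to be bounded only by its own deadline.

```python
from typing import Iterable, Tuple


def effective_deadline(deadline: float,
                       accessed: Iterable[Tuple[float, float]]) -> float:
    """Compute D_R(T) = min(D(T), MVI(T)).

    `accessed` holds (ST(X_i), VI(X_i)) pairs for the objects in DS(T).
    MVI(T) is the earliest instant at which any of them expires; with
    no accessed objects it defaults to infinity (an assumption).
    """
    mvi = min((st + vi for st, vi in accessed), default=float("inf"))
    return min(deadline, mvi)
```

With D(T) = 100 and two accessed objects expiring at 95 and 110, the transaction must finish by 95, because the first object becomes stale before the nominal deadline.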

Definition 8. T is mutually consistent if and only if, for all X_i, X_j ∈ DS(T) with i ≠ j, |ST(X_i) − ST(X_j)| ≤ R_mvi(T).

The mutual consistency of embedded real-time transactions means that the sampling timestamps of any two data objects accessed by T differ by no more than the given value R_mvi(T).

Likewise, when T is both externally consistent and mutually consistent, it is time consistent. A valid submission (commit) of a transaction in embedded real-time systems depends not only on internal consistency but also on time consistency. So, we have the following corollary.

Corollary 1. T is consistent if and only if the following constraints are satisfied simultaneously:

(1) ∀ X_i, X_i ∈ DS(T);
(2) CT(T) ≤ D_R(T);
(3) ∀ X_i ∈ RS(T), RT_T(X_i) ≤ ST(X_i) + VI(X_i);
(4) ∀ X_i, X_j ∈ RS(T) with i ≠ j, |ST(X_i) − ST(X_j)| ≤ R_mvi(T).

3 Rules for correct recovery strategy

Taking the internal consistency and time consistency of transactions and data in embedded real-time main memory databases into consideration, we present some rules for correct recovery strategies.

3.1 Non-time series data recovery rule

Rule 1. If T has not been submitted, then for every X_i ∈ US(T) satisfying S_t(X_i) = UI_T(X_i), execute the undo operation.

Rule 2. If T has been submitted, then for every X_i ∈ US(T) satisfying S_t(X_i) ≠ UI_T(X_i), execute the redo operation.

Rules 1 and 2 recover data so that it satisfies the internal consistency constraint. Non-time series data is subject only to the internal consistency constraint, so these two rules are sufficient to recover it.
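Rules 1 and 2 amount to a two-way decision per updated object, sketched below in Python. The sketch assumes that S_t(X_i) is the state of X_i found in the database at recovery time and that UI_T(X_i) is the image written by T's update; the function name and the string return values are hypothetical.

```python
from typing import Any, Optional


def non_time_series_action(submitted: bool,
                           current_state: Any,
                           update_image: Any) -> Optional[str]:
    """Choose the recovery action for one object in US(T).

    Returns "undo", "redo", or None when no action is needed.
    """
    if not submitted and current_state == update_image:
        return "undo"  # Rule 1: an unsubmitted update is visible, roll it back
    if submitted and current_state != update_image:
        return "redo"  # Rule 2: a submitted update is missing, reapply it
    return None
```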

3.2 Time series data recovery rule

Rule 3. If ∃ X_i ∈ US(T) satisfying S_t(X_i) = UI_T(X_i) and t ≤ ST(X_i) + VI(X_i), then whether or not T has been submitted, there is no need to execute any recovery operation for X_i.

Rule 4. If ∃ X_i ∈ US(T) satisfying S_t(X_i) ≠ UI_T(X_i) and t ≤ ST(X_i) + VI(X_i), then execute the redo operation for X_i.

Rule 5. If ∃ X_i ∈ US(T) satisfying t > ST(X_i) + VI(X_i), then resample by starting the data receiving transaction of X_i.

Theorem 2. Rules 3–5 can recover the internal and external state consistency of time series data.

Proof: The recovery of time series data X_i needs to consider the consistency between its internal state S_t(X_i) and its external state UI_T(X_i), but not whether the transaction has been submitted.

When t ≤ ST(X_i) + VI(X_i): if S_t(X_i) ≠ UI_T(X_i), i.e., the internal and external states of X_i are inconsistent, then whether or not T has been submitted, the redo operation should be executed according to UI_T(X_i) (Rule 4); and if S_t(X_i) = UI_T(X_i), i.e., the internal and external states of X_i are consistent, then whether or not T has been submitted, there is no need to execute any recovery operation (Rule 3).

When t > ST(X_i) + VI(X_i), executing an undo or redo operation is meaningless, and the data receiving transaction should be restarted immediately to resample X_i and restore the consistency between its internal and external states (Rule 5).
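The case analysis in the proof of Theorem 2 can be written as a small decision function. This Python sketch uses hypothetical names and string return values; it checks expiry first, since Rule 5 applies regardless of the stored state.

```python
from typing import Any


def time_series_action(t: float, st: float, vi: float,
                       current_state: Any, update_image: Any) -> str:
    """Choose the recovery action for one time series object in US(T).

    Returns "resample" (Rule 5), "redo" (Rule 4), or "none" (Rule 3).
    Whether T was submitted is deliberately not an input.
    """
    if t > st + vi:
        return "resample"  # Rule 5: the value has expired, fetch a fresh sample
    if current_state != update_image:
        return "redo"      # Rule 4: still valid but not applied, reapply it
    return "none"          # Rule 3: still valid and already applied
```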

3.3 Real world state recovery rule

In embedded real-time applications, if a transaction has been submitted and has changed the state of the real world, there is no need to recover; if the transaction has not been submitted, compensation is needed to reverse the changes it has made to the real world.

Rule 6. If T has not been submitted, then for each action A_i ∈ AS(T) that has already been performed, execute a compensation or recovery task for A_i.

Theorem 3. Rule 6 can recover the consistency of the real world state.

Proof: Manipulating transactions are read-only, so they do not violate the consistency of data objects. The atomicity of a manipulating transaction means that either all actions of T, AS(T) = {A_i | 1 ≤ i ≤ h}, are executed or none of them is.

Let OAS(T) = {A_j | 1 ≤ j ≤ h} be the set of actions of T that have been executed when a fault occurs. According to Rule 6, when OAS(T) ≠ ∅ and OAS(T) ≠ AS(T), we need to compensate and recover for every A_j ∈ OAS(T). So, the real world states that have been changed can be recovered correctly.
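A minimal Python sketch of Rule 6 follows. The mapping from action names to compensation callbacks is hypothetical, and compensating in reverse order of execution is an assumption borrowed from common compensation practice; the paper does not specify an order.

```python
from typing import Callable, Dict, List, Sequence


def recover_real_world(submitted: bool,
                       executed: Sequence[str],
                       compensators: Dict[str, Callable[[], None]]) -> List[str]:
    """Compensate the executed actions OAS(T) of an unsubmitted transaction.

    Returns the names of the compensated actions in the order handled.
    """
    if submitted:
        return []  # submitted transactions need no real-world recovery
    compensated: List[str] = []
    for action in reversed(executed):  # assumption: latest action first
        compensators[action]()         # run the compensation task for A_j
        compensated.append(action)
    return compensated
```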

3.4 Transaction restart rule

Operating without manual intervention is a typical feature of embedded real-time databases, so the database system should restart transactions automatically when faults occur. Two kinds of transactions need to be restarted. The first kind is periodic transactions whose restart period has elapsed or whose running time has exceeded the period; the second kind is non-periodic transactions that did not finish successfully but still satisfy all consistency constraints.

Rule 7. For a periodic transaction T, if T does not finish normally, or T finishes normally and satisfies t ≥ BT(T) + P(T), then restart T.

Rule 8. For a non-periodic transaction T that does not finish normally, restart T if the following conditions are satisfied simultaneously:

(1) t + EET(T) ≤ D_R(T);
(2) ∀ X_i ∈ RS(T), t ≤ ST(X_i) + VI(X_i);
(3) ∀ X_i, X_j ∈ RS(T), i ≠ j and 1 ≤ i, j ≤ n, |ST(X_i) − ST(X_j)| ≤ R_mvi(T).

Rule 8 mirrors Corollary 1: after a fault, a transaction can be restarted only if all of its consistency constraints can still be satisfied.
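The two restart rules can be sketched as predicates in Python. The parameter names are assumptions: `bt` and `period` stand for BT(T) and P(T), read here as the begin time and period of T, and `eet` stands for EET(T), read as the estimated execution time. The paper uses these symbols without defining them.

```python
from typing import Sequence, Tuple


def should_restart_periodic(finished_normally: bool, t: float,
                            bt: float, period: float) -> bool:
    """Rule 7: restart if T failed, or if t >= BT(T) + P(T)."""
    return (not finished_normally) or t >= bt + period


def should_restart_aperiodic(finished_normally: bool, t: float, eet: float,
                             d_r: float,
                             read_set: Sequence[Tuple[float, float]],
                             r_mvi: float) -> bool:
    """Rule 8: restart a failed non-periodic T only if it can still be consistent.

    `read_set` holds (ST(X_i), VI(X_i)) pairs for the objects in RS(T).
    """
    if finished_normally:
        return False
    if t + eet > d_r:                                   # condition (1)
        return False
    if any(t > st + vi for st, vi in read_set):         # condition (2)
        return False
    stamps = [st for st, _ in read_set]
    if stamps and max(stamps) - min(stamps) > r_mvi:    # condition (3)
        return False
    return True
```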

4 Log-based recovery strategy

To recover from faults, embedded real-time main memory databases need to log the time and the triggered actions for each transaction and data object. These logs include real-time transaction logs, data logs, and action logs. Taking the limited CPU, storage, and energy of embedded systems into account, we propose the following data recovery strategies based on the rules in the previous section.

Strategy 1. If X is short time-limited time series data, then there is no need to log its updates.

Strategy 2. If \( \frac{|AFI(X_i) - BFI(X_i)|}{BFI(X_i)} \ge \delta(X_i) \), then log the current data update operation; otherwise, log nothing.

Strategy 3. Update time series data objects immediately; that is, update the database state before the transaction is submitted.

Strategy 4. Defer updates to non-time series data objects; that is, update the database state when the transaction is submitted.

Strategies 1 and 2 greatly reduce the overhead of logging updates to time series data and also speed up recovery. Strategy 3 ensures that the latest states of time series data are written to the database, which reduces the redo operations for time series data. Strategy 4 eliminates the undo logs of non-time series data and their undo recovery, further reducing storage and recovery overhead.
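Strategies 1 and 2 together form a logging filter for time series updates, sketched below in Python. The sketch assumes that BFI(X_i) and AFI(X_i) are the values of X_i before and after the update, which the paper does not state explicitly. Taking the absolute value of the denominator and logging any change from a zero baseline are further assumptions added so the function behaves sensibly for negative and zero values.

```python
def should_log_update(is_short_time_limited: bool,
                      before: float, after: float, delta: float) -> bool:
    """Decide whether an update to a time series object is worth logging.

    before, after -- assumed to correspond to BFI(X_i) and AFI(X_i)
    delta         -- relative state-change threshold delta(X_i)
    """
    if is_short_time_limited:
        return False              # Strategy 1: never log short time-limited data
    if before == 0:
        return after != 0         # assumption: relative change is undefined here
    relative_change = abs(after - before) / abs(before)
    return relative_change >= delta   # Strategy 2: log only significant changes
```

With a threshold of 0.05, a change from 100 to 106 is logged while a change from 100 to 102 is skipped, so recovery replays fewer, more significant log records.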

Based on the above strategies, we propose corresponding recovery algorithms for data receiving transactions, control transactions, and data processing transactions, and they are described as follows:

5 Experiments

5.1 Experimental setting

In the experiments, we implement the proposed log-based recovery algorithm on the eXtremeDB embedded database [19] and compare it with the traditional recovery method and the method in eXtremeDB. The experiments use a small database, and the operations include insert, delete, and update. Query operations are excluded because they do not change the data in the database, so the recovery strategy does not need to consider them. We mainly compare the system overhead, the overtime transaction ratio (the ratio of transactions that exceed the validity interval), and the rejecting service time (downtime). The meanings and values of the experimental parameters in eXtremeDB are given in Table 1.

Table 1 Parameter setting

5.2 Experimental results

First, we compare the CPU utilization and log buffer utilization of the three approaches; the results are shown in Figs. 1 and 2, respectively. The CPU utilization of the proposed approach is higher than that of the other two, because the proposed approach stores data in main memory and has the highest throughput. The log buffer utilization of the proposed approach is the lowest, which means that the proposed method logs only the necessary data and uses the log buffer most efficiently.

Second, we compare the ratio of transactions exceeding the validity interval in Fig. 3 and the average rejecting service time in Fig. 4. The ratio of transactions exceeding the validity interval is also the ratio of missed transactions. Figure 3 shows that the proposed approach has the fewest missed transactions. Rejecting service time is also called downtime. Figure 4 shows that the proposed approach has the lowest average downtime.

Next, for the proposed approach, we observe how the overtime transaction ratio changes under different values of "per_short" (the short time-limited data ratio) and "threshold" (the time series data state change threshold); the results are shown in Figs. 5 and 6, respectively. In Fig. 5, the order of the overtime transaction ratios for different per_short values is 0 > 0.5 > 0.1 > 0.3 > 0.2, which means that per_short must be selected carefully to minimize the overtime transaction ratio. Here, per_short = 0.2 is the best. In Fig. 6, the order of the overtime transaction ratios for different threshold values is the same as that for per_short, so the same conclusion holds.

Finally, we observe the time series data ratio of the proposed approach under different update modes; the results are shown in Fig. 7. The figure shows that the hybrid of deferred and immediate update modes has the lowest time series data ratio, which means that the hybrid update mode eliminates the overhead of undo recovery for the invariant data objects and thus reduces the ratio of transactions exceeding the validity interval.

6 Conclusions

In this paper, we study the problem of data recovery strategy in embedded real-time main memory databases. Because of the real-time requirements of embedded systems, consistency in embedded real-time main memory databases differs from that in traditional databases. We analyzed both data and transaction consistency in embedded real-time main memory databases, designed rules for a correct recovery strategy, and proposed real-time log-based recovery algorithms for different types of transactions. The experiments show that the proposed approach is more effective and efficient than both the traditional recovery method and the method used in the eXtremeDB database system. The proposed recovery algorithm can be integrated into the eXtremeDB database and thus provide better recovery performance. Integrating the proposed algorithm into other main memory databases will be our future work.

References

[1] A Nori, Mobile and embedded databases, in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (ACM, 2007), pp. 1175–1177
[2] V Narayanan, Y Xie, Reliability concerns in embedded system designs. Computer 39(1), 118–120 (2006)
[3] H Garcia-Molina, K Salem, Main memory database systems: an overview. IEEE Trans. Knowl. Data Eng. 4(6), 509–516 (1992)
[4] J Stankovic, SH Son, J Hansson, Misconceptions about real-time databases. Computer 32(6), 29–36 (1999)
[5] K Ramamritham, Real-time databases. Distrib. Parallel Databases 1(2), 199–226 (1993)
[6] G Özsoyoğlu, RT Snodgrass, Temporal and real-time databases: a survey. IEEE Trans. Knowl. Data Eng. 7(4), 513–532 (1995)
[7] K Ramamritham, SH Son, LC Dipippo, Real-time databases and data services. Real-time Syst. 28(2-3), 179–215 (2004)
[8] KH Kim, HO Welch, Distributed execution of recovery blocks: an approach for uniform treatment of hardware and software faults in real-time applications. IEEE Trans. Comput. 38(5), 626–636 (1989)
[9] RM Sivasankaran, K Ramamritham, JA Stankovic et al., Data placement, logging and recovery in real-time active databases, in Active and Real-Time Database Systems (ARTDB-95) (Springer, London, 1996), pp. 226–241
[10] NR Soparkar, A Silberschatz, HF Korth, Time-constrained transaction management: real-time constraints in database transaction systems (Kluwer Academic Publishers, 1996)
[11] MI Seltzer, MA Olson, Challenges in embedded database system administration, in Proceeding of the Embedded System Workshop, 1999, pp. 29–31
[12] GM Liao, JP Li, Research on timely recovery technology of memory database, in Wavelet Active Media Technology and Information Processing (ICWAMTIP), 2012 International Conference on (IEEE, 2012), pp. 268–271
[13] A Kemper, T Neumann, HyPer: a hybrid OLTP&OLAP main memory database system based on virtual memory snapshots, in Data Engineering (ICDE), 2011 IEEE 27th International Conference on (IEEE, 2011), pp. 195–206
[14] KY Lam, TW Kuo, Real-time database systems: architecture and techniques (Kluwer Academic Publishers, 2001)
[15] LC Shu, JA Stankovic, SH Son, Achieving bounded and predictable recovery using real-time logging. Comput. J. 47(3), 373–394 (2004)
[16] T Niklander, K Raatikainen, Using logs to increase availability in real-time main-memory database, in Parallel and Distributed Processing (Springer, Berlin Heidelberg, 2000), pp. 720–726
[17] N Malviya, A Weisberg, S Madden et al., Rethinking main memory OLTP recovery, in Data Engineering (ICDE), 2014 IEEE 30th International Conference on (IEEE, 2014), pp. 604–615
[18] E Levy, A Silberschatz, Incremental recovery in main memory database systems. IEEE Trans. Knowl. Data Eng. 4(6), 529–540 (1992)
[19] MC Majhi, AK Behera, NM Kulshreshtha et al., ExtremeDB: a unified web repository of extremophilic archaea and bacteria, 2013

Acknowledgements

The work was supported by the following funds: Hunan Provincial Natural Science Foundation of China (Grant No. 2015JJ6043); Hunan University of Science and Engineering; the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 12A054); and the Construct Program of the Key Discipline in Hunan University of Science and Engineering (Circuits and Systems).

Author information

Authors and Affiliations

Experimental Training Center, Hunan University of Science and Engineering, YongZhou City, Hunan Province, China
Tan Yonghong
School of Electronics and Information Engineering, Hunan University of Science and Engineering, YongZhou City, Hunan Province, China
Yin Xiangdong

Corresponding author

Correspondence to Tan Yonghong.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Yonghong, T., Xiangdong, Y. Research on recovery strategy in embedded real-time main memory databases. J Embedded Systems 2016, 9 (2016). https://doi.org/10.1186/s13639-016-0030-1

Received: 22 December 2015
Accepted: 20 April 2016
