1.2 How to compute the skyline object?
In the database research field, there are already a lot of algorithms for
the skyline query for different environment. Following is a list on the most
common used algorithms in the research and discussion, with the source they
are from:
-
BNL(Block-nested-loops), [BKS01]: The most basic and straight-forward
algorithm, which is also the basic algorithm in our system. This
algorithm requires all-pair compare for all the data objects so the
efficiency is theoretically bad. However, in the actuarial practice, it
always shows good performance because there are no other tricks which
may take more efforts:)
-
D&C(Divide-and-conquer), [BKS01]:
-
SFS(Sort Filter Skyline), [CGGL03]:
-
FLET(Fast Linear Expected-time), [BCL90]:
-
Bitmap, [TEO01]:
-
B-Tree Index, [TEO01]:
-
NN, [KRR02]:
More and more efficient algorithms will come out for better performance on
skyline query, which is also the purpose for our website.
1.3 About the platform: ASP.NET 2.0
Due to the require of the course project, the Skyline Website is constructed
on the .NET Framework 2.0 using C#, edited in VS2005.
1.4 About this website and the Task List
This site is for the research aim, so with the going on research in Skyline
Computation, more tests and features will be added in. Till now the
performance we achieved can be far away from satisfication. So the following
task list will show the direction we will plunge into in the future.
-
Efficient algorithms and implementation on Skyline computation for
existing data set.
-
User defined data schemas, data set and query requirement.
-
More algorithms embedded into the website for comparation between them.
1.5 About this document
This document is developer-aimed, for the possible improvement, suggestions
and modifications from any researchers and other else.
Lots of the description on the algorithms we used in our website can be
found from their original papers, and the code here will not guarantee to be
the best implementation, so try to give us your cool codes on them to
improve the performance!
2. Roadmap: Whole structure of the website
Until Nov 29th, 2006, when it is the deadline of the project for the course,
the whole website just achieved a few features on the skyline computation:
basic skyline query using BNL algorithm and presorting algorithm; user
defined data schema and data set.
The whole project contains the following
folders and files.
3. Compute Skyline objects
In this part, the ideas used to compute the skyline objects in our website
will be discussed. All the codes about these algorithms can be found in the
App_code folder. In implementation of
our algorithms, all the objects in data set have been considered as a row of
data in a data table, which is correspond to the data table in database.
3.1 BNL(Block-nested-loops)
BNL algorithm is a straight-forward computation process for skyline query.
This algorithm keep a window to store candidate skyline objects and then fix
the real skyline objects by comparing objects in the window with all the
objects not in the window. Finally all the objects which cannot be dominated
by any other objects will be left in the window as the skyline objects.
In [BKS01], the window for the BNL algorithm is fixed, however in our
implementation, it is dynamic ranged, which can improve the efficiency but
heavier memory load.
In
BasicSkyline.cs, we firstly defined
the operator
blnDominate, which will
return whether one row in a data table can dominate another row. This
operator will be called in the
BasicBNLSkyline function, and then this
function will compare all the objects for the skyline object which will not
be dominated by any other objects.
3.2 SFS(Sort Filter Skyline)
SFS requires presorted data on all dimensions of the data table for each
column(attribution) according to the descend or ascend sequence. It is based
on the theory that the skyline represents the closure over the maximum
scoring tuples of the relation with respect to all monotone scoring
functions[CGGL03].
With no duplicate in the data table, all the data can be ranked according to
each of the columns(attributions), and then the one with higher rank
compared to the others will surely be the skyline object. Based on this
idea, a sweep algorithm can be implemented on the presorted data to scan
row-by-row to record the objects with higher ranks in each columns as the
skyline objects.
PreSort is the
function to presort the date for each columns(attributions), and then
SkylineQuery_Presorted is used for the
skyline query by sweeping.
The actual data always have duplication, however, and the simple sweep can
no longer get the correct answer due to the reason that the non-skyline
object which has the same values on some columns(attributions) as one of the
skyline objects will be counted as a skyline object. Now our attempt is use
basic BNL algorithm for the objects gained from the sweep algorithm.
3.3 Epsilon Skyline
Epsilon skyline is a new attemption on the fixed number of skyline query, or
k-skyline, which will find exactly k objects in the data table with better
attributions than any others. Common skyline query always can't control the
number of the objects queried out, and this method is an improvement for the
user defined skyline computation.
4. User defined data schema
Users defined data schema provides easier and more customized skyline
queries. Using this feature, users can upload their own data schemes and
also the whole data table into the database, and then run skyline query on
them. All the features are included in the
UserDefined_1.aspx.cs and
UserDefined_2.aspx.cs.
4.1 Database setup
These features are supported by special data table in database: Schemas,
which includes three attributions: SchemaID, SchemaName, Columns.
+Schemas
|-----SchemaID
ID of the schemes
|-----SchemaName
Name of the
schemes
|-----Columns
Columns contained in the scheme, divided with ,
between columns
4.2 User defined data schema upload
This feature uses the DataSet adapter provided by ADO.NET 2.0, which will
setup a data connection between the data and the project, and then provide
necessary interface to the developer for select, delete, insert.
<asp:ObjectDataSource
ID="ObjectDataSource1"
runat="server"
DeleteMethod="Delete"
InsertMethod="Insert"
OldValuesParameterFormatString="original_{0}"
SelectMethod="GetSchemaData"
TypeName="dsSchemaTableAdapters.SchemasTableAdapter"
UpdateMethod="Update">
<DeleteParameters>
...
</DeleteParameters>
<UpdateParameters>
...
</UpdateParameters>
<InsertParameters>
...
</InsertParameters>
</asp:ObjectDataSource>
Then we use two controls: GridView and DetailView, both of which are new
controls in .NET2.0, to provide the on-the-fly data schemes upload:
<asp:GridView
ID="GridView1"
runat="server"
DataSourceID="ObjectDataSource1"
OnSelectedIndexChanged="GridView1_SelectedIndexChanged"
Width="650px"
AutoGenerateColumns="False"
DataKeyNames="SchemaID">
...
<asp:DetailsView
ID="DetailsView1"
runat="server"
AutoGenerateRows="False"
DataSourceID="ObjectDataSource1"
Height="50px"
Width="649px"
DataKeyNames="SchemaID"
DefaultMode="Insert">
...
We also provide the feature to upload schemes from text file. The format of
the text file is required to be like the following:
Column1 Column2
Column3
In the text file, there will be a tab
("\t") between two column names, and
the scheme will be terminated by an enter
("\n"). The
btnUpload_Click behavior then will read
all the text in the text file and just turn the first line as the active
scheme for storing, which means that this feature can only read one scheme
from one text file each time.
4.3 Bulk upload user defined data into database
This feature will enable user to upload their data into database from a text
file which contains data in the follow format:
R1C1 R1C2 R1C3 ... R1Cn
R2C1 R2C2 R2C3 ... R2Cn
...
RmC1 RmC2 RmC3 ... RmCn
There is a tab ("\t") between two columns and each data row is terminated by
an enter ("\n").
The upload feature uses the
System.Data.SqlClient.SqlBulkCopy in
blnBulk2DB, which will bulk upload data
into an existing data table in database. It is new in the ADO.NET 2.0.
To avoid possible failure of upload, we use transaction to protect the
integration of the data:
SqlConnection conn =
new
SqlConnection(
System.Configuration.ConfigurationManager.AppSettings["DB_NBAConnectionString"]);
conn.Open();
//Start
transaction
SqlTransaction trans =
conn.BeginTransaction();
... ...
//Start to bulk add data into new table
SqlBulkCopy sqlBulkObj =
new SqlBulkCopy(conn,
SqlBulkCopyOptions.KeepIdentity, trans);
... ...
trans.Commit();
5 System Error Logger
For the purpose of research, there will be unavoidable error in the running
time of the website, and always these error messages are important for
developer and researcher to check their algorithms. For this purpose, we
make a error logger in the file
SystemLog.cs to store most of the error
into the log file
ErrLog.txt.
5.1 How Error Logger works
The principle of the logger is very simple: we just maintain a log file in
the main directionary of the website. If there is no such a file, a new log
file will be created.
//Check whether the log file is
existing.
if
(!File.Exists(FilePath))
{
FileStream fs = new
FileStream(FilePath,
FileMode.CreateNew);
fs.Close();
}
The length of the log file is limited in 24KB. If one error occurs, all the
error message will be appended into this file.
//Write error
log.
if
(File.ReadAllBytes(FilePath).Length
>= 240000)
{
File.WriteAllText(FilePath,
DateTime.Now + "\t" + strException +
"\n");
}
else
{
File.AppendAllText(FilePath,
DateTime.Now + "\t" + strException +
"\n");
}
5.2 How to use this logger in your developing?
To redirect any error in your implementation to the log file, you can simply
do it like this:
try
{
// Your code
here
}
catch
(Exception
ex)
{
SystemLog.WriteException(ex.ToString());
// The terminate
process
}
5.3 The content of the ErrLog.txt
The content of the ErrLog.txt is the same as the standard format of any
error threw out by .NET CLR debugger:
2006-11-17
18:05:21 System.ArgumentException: Column 'c2' does not
belong to table .
at
System.Data.DataRow.GetDataColumn(String
columnName)
at
System.Data.DataRow.get_Item(String
columnName)
at
BasicSkyline.blnDominate(DataRow dr1, DataRow dr2, ArrayList columnList,
ArrayList TypeList) in e:\Documents and Settings\LeeWinnie\My
Documents\Visual Studio
2005\WebSites\SkylineWebSite\App_Code\BasicSkyline.cs:line
173
at
BasicSkyline.BasicBNLSkyline(DataTable dt, ArrayList columnList, ArrayList
TypeList) in e:\Documents and Settings\LeeWinnie\My Documents\Visual Studio
2005\WebSites\SkylineWebSite\App_Code\BasicSkyline.cs:line 91
6 Reference
All the following reference paper can be found here:
http://groups-beta.google.com/group/DB_Skyline
, which is a discussion group on Skyline computation maintained by the
author of this document.
-
[BKS01]: Stephan B¨orzs¨onyi, Donald Kossmann and Konrad Stocker.
The Skyline Operator, In ICDE,
pp421-430, 2001.
-
[CGGL03]: Jan Chomicki, Parke Godfrey, Jarek Gryz, and Dongming Liang.
Skyline with Presorting: Theory and
Optimizations
-
[BCL90]: J. L. Bentley, K. L. Clarkson, and D. B. Levine. Fast linear
expected-time algorithms for computing maxima and convex hulls. In SODA,
pp. 179-187, Jan. 1990.
-
[TEO01]: K. -L. Tan, P. K. Eng, and B. C. Ooi. Efficient progressive
skyline computation. In VLDB, pp 301-310, Sept. 2001
-
[KRR02]: D. Kossmann, F. Ramask, and S. Rost. Shooting stars in the sky:
An online algorithm for skyline queries. In VLDB, pp275-286, Aug. 2002
7 LICENSE: GPL and GFDL
This document is under the protection of GFDL, and the program of the
website is under GPL. Both of them can be found here:
http://www.gnu.org/licenses/
Skyline Website
Last Modified:Sun Nov 26 20:22:25 2006
Generated on Sun Nov 26 20:22:25 2006 for Skyline Website by
1.5.1-p1