1. Introduction to the dataset
Title: Raw data and joint calculation data for RNN regression calculators of RAS tank DO soft-sensor
The dataset includes two files. The first is a RAR compressed file named "Frequency domain dataset of RAS DO.rar", which contains the raw data for the RNN regression calculators. The second is a Matlab matrix file named "joint_calculation_data.mat", which contains the joint calculation variables, the monitoring data and the soft-sensor simulation data. The joint calculation results can be obtained from these variables; the results are used to optimize the simulation data, which is then compared with the monitoring data.
Firstly, decompressing "Frequency domain dataset of RAS DO.rar" yields a folder with three subfolders. These subfolders and the files they contain are described below:
a). AR_DATA: The abbreviation "AR" is short for "aeration". This folder collects the frequency domain distribution data from the aeration experiments in RAS. The raw data are divided into four txt files in this folder, named "Imag part-with condition.txt", "Imag part-without condition.txt", "Real part-with condition.txt" and "Real part-without condition.txt". Each file name specifies two features: whether the data is the "Real part" or the "Imag part", and whether each record is stored "with" or "without" a condition vector. For instance, the file "Imag part-with condition.txt" holds the imaginary part of the frequency domain distribution of the aeration experiments, with a condition vector for each record.
b). FD_DATA: The abbreviation "FD" is short for "feeding". This folder collects the frequency domain distribution data from the feeding experiments in RAS. The raw data are divided into four txt files with the same names and the same naming convention as in AR_DATA. For instance, the file "Imag part-with condition.txt" holds the imaginary part of the frequency domain distribution of the feeding experiments, with a condition vector for each record.
c). WF_DATA: The abbreviation "WF" is short for "water flow". This folder collects the frequency domain distribution data from the water flow experiments in RAS. The raw data are divided into four txt files with the same names and the same naming convention as in AR_DATA. For instance, the file "Imag part-with condition.txt" holds the imaginary part of the frequency domain distribution of the water flow experiments, with a condition vector for each record.
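The folder and file naming convention described above can be enumerated programmatically. The following is a minimal Python sketch rather than part of the dataset: the helper name raw_file_paths is hypothetical, and the root folder passed to it is an assumption, since the name of the decompressed folder is not stated here.

```python
import os

# Folder names and file-name parts taken from the description above.
FOLDERS = ("AR_DATA", "FD_DATA", "WF_DATA")
PARTS = ("Real part", "Imag part")
CONDITIONS = ("with", "without")


def raw_file_paths(root):
    """Return {(folder, part, condition): path} for the 12 raw data files.

    `root` is the decompressed folder; its actual name is an assumption.
    """
    paths = {}
    for folder in FOLDERS:
        for part in PARTS:
            for condition in CONDITIONS:
                name = f"{part}-{condition} condition.txt"
                paths[(folder, part, condition)] = os.path.join(root, folder, name)
    return paths
```

For example, raw_file_paths("data")[("AR_DATA", "Imag part", "with")] gives the path of the imaginary part file with condition vectors for the aeration experiments, relative to an assumed root folder "data".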
Secondly, after importing "joint_calculation_data.mat" into Matlab, the following 16 variables are available:
a). AR_IMAG: The imaginary part of the aeration RNN regression calculator sub-model output;
b). AR_REAL: The real part of the aeration RNN regression calculator sub-model output;
c). FD_IMAG: The imaginary part of the feeding RNN regression calculator sub-model output;
d). FD_REAL: The real part of the feeding RNN regression calculator sub-model output;
e). WF_IMAG: The imaginary part of the water flow RNN regression calculator sub-model output;
f). WF_REAL: The real part of the water flow RNN regression calculator sub-model output;
g). DCM_simulation1: The DCM time domain simulation sequence data;
h). simu_combined_AR: The frequency distribution of the aeration RNN regression calculator sub-model output;
i). simu_combined_FD: The frequency distribution of the feeding RNN regression calculator sub-model output;
j). simu_combined_WF: The frequency distribution of the water flow RNN regression calculator sub-model output;
k). SIMU_AMPLITUDE_AR: The centrally symmetric form of simu_combined_AR for the inverse Fourier transformation;
l). SIMU_AMPLITUDE_FD: The centrally symmetric form of simu_combined_FD for the inverse Fourier transformation;
m). SIMU_AMPLITUDE_WF: The centrally symmetric form of simu_combined_WF for the inverse Fourier transformation;
n). T: The time axis;
o). monitoring_data: The monitoring data of RAS tank DO;
p). target: The name of the mat file, stored so that the variables can be saved more conveniently.
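The SIMU_AMPLITUDE_* variables are described as centrally symmetric forms prepared for the inverse Fourier transformation. The exact layout used in the mat file is not stated here, so the Python sketch below only illustrates the general idea under an assumed layout: a half spectrum holding the zero-frequency term followed by the positive-frequency terms is mirrored with complex conjugation, so that the inverse discrete Fourier transform returns a real-valued time sequence. The function names and the odd-length layout are assumptions made for illustration and may differ from how the variables were actually built.

```python
import cmath


def mirror_spectrum(half):
    """Extend [X0, X1, ..., XK] to a conjugate-symmetric spectrum of length 2K+1."""
    mirrored = [x.conjugate() for x in reversed(half[1:])]
    return list(half) + mirrored


def inverse_dft(spectrum):
    """Plain O(N^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def dft(signal):
    """Plain O(N^2) forward transform, used here only to build an example spectrum."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Round trip on a short made-up sequence (not data from this dataset).
signal = [6.0, 6.5, 7.0, 6.2, 5.8]
half = dft(signal)[:3]            # zero-frequency term plus two positive frequencies
recovered = inverse_dft(mirror_spectrum(half))
```

In this round trip the imaginary parts of recovered vanish up to rounding error and the real parts reproduce signal, which is the property the symmetric form is meant to guarantee.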
2. Methodology information and processing code advice
Firstly, the raw data in "Frequency domain dataset of RAS DO.rar" are the frequency domain distributions of the experimental time sequence data for aeration, feeding and water flow respectively. In other words, the data were generated by Fourier transformation of time sequences sampled at 1 Hz. The frequency distributions were originally complex numbers. Because a user may want to use only the real part or only the imaginary part, the original data were divided into real part and imaginary part files. Moreover, because a user may want to label the data with the experimental conditions or train a model on those conditions, two extra copies with condition vectors are provided. This is why the names of the raw data files in "Frequency domain dataset of RAS DO.rar" encode two features. In each raw data file, rows are separated by carriage return characters and columns are separated by commas.
We suggest using Python to process the raw data files in "Frequency domain dataset of RAS DO.rar". The following data import function can be called by supplying the path of a raw data file.
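def makeDataSet(path):
    # Read one raw data file: one experiment per row, values separated by commas.
    with open(path) as f:
        dataSet = f.readlines()
    for i in range(len(dataSet)):
        dataSet[i] = dataSet[i].strip().split(",")
        print("Row", i + 1, "contains", len(dataSet[i]), "values")
        # Convert every field of the row from text to float.
        for j in range(len(dataSet[i])):
            dataSet[i][j] = float(dataSet[i][j])
    return dataSet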
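Because the real and imaginary parts are stored in separate files, they have to be recombined before any complex-valued processing. The following Python sketch shows one way to do this. It assumes that row i of a "Real part" file and row i of the matching "Imag part" file describe the same experiment and have the same length; this pairing is an assumption and has not been verified here. The function name combine_parts is hypothetical.

```python
def combine_parts(real_rows, imag_rows):
    """Pair rows of real and imaginary values into rows of complex numbers."""
    if len(real_rows) != len(imag_rows):
        raise ValueError("real and imaginary data have different row counts")
    combined = []
    for real_row, imag_row in zip(real_rows, imag_rows):
        if len(real_row) != len(imag_row):
            raise ValueError("row lengths differ between real and imaginary parts")
        combined.append([complex(r, i) for r, i in zip(real_row, imag_row)])
    return combined
```

The inputs are lists of rows of floats, such as those returned by a data import function. The sketch is intended for the "without condition" files; rows from the "with condition" files would need their condition values removed first.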
Secondly, among the variables in "joint_calculation_data.mat", AR_IMAG, AR_REAL, FD_IMAG, FD_REAL, WF_IMAG and WF_REAL are the outputs of the corresponding calculators. The joint calculation results are derived from them; these results are then used to optimize the simulation data, which is compared with the monitoring data.
We suggest using Matlab to work with the variables in "joint_calculation_data.mat". For convenience, we have already generated the intermediate variables needed to obtain the joint calculation result and the optimized simulation data for the final comparison. The following script plots the figures:
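% Load the variables (T, SIMU_AMPLITUDE_*, monitoring_data, ...) into the workspace.
load('joint_calculation_data.mat')
% Plot every 10th value among the first 600 points of each sub-model against T.
plot(T,SIMU_AMPLITUDE_AR(1:10:600),'or',T,SIMU_AMPLITUDE_WF(1:10:600),'*b',T,SIMU_AMPLITUDE_FD(1:10:600),'+g');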
3. Data specific information
a). Definitions of data: For the raw data in "Frequency domain dataset of RAS DO.rar", the files named "Xxxx part-with condition.txt" contain series data of the frequency distribution of the corresponding part. As mentioned above, rows are separated by carriage return characters and columns are separated by commas. Each row corresponds to one experiment. Because "Xxxx part-with condition.txt" contains experimental condition vectors, the first seven values of each row are the corresponding experimental condition vector, listed in the order of flow rate, aeration strength, aeration outlet amount, aeration type, feeding temperature, rearing density and feeding appetite. In contrast, the files named "Xxxx part-without condition.txt" do not include condition vectors. For the variables in "joint_calculation_data.mat", please refer to the introduction.
b). Units of measurement: For the condition vectors in files named "Xxxx part-with condition.txt", the unit of flow rate and aeration strength is liters per minute, the unit of feeding temperature is degrees centigrade, and the unit of rearing density and feeding appetite is grams.
For the variables in "joint_calculation_data.mat", the unit of T is seconds and the unit of monitoring_data and DCM_simulation1 is milligrams per liter. All other values are unitless.
c). Definitions for codes or symbols used to record missing data: missing data are marked as -9999. So far, no missing data have been found.
d). Specialized formats or abbreviations used: none other than those mentioned above.
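The row layout and the missing-data marker described in this section can be handled as in the following Python sketch. It assumes a row has already been read into a list of floats; the names CONDITION_NAMES, split_row and mask_missing are hypothetical, and replacing -9999 with NaN is only one possible convention.

```python
import math

# Order of the seven condition values, as listed in item a).
CONDITION_NAMES = (
    "flow rate",
    "aeration strength",
    "aeration outlet amount",
    "aeration type",
    "feeding temperature",
    "rearing density",
    "feeding appetite",
)
MISSING = -9999.0  # missing-data marker from item c)


def split_row(row):
    """Split one 'with condition' row into (conditions dict, remaining values)."""
    n = len(CONDITION_NAMES)
    if len(row) < n:
        raise ValueError("row is shorter than the condition vector")
    conditions = dict(zip(CONDITION_NAMES, row[:n]))
    return conditions, list(row[n:])


def mask_missing(values):
    """Replace the -9999 missing-data marker with NaN."""
    return [math.nan if v == MISSING else v for v in values]
```

For a row from a "with condition" file, split_row returns the seven named condition values and the frequency distribution values that follow them; mask_missing can then be applied to the latter before further processing.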