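The diff function shifts the data by a given number of positions and returns the difference between the original data and the shifted copy. As an example, take a DataFrame named df: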
index  value1
A      0
B      1
C      2
D      3
Running:
df.diff()
gives:
index  value1
A      NaN
B      1.0
C      1.0
D      1.0
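How is this result obtained? There are two steps. First, the data is shifted:
df.shift()
Then the shifted data is subtracted from the original data, that is:
df - df.shift()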
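The two steps can be checked directly. The following is a minimal sketch, assuming df holds the example values 0 to 3 in a single column named value1; it compares "current row minus previous row" with the result of df.diff().

```python
import pandas as pd

# Hypothetical frame matching the example values used in this section.
df = pd.DataFrame({"value1": [0, 1, 2, 3]}, index=["A", "B", "C", "D"])

manual = df - df.shift()   # current row minus the previous row
builtin = df.diff()        # same computation with the default periods=1

print(builtin)
print(manual.equals(builtin))
```

The first row has no predecessor, so both versions produce NaN there.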
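Function signature:
DataFrame.diff(periods=1, axis=0)
Parameters:
periods: the number of positions to shift, int, default 1. Negative values are accepted.
axis: the direction of the shift, {0 or 'index', 1 or 'columns'}. With 0 or 'index' the shift runs down the rows; with 1 or 'columns' it runs across the columns.
Returns:
diffed: DataFrame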
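To make the two parameters concrete, here is a small sketch on a made-up two-column frame (the name wide and its values are illustrative only, not part of the original data).

```python
import pandas as pd

# Hypothetical two-column frame, used only to illustrate the parameters.
wide = pd.DataFrame({"x": [1, 4, 9], "y": [2, 6, 12]})

row_diff = wide.diff()              # axis=0: each row minus the row above it
col_diff = wide.diff(axis=1)        # axis=1: each column minus the column to its left
back_diff = wide.diff(periods=-1)   # negative periods: each row minus the row below it

print(row_diff)
print(col_diff)
print(back_diff)
```

With axis=1 the leftmost column has nothing to subtract, so column x is entirely NaN; with periods=-1 the NaN moves to the last row.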
For example, running
df.diff(2)
gives:
index  value1
A      NaN
B      NaN
C      2.0
D      2.0
Running
df.diff(-1)
gives:
index  value1
A      -1.0
B      -1.0
C      -1.0
D      NaN
import pandas as pd
import numpy as np
import seaborn as sb  # plotting
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn import metrics  # scoring
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix  # confusion matrix
from sklearn import svm, tree, ensemble, model_selection  # model selection
#%%
train_pc1=pd.read_csv("data/test_data1536128609.98_screen1.csv",header=None)
print(train_pc1.shape)
train_pc1.head()
train_pc2=pd.read_csv("data/test_data1536128871.84_screen2.csv",header=None)
print(train_pc2.shape)
train_pc2.head()
train_sensor=pd.read_csv("data/test_data1536129261.77_device253.csv",header=None)
print(train_sensor.shape)
train_sensor.head()
test=pd.read_csv("data/test_data1536129670.55_main.csv",header=None)
print(test.shape)
test.head()
# Rename columns
The columns are named Timestamp, MaxCurrent, and EffCurrent, matching the specification given in the problem statement.
#%%
train_pc1.rename(columns={0:"Timestamp",1:"MaxCurrent",2:"EffCurrent"},inplace=True)
train_pc2.rename(columns={0:"Timestamp",1:"MaxCurrent",2:"EffCurrent"},inplace=True)
train_sensor.rename(columns={0:"Timestamp",1:"MaxCurrent",2:"EffCurrent"},inplace=True)
test.rename(columns={0:"Timestamp",1:"MaxCurrent",2:"EffCurrent"},inplace=True)
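#%% md
As an alternative to renaming after loading, the column names can be supplied when the file is read. The sketch below uses a made-up two-row sample in place of the real CSV files; sample_csv, columns, and sample are illustrative names only.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for one of the header-less CSV files.
sample_csv = io.StringIO("1.00,0.52,0.31\n1.25,0.52,0.31\n")

columns = ["Timestamp", "MaxCurrent", "EffCurrent"]
sample = pd.read_csv(sample_csv, header=None, names=columns)

print(sample.columns.tolist())
```

Passing names= keeps the column list in one place, which helps when the same three names are applied to several files.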
#%% md
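Analyzing device states
There are two types of devices. Their states can be observed in the line plots below:
1. PC 1 and PC 2 have 3 states:
   ON: a sudden spike appears (state 1)
   Idle: the current follows a steady pattern (state 2)
   OFF: no current flows, i.e. effective current = 0 (state 0)
2. The temperature sensor has 2 states:
   ON: current flows (state 1)
   OFF: no current flows, i.e. effective (RMS) current = 0 (state 0)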
#%%
linesPC1 = train_pc1.plot.line(x='Timestamp', y='EffCurrent', title='PC Screen 1')
linesPC2 = train_pc2.plot.line(x='Timestamp', y='EffCurrent', title='PC Screen 2')
linesSensor = train_sensor.plot.line(x='Timestamp', y='EffCurrent', title='Sensor')
#%%
# Time Period: time elapsed since the previous sample.
# Standalone_Time: the same interval, counted only while EffCurrent is unchanged.
train_pc1['Time Period'] = 0.0
train_pc1['Standalone_Time'] = 0.0
for index in train_pc1.index:
    if index == 0:
        # The first sample has no predecessor.
        train_pc1.loc[0, 'Standalone_Time'] = 0
        train_pc1.loc[0, 'Time Period'] = 0
        continue
    elapsed = round(train_pc1.loc[index, 'Timestamp'] - train_pc1.loc[index - 1, 'Timestamp'], 2)
    train_pc1.loc[index, 'Time Period'] = elapsed
    if train_pc1.loc[index - 1, 'EffCurrent'] == train_pc1.loc[index, 'EffCurrent']:
        train_pc1.loc[index, 'Standalone_Time'] = elapsed
    else:
        train_pc1.loc[index, 'Standalone_Time'] = 0
# Label each sample: 2 = idle (current unchanged), 1 = ON (current changed), 0 = OFF (zero current).
train_pc1['PC Screen1'] = 0
train_pc1['EffCurrent_Diff'] = train_pc1['EffCurrent'].diff()
train_pc1.loc[train_pc1['EffCurrent_Diff'] == 0, 'PC Screen1'] = 2
train_pc1.loc[train_pc1['EffCurrent_Diff'] != 0, 'PC Screen1'] = 1
train_pc1.loc[train_pc1['EffCurrent'] == 0, 'PC Screen1'] = 0
train_pc1.drop('EffCurrent_Diff', axis=1, inplace=True)
train_pc1.head()
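#%% md
The row-by-row loop can also be expressed with diff itself. The sketch below is one possible vectorized equivalent, shown on a made-up frame named demo rather than the real data; it should give the same values as the loop apart from possible differences in how ties are rounded.

```python
import pandas as pd

# Hypothetical readings; the real data comes from the CSV files loaded earlier.
demo = pd.DataFrame({
    "Timestamp": [0.0, 0.5, 1.0, 1.5, 2.0],
    "EffCurrent": [0.0, 0.0, 0.3, 0.3, 0.4],
})

# Interval since the previous sample; the first row has no predecessor, so use 0.
elapsed = demo["Timestamp"].diff().round(2).fillna(0)
# True where the current equals the previous reading (NaN in the first row gives False).
unchanged = demo["EffCurrent"].diff().eq(0)

demo["Time Period"] = elapsed
# Keep the interval only where the current is unchanged, otherwise 0.
demo["Standalone_Time"] = elapsed.where(unchanged, 0)

print(demo)
```

On large files this avoids one Python-level iteration per row.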
#%%
# Time Period: time elapsed since the previous sample.
# Standalone_Time: the same interval, counted only while EffCurrent is unchanged.
train_pc2['Time Period'] = 0.0
train_pc2['Standalone_Time'] = 0.0
for index in train_pc2.index:
    if index == 0:
        # The first sample has no predecessor.
        train_pc2.loc[0, 'Standalone_Time'] = 0
        train_pc2.loc[0, 'Time Period'] = 0
        continue
    elapsed = round(train_pc2.loc[index, 'Timestamp'] - train_pc2.loc[index - 1, 'Timestamp'], 2)
    train_pc2.loc[index, 'Time Period'] = elapsed
    if train_pc2.loc[index - 1, 'EffCurrent'] == train_pc2.loc[index, 'EffCurrent']:
        train_pc2.loc[index, 'Standalone_Time'] = elapsed
    else:
        train_pc2.loc[index, 'Standalone_Time'] = 0
# Label each sample: 2 = idle (current unchanged), 1 = ON (current changed), 0 = OFF (zero current).
train_pc2['PC Screen2'] = 0
train_pc2['EffCurrent_Diff'] = train_pc2['EffCurrent'].diff()
train_pc2.loc[train_pc2['EffCurrent_Diff'] == 0, 'PC Screen2'] = 2
train_pc2.loc[train_pc2['EffCurrent_Diff'] != 0, 'PC Screen2'] = 1
train_pc2.loc[train_pc2['EffCurrent'] == 0, 'PC Screen2'] = 0
train_pc2.drop('EffCurrent_Diff', axis=1, inplace=True)
train_pc2.head()
#%%
# Time Period: time elapsed since the previous sample.
train_sensor['Time Period'] = 0.0
for index in train_sensor.index:
    if index == 0:
        # The first sample has no predecessor.
        train_sensor.loc[0, 'Time Period'] = 0
    else:
        train_sensor.loc[index, 'Time Period'] = round(
            train_sensor.loc[index, 'Timestamp'] - train_sensor.loc[index - 1, 'Timestamp'], 2)
# Label each sample: 1 where the current changed, 0 where it is unchanged or zero.
train_sensor['Temperature Sensor'] = 0
train_sensor['EffCurrent_Diff'] = train_sensor['EffCurrent'].diff()
train_sensor.loc[train_sensor['EffCurrent_Diff'] == 0, 'Temperature Sensor'] = 0
train_sensor.loc[train_sensor['EffCurrent_Diff'] != 0, 'Temperature Sensor'] = 1
train_sensor.loc[train_sensor['EffCurrent'] == 0, 'Temperature Sensor'] = 0
train_sensor.drop('EffCurrent_Diff', axis=1, inplace=True)
train_sensor.head()
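#%% md
The labelling steps for the two PCs follow the same pattern, so they could be collected into one helper. The sketch below is an illustration only: label_pc_states is a hypothetical name, and the use of np.select is a substitute for the three .loc assignments, not the method used above.

```python
import numpy as np
import pandas as pd

def label_pc_states(eff_current: pd.Series) -> pd.Series:
    """Hypothetical helper: 0 = OFF, 2 = idle (unchanged), 1 = ON (changed)."""
    delta = eff_current.diff()
    # np.select takes the first matching condition, so the OFF check has priority.
    labels = np.select(
        [eff_current.eq(0).to_numpy(), delta.eq(0).to_numpy()],
        [0, 2],
        default=1,
    )
    return pd.Series(labels, index=eff_current.index, name="state")

states = label_pc_states(pd.Series([0.0, 0.0, 0.3, 0.3, 0.4]))
print(states.tolist())
```

Ordering the conditions with the zero-current check first reproduces the effect of applying the EffCurrent == 0 assignment last.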
#%%
linesPC1 = train_pc1.plot.line(x='Timestamp', y='PC Screen1',title='PC Screen 1')
linesPC2 = train_pc2.plot.line(x='Timestamp', y='PC Screen2',title='PC Screen 2')
linesSensor = train_sensor.plot.line(x='Timestamp', y='Temperature Sensor',title='Sensor')
#%% md