Categories
process mining, science

Process mining – Introduction 3 – Decision tree

a good article here
Entropy: the degree of uncertainty in a set of values.
It is the inverse of compressibility (zippability): the more uncertain the data, the less it can be compressed.
Goal: reduce entropy in the leaves of the tree to improve predictability.
$E = -\sum_{i=1}^{k} p_i \log_2 p_i$
k: the number of possible values.
$p_i = c_i / n$ is the fraction of elements having value $i$, where $c_i \geq 1$ is the number of elements with value $i$ and $n = \sum_{i=1}^{k} c_i$.
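
The formula can be checked on a small example. The sketch below is only an illustration: the labels and the split are made-up data, and the helper names `entropy` and `weighted_leaf_entropy` are hypothetical, not taken from the article mentioned above. It computes the entropy of a root node and of the leaves after one split, so that the reduction in entropy (the information gain) is visible.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (base 2) of a list of categorical values."""
    n = len(values)
    counts = Counter(values)  # c_i for each distinct value i
    return -sum((c / n) * log2(c / n) for c in counts.values())


def weighted_leaf_entropy(leaves):
    """Average entropy of the leaves, weighted by leaf size."""
    total = sum(len(leaf) for leaf in leaves)
    return sum(len(leaf) / total * entropy(leaf) for leaf in leaves)


# Hypothetical labels at the root of a tree: 4 "yes" and 4 "no".
root = ["yes"] * 4 + ["no"] * 4
# A hypothetical split that separates the labels fairly well.
leaves = [["yes", "yes", "yes", "no"], ["no", "no", "no", "yes"]]

root_entropy = entropy(root)
split_entropy = weighted_leaf_entropy(leaves)
information_gain = root_entropy - split_entropy
print(root_entropy, split_entropy, information_gain)
```

A split is worth making when the weighted entropy of the leaves is lower than the entropy of the node being split; a leaf containing a single value has entropy 0.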
decision tree

Categories
Business Analysis, Data Science, process mining

Process mining – Introduction 2

  • Case ID
  • Activity Name
  • Time Stamp

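Play-out: generate behavior from a process model (a possible scenario).

Play-in: infer a process model from observed behavior (for example, a simple process allowing for 4 traces).

Replay: run observed traces over a process model to see where they fit and where they deviate.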
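
The three notions can be sketched in a few lines. This is a deliberately simplified illustration, not a real discovery algorithm: the "model" is just the set of start activities, end activities, and directly-follows pairs seen in the log, and the activity names and function names are hypothetical.

```python
def play_in(traces):
    """Learn a simple model (start, end, directly-follows pairs) from traces."""
    model = {"start": set(), "end": set(), "follows": set()}
    for trace in traces:
        if not trace:
            continue
        model["start"].add(trace[0])
        model["end"].add(trace[-1])
        model["follows"].update(zip(trace, trace[1:]))
    return model


def replay(model, trace):
    """Return True if the trace can be replayed on the model without deviation."""
    if not trace:
        return False
    if trace[0] not in model["start"] or trace[-1] not in model["end"]:
        return False
    return all(step in model["follows"] for step in zip(trace, trace[1:]))


def play_out(model, max_length=6):
    """Enumerate every trace of at most max_length activities the model allows."""
    results = []

    def extend(prefix):
        if prefix[-1] in model["end"]:
            results.append(tuple(prefix))
        if len(prefix) == max_length:
            return
        for a, b in sorted(model["follows"]):
            if a == prefix[-1]:
                extend(prefix + [b])

    for first in sorted(model["start"]):
        extend([first])
    return results


# Hypothetical observed behavior.
log = [
    ("register", "check", "approve"),
    ("register", "check", "reject"),
]
model = play_in(log)
print(play_out(model))
print(replay(model, ("register", "approve")))
```

Because this model only remembers which activity may follow which, play-out can in general produce traces that never occurred in the log; the length bound keeps the enumeration finite when the model contains loops.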

Process mining:

  1. Discovery
  2. Conformance
  3. Enhancement

Machine learning:

  1. Supervised learning: a response variable labels each instance (we label the data and the machine learns from those labels)
    1. Classification: predict a class for each instance (e.g. decision tree)
    2. Regression: find a function that fits the data
  2. Unsupervised learning: the data is unlabeled (e.g. clustering such as K-means, pattern discovery)

Example: predicting weight from smoker and drinker status is supervised learning.

Smoker, drinker: predictor variables

Weight: response variable
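
A minimal sketch of this example follows. The weights are made-up numbers and `fit` and `predict` are hypothetical helper names. The "learning" here is the simplest possible one: store the mean response for each combination of predictor values, which is what a fully grown regression tree on two yes/no predictors would also do.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical labeled instances: (smoker, drinker) -> weight in kg.
training = [
    ((True, True), 90.0),
    ((True, True), 94.0),
    ((True, False), 80.0),
    ((False, True), 85.0),
    ((False, False), 70.0),
    ((False, False), 74.0),
]


def fit(instances):
    """Learn the mean response for each combination of predictor values."""
    groups = defaultdict(list)
    for predictors, response in instances:
        groups[predictors].append(response)
    return {predictors: mean(values) for predictors, values in groups.items()}


def predict(model, predictors):
    """Predict the response for an instance with the given predictor values."""
    return model[predictors]


model = fit(training)
print(predict(model, (True, True)))  # mean of 90.0 and 94.0
```

The labels (weights) are what make this supervised: without them, only grouping similar instances, as in clustering, would be possible.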

Categories
Business Analysis, Data Science, process mining

Process mining – Introduction 1

Process mining combines data mining and business process management. It works with log files (event logs). Every log file must have:

  1. Case ID (order ID)
  2. Activity (purchased, Request, rejected, …)
  3. Time stamp
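
As an illustration of why these three fields are enough, the sketch below turns a flat list of events into one trace per case. The rows are hypothetical data and `build_traces` is a hypothetical helper name; real logs usually come from a file or database rather than a literal list.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log rows: (case ID, activity, timestamp).
events = [
    ("order-1", "purchased", "2024-01-01T09:00"),
    ("order-2", "purchased", "2024-01-01T09:05"),
    ("order-1", "request", "2024-01-01T10:00"),
    ("order-2", "request", "2024-01-01T11:00"),
    ("order-2", "rejected", "2024-01-02T08:00"),
]


def build_traces(rows):
    """Group events by case ID and order each case by timestamp."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in rows:
        cases[case_id].append((datetime.fromisoformat(timestamp), activity))
    return {
        case_id: tuple(activity for _, activity in sorted(case_events))
        for case_id, case_events in cases.items()
    }


traces = build_traces(events)
variants = Counter(traces.values())  # how often each activity sequence occurs
print(traces)
print(variants)
```

The case ID says which events belong together, the timestamp puts them in order, and the activity name says what happened at each step.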

Process mining: Internet of events
Big data: Internet of content (Google, Wikipedia)
Social media: Internet of people
Cloud: Internet of things
Mobility: Internet of places

Big data issues (the four Vs):

  • Volume (data size)
  • Velocity (speed of change)
  • Variety (different forms of sources)
  • Veracity (uncertainty of data)

Data science questions:

  • What happened?
  • Why did it happen?
  • What will happen?
  • What is the best that can happen?

Process mining questions:

  • What is the process that people really follow?
  • What are the bottlenecks in the process?
  • Where do people deviate from the expected?
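
Two of these questions can be approximated directly from an event log. The sketch below is a rough illustration under simple assumptions: the log and the expected process are made-up, the helper names are hypothetical, a bottleneck is taken to be the pair of consecutive activities with the longest average waiting time, and a deviation is any case whose activity sequence differs from the expected one.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical event log: case ID -> ordered list of (activity, timestamp).
log = {
    "c1": [("request", "2024-01-01T09:00"), ("check", "2024-01-01T10:00"),
           ("approve", "2024-01-01T10:30")],
    "c2": [("request", "2024-01-01T09:00"), ("check", "2024-01-01T12:00"),
           ("approve", "2024-01-01T12:30")],
    "c3": [("request", "2024-01-01T09:00"), ("approve", "2024-01-01T09:10")],
}
expected = ("request", "check", "approve")  # hypothetical expected process


def mean_waiting_hours(log):
    """Average time in hours between each pair of consecutive activities."""
    waits = defaultdict(list)
    for events in log.values():
        for (a, t1), (b, t2) in zip(events, events[1:]):
            delta = datetime.fromisoformat(t2) - datetime.fromisoformat(t1)
            waits[(a, b)].append(delta.total_seconds() / 3600)
    return {pair: mean(hours) for pair, hours in waits.items()}


def deviating_cases(log, expected):
    """Case IDs whose activity sequence differs from the expected one."""
    return sorted(
        case for case, events in log.items()
        if tuple(activity for activity, _ in events) != expected
    )


waits = mean_waiting_hours(log)
bottleneck = max(waits, key=waits.get)  # slowest hand-over on average
print(bottleneck, waits[bottleneck])
print(deviating_cases(log, expected))
```

In this made-up log the hand-over from "request" to "check" is the slowest step, and case "c3" deviates by skipping the check.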