Azure Anomaly Detector: detecting suspicious requests

Azure Anomaly Detector (AAD) is a service that finds anomalies (deviations from the expected pattern) in time-series data. If you want to analyse a canonical series of (timestamp, double) points, there is nothing tricky: just send it as is. But if you want to analyse something less conventional, you first have to transform your data into the AAD format. Let's see how to use AAD to decide whether an incoming HTTP request is suspicious.

Source

As the data source we will use Azure Log Analytics, the native Azure logging product you most likely already use; if you don't, the approach can be adapted to your own source. We will work with the HTTP request log of an Azure Function.

Go to Azure Function -> Application Insights -> Logs.

The left panel lists the standard “requests” table. Its basic query returns every column, so we need to narrow it down:

requests
    | project
        timestamp,
        value = hash_many(
            client_StateOrProvince,
            client_CountryOrRegion,
            client_City,
            client_Type,
            client_IP) % 10000
    | order by timestamp asc

What we want here is client-specific data that tells us, for example, where a request came from: geo-data plus the IP address. The idea is that if a request comes from an unusual place, we either do not respond to it at all, or respond but notify someone.
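A minimal C# sketch of that policy, assuming we already have an AAD response for a series whose last point is the incoming request. It reads the `isAnomaly` array returned by the entire-series detection and maps the last flag to a decision. The helper names and the "respond" / "reject" / "respond-and-notify" strings are hypothetical; the service also offers a last-point detection operation, which, as far as I know, returns a single `isAnomaly` flag instead of an array.

```csharp
using System;
using System.Text.Json;

// Hypothetical policy for one request: the article only says we either refuse to
// respond or respond but notify someone. These names are illustrative only.
static string Decide(bool isAnomaly, bool rejectAnomalies) =>
    !isAnomaly ? "respond" : rejectAnomalies ? "reject" : "respond-and-notify";

// Reads the "isAnomaly" array of an entire-series detection response and returns
// the flag of the most recent point (the last element of the series).
static bool LastPointIsAnomaly(string responseJson)
{
    using var doc = JsonDocument.Parse(responseJson);
    var flags = doc.RootElement.GetProperty("isAnomaly");
    return flags[flags.GetArrayLength() - 1].GetBoolean();
}

// Made-up, heavily trimmed response: only the field we need.
var sample = "{\"isAnomaly\":[false,false,true]}";
Console.WriteLine(Decide(LastPointIsAnomaly(sample), rejectAnomalies: false));
// prints "respond-and-notify"
```

Keeping the policy separate from the response parsing makes it easy to switch between "reject" and "notify" without touching the AAD call.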

Each point fed to AAD needs two things: a timestamp and a double. We get the timestamp for free; the double has to be produced from the data, so let's hash it.

There are two functions to hash several values:

  • hash_combine() – Combines hash values of two or more hashes.
  • hash_many() – Returns a combined hash value of multiple values.

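The difference: hash_combine() takes hash values as input, so each column first has to be wrapped in hash(), for example hash_combine(hash(client_City), hash(client_IP)), whereas hash_many() accepts the column values directly.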

The choice is obvious: hash_many() is simpler. The last step is normalisation, X mod N: we don't want to work with such big numbers, so we take X mod 10000.
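If the backend needs the same kind of number for a live request (for instance, to score it before it shows up in the logs), here is a C# sketch of the same hash-then-mod idea. Assumptions: SHA-256 from the .NET base library stands in for KQL's hash function, so the values will not match what hash_many() returns; hash the history and the live request the same way, or not at all. ClientToValue is a hypothetical helper. Note that string.GetHashCode() is unsuitable here because .NET Core randomizes it per process.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Maps client attributes to a whole number in [0, n), mirroring "hash, then X mod N".
// Assumption: SHA-256 replaces KQL's hash, so results differ from hash_many().
static double ClientToValue(int n, params string[] fields)
{
    // Join with a separator unlikely to appear in the fields, so that
    // ("ab", "c") and ("a", "bc") produce different input strings.
    var joined = string.Join("\u001F", fields);
    var digest = SHA256.HashData(Encoding.UTF8.GetBytes(joined));
    // Read the first 8 bytes as an unsigned integer and normalise it with mod n.
    var x = BitConverter.ToUInt64(digest, 0);
    return x % (ulong)n;
}

// Hypothetical client data, in the same order as the KQL query's columns.
var v = ClientToValue(10000, "Washington", "United States", "Redmond", "PC", "0.0.0.0");
Console.WriteLine(v); // some whole number between 0 and 9999
```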

Backend

On the backend you need to take the incoming data and transform it into an array of points (timestamp, double). Keep in mind that you send JSON to the AAD service and it is very sensitive to what it receives. The timestamp must be in the following format:

"timestamp": "1972-01-01T00:00:00Z"

and you cannot deviate from it. So I recommend checking the data you are going to send in the API console at https://westus2.dev.cognitive.microsoft.com/docs/services/AnomalyDetector/operations/post-timeseries-entire-detect. If anything is wrong, it returns a verbose error you can fix.
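Before going to the console, a quick local check can catch the most common format slips. A sketch, assuming these constraints (verify them against the console's error messages, which remain the authority): every timestamp matches yyyy-MM-ddTHH:mm:ssZ exactly, timestamps strictly increase, which also rules out duplicates, and the series has at least 12 points. Validate is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Local pre-flight check; the constraints below are assumptions, not the API contract.
static List<string> Validate(IReadOnlyList<string> timestamps)
{
    var errors = new List<string>();
    if (timestamps.Count < 12)
        errors.Add($"need at least 12 points, got {timestamps.Count}");

    DateTime? previous = null;
    for (var i = 0; i < timestamps.Count; i++)
    {
        // Exact format only: fractional seconds or offsets like +02:00 are rejected.
        if (!DateTime.TryParseExact(timestamps[i], "yyyy-MM-dd'T'HH:mm:ss'Z'",
                CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
                out var t))
        {
            errors.Add($"point {i}: bad timestamp \"{timestamps[i]}\"");
            continue;
        }
        // Strictly increasing: catches both wrong order and duplicate timestamps.
        if (previous is DateTime p && t <= p)
            errors.Add($"point {i}: not after the previous timestamp");
        previous = t;
    }
    return errors;
}

var stamps = new List<string>();
for (var m = 0; m < 12; m++) stamps.Add($"2021-03-01T10:{m:00}:00Z");
Console.WriteLine(Validate(stamps).Count); // prints 0
```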

In our case the log timestamps carry fractional seconds, so the date has to be trimmed to match the destination format:

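using System.Globalization; // CultureInfo, DateTimeStyles
// Parse as UTC: plain DateTime.Parse converts a "...Z" string to local time.
var dt = DateTime.Parse(item[timePropPos].ToString(), CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
dt = dt.AddTicks(-(dt.Ticks % TimeSpan.TicksPerSecond)); // drop sub-second ticks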
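Putting the pieces together, here is a minimal C# sketch that turns (UTC time, value) pairs into a request body with System.Text.Json. Assumptions: the body shape (a `series` array of `timestamp`/`value` objects plus a `granularity` field) is the one used by the v1.0 entire-detect operation, and "minutely" is only a placeholder granularity; BuildBody is a hypothetical helper. The timestamps are formatted explicitly rather than left to the serializer's default DateTime format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.Json;

// Builds the JSON body for an entire-series detection call.
// Assumption: Time values are already UTC (DateTimeKind.Utc); no conversion is done.
static string BuildBody(IEnumerable<(DateTime Time, double Value)> points, string granularity)
{
    var body = new
    {
        series = points
            .OrderBy(p => p.Time) // keep the series in ascending time order
            .Select(p => new
            {
                // Exactly yyyy-MM-ddTHH:mm:ssZ; "ss" prints whole seconds only.
                timestamp = p.Time.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture),
                value = p.Value
            }),
        granularity // pick the one that matches the spacing of your points
    };
    return JsonSerializer.Serialize(body);
}

var t0 = new DateTime(2021, 3, 1, 10, 0, 0, DateTimeKind.Utc);
Console.WriteLine(BuildBody(new[] { (t0.AddMinutes(1), 7.0), (t0, 4242.0) }, "minutely"));
```

With an explicit format string the tick trimming above becomes optional, since "ss" never prints fractions; it still matters whenever the DateTime is handed to a JSON serializer as is.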

Conclusion

And that is pretty much it. Nothing complex, but a few tricks are needed to avoid fighting with the format.

