;; Regular Expressions in Clojure ;; Every time I want to use a regex in clojure I find myself having to ;; learn how to do it again. ;; for some reason the functions just won't stick in my mind. ;; There are six functions starting with re- in clojure.core: (def d (with-out-str (doc re-seq) (doc re-pattern) (doc re-find) (doc re-groups) (doc re-matcher) (doc re-matches))) ;; This is probably the function that you want: (re-seq #"f.nd" d) ;-> ("find" "find" "find" "find" "find") ;; re-pattern is for making regular expressions out of strings (re-pattern "f.nd") ;-> #"f.nd" ;; But normally we'd just use the syntactic sugar #"f.nd" directly ;; The return type is dependent on the regex in a nasty way: (re-seq #"f..d" d) ;-> ("find" "find" "find" "find" "find") (re-seq #"f(.)nd" d) ;-> (["find" "i"] ["find" "i"] ["find" "i"] ["find" "i"] ["find" "i"]) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Here are some classic example regexes (ripped off from ;; http://myregexp.com/examples.html), because I can never remember ;; how the more complicated cases work: ;; Behold the non-capturing group (?:...), the alternation | , the ;; greedy 0 or 1 ?, the greedy 3 {3} and the word boundary \b: (re-seq #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" "Here's an ip address:111.123.0.127, and here is another 136.54.23.108 but this is not one 1.1.1.257 and neither is this: 243.1.231 or this 1.1.1.1.1 but is this?255.000.1.255") ;; Here we demonstrate the both the 'repeated capturing group' problem for MAC addresses (re-seq #"^([0-9a-fA-F][0-9a-fA-F]:){5}([0-9a-fA-F][0-9a-fA-F])$" "AA:0a:be:23:01:02") ;-> (["AA:0a:be:23:01:02" "01:" "02"]) ;; And a use case for re-pattern, together with the (?m) multiline flag (re-seq (re-pattern (str "(?m)^" (clojure.string/join ":" (repeat 6 "([0-9a-fA-F][0-9a-fA-F])")) "$")) "AA:0a:be:23:01:02\nAA:0a:be:123:01:02\nAG:0a:be:23:01:02\n00:01:02:03:04:55\nAA:0a:be:23:0:02:AA\n:0a:be:23:01:02\n") ;-> (["AA:0a:be:23:01:02" "AA" "0a" "be" "23" "01" "02"] ;-> ["00:01:02:03:04:55" "00" "01" "02" "03" "04" "55"]) (re-seq #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" "Here's an ip address:111.123.0.127, and here is another 136.54.23.108 but this is not one 1.1.1.257 and neither is this: 243.1.231 or this 1.1.1.1.1 but is this?255.000.1.255") (re-seq #"\b([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b" "Amongst the worlds domain names are such as www.google.com, ssh.aspden.com, aspden.com, aspden.co.uk, 123.com, ssh.123.com and .com") ;; The other four functions in clojure seem to be to part of the implementation ;; of re-seq, and directly manipulate highly stateful ;; java.util.regex.Matcher objects ;; re-find, re-groups, re-matcher, and re-matches seem to be only ;; useful as part of re-seq, and I wonder if they should be ;; deprecated, or moved to a namespace that needs to be specifically ;; imported. ;; Although if you just want the first match then : (re-find #"f.nd" d) ;; seems to be an acceptable alternative to: (first (re-seq #"f.nd" d))
Blog Archive
-
▼
2013
(52)
-
▼
January
(10)
- Ring: Bringing it All Back Home
- A Warm Welcome to Readers from China (你好世界)
- Regular Expressions in Clojure
- Understanding Ring : A Web App that Keeps its Data...
- Maths Tuition (Cambridge STEP)
- Ring: Using Sessions in a Web Application
- How Sessions Work in Ring
- How Cookies Work in Ring
- Really Rather Overdoing the Interest in How Parame...
- Getting Started with Ring
-
▼
January
(10)
Search This Blog
Friday, January 25, 2013
Regular Expressions in Clojure
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment